Re: Grafana Solr Support

2016-03-20 Thread Joel Bernstein
This Grafana ticket might make it possible to support Solr via Solr's JDBC driver in Solr 6:

https://github.com/grafana/grafana/issues/1542
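(If that lands, connecting would look roughly like the sketch below. This
assumes Solr 6's Parallel SQL/JDBC interface; the ZooKeeper address,
collection, and field names are hypothetical.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcSketch {
    public static void main(String[] args) throws Exception {
        // The Solr JDBC URL points at ZooKeeper, not at a Solr node,
        // and requires the SolrJ jars on the classpath.
        String url = "jdbc:solr://localhost:9983?collection=metrics";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT fieldA, count(*) FROM metrics GROUP BY fieldA")) {
            while (rs.next()) {
                System.out.println(rs.getString("fieldA"));
            }
        }
    }
}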

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Mar 20, 2016 at 10:22 PM, Jon Drews  wrote:

> It would be really neat if Grafana supported Solr. They recently added
> Elasticsearch support and I'd hope some of that code could be reused for
> Solr support.
>
> If anyone else is interested in this, please voice your support on the
> Feature Request here:
> https://github.com/grafana/grafana/issues/4422
>
> Here's the code for the Elasticsearch datasource for Grafana:
>
> https://github.com/grafana/grafana/tree/master/public/app/plugins/datasource/elasticsearch
>
> On a side note, if anyone knows of any other Solr visualization tools
> please let me know. I've been playing around with Banana/Silk/Kibana. I've
> heard of Hue but haven't tried it out.
>
> Thanks!
>


Re: Reply: solr server can't start.

2016-03-20 Thread Erick Erickson
Why do you think Solr is hung? There are a bunch of reasons a Solr instance
may be down; the fact that it's not accepting requests or not recovering
may simply indicate it never started correctly in the first place.

I'd first suggest you look at the Solr logs on the machines in question;
they
may give you some useful information, especially if you have a
problem with your config files (schema.xml etc.).

Beyond that, though, I'd suggest you contact Cloudera's user list as it's
likely
those folks will be more familiar with how to troubleshoot this using the
tools provided by CDH.

Best,
Erick

On Sun, Mar 20, 2016 at 6:47 PM, 王淇霖  wrote:

>
> Hi all,
> We have deployed SolrCloud with CDH 5.4; there are 4 Solr servers in this
> cluster.
> Now 2 Solr servers are down, and we can't start those two; the Solr threads appear to be hung.
> Please help us check it out. Thanks.
>
>
> "Signal Dispatcher" #5 daemon prio=9 os_prio=0 tid=0x7fc9b81e3800
> nid=0x7c35 runnable [0x]
>java.lang.Thread.State: RUNNABLE
>
> "Surrogate Locker Thread (Concurrent GC)" #4 daemon prio=9 os_prio=0
> tid=0x7fc9b81e2800 nid=0x7c34 waiting on condition [0x]
>java.lang.Thread.State: RUNNABLE
>
> "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x7fc9b81aa800 nid=0x7c33
> in Object.wait() [0x7fc1204f3000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
> - locked <0x7fc1c39b6608> (a java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
>
> "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x7fc9b81a8800
> nid=0x7c32 in Object.wait() [0x7fc1205f4000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
> - locked <0x7fc1c3a068a0> (a java.lang.ref.Reference$Lock)
>
> "main" #1 prio=5 os_prio=0 tid=0x7fc9b800a800 nid=0x7c18 waiting on
> condition [0x7fc9beed8000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7fc1c39eff60> (a
> java.util.concurrent.FutureTask)
> at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
> at java.util.concurrent.FutureTask.get(FutureTask.java:191)
> at
> java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:244)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:266)
> at
> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:193)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:140)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:298)
> at
> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:119)
> at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4076)
> - locked <0x7fc1c39f12a8> (a java.util.HashMap)
> at
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4730)
> - locked <0x7fc1c39b0830> (a
> org.apache.catalina.core.StandardContext)
> at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:802)
> at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
> at
> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583)
> at
> org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1080)
> at
> org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:1003)
> at
> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:507)
> at
> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1322)
> at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:325)
> at
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
> at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1068)
> - locked <0x7fc1c3a30200> (a
> org.apache.catalina.core.StandardHost)
> at
> org.apache.catalina.core.StandardHost.start(StandardHost.java:822)
> - locked <0x7fc1c3a30200> (a
> org.apache.catalina.core.StandardHost)
> at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1060)
> - locked <0x7fc1c3a33a80> (a
> org.apache.catalina.core.StandardEngine)
> at
> org.apache.catalina.core.StandardEngine.start(StandardEngin

Reply: solr server can't start.

2016-03-20 Thread 王淇霖

Hi all,
We have deployed SolrCloud with CDH 5.4; there are 4 Solr servers in this cluster.
Now 2 Solr servers are down, and we can't start those two; the Solr threads appear to be hung.
Please help us check it out. Thanks.


"Signal Dispatcher" #5 daemon prio=9 os_prio=0 tid=0x7fc9b81e3800 
nid=0x7c35 runnable [0x]
   java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (Concurrent GC)" #4 daemon prio=9 os_prio=0 
tid=0x7fc9b81e2800 nid=0x7c34 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x7fc9b81aa800 nid=0x7c33 in 
Object.wait() [0x7fc1204f3000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x7fc1c39b6608> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x7fc9b81a8800 
nid=0x7c32 in Object.wait() [0x7fc1205f4000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
- locked <0x7fc1c3a068a0> (a java.lang.ref.Reference$Lock)

"main" #1 prio=5 os_prio=0 tid=0x7fc9b800a800 nid=0x7c18 waiting on 
condition [0x7fc9beed8000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x7fc1c39eff60> (a 
java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at 
java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:244)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:266)
at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:193)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:140)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:298)
at 
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:119)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4076)
- locked <0x7fc1c39f12a8> (a java.util.HashMap)
at 
org.apache.catalina.core.StandardContext.start(StandardContext.java:4730)
- locked <0x7fc1c39b0830> (a 
org.apache.catalina.core.StandardContext)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:802)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583)
at 
org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1080)
at 
org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:1003)
at 
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:507)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1322)
at 
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:325)
at 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1068)
- locked <0x7fc1c3a30200> (a org.apache.catalina.core.StandardHost)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:822)
- locked <0x7fc1c3a30200> (a org.apache.catalina.core.StandardHost)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1060)
- locked <0x7fc1c3a33a80> (a 
org.apache.catalina.core.StandardEngine)
at 
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
at 
org.apache.catalina.core.StandardService.start(StandardService.java:525)
- locked <0x7fc1c3a33a80> (a 
org.apache.catalina.core.StandardEngine)
at 
org.apache.catalina.core.StandardServer.start(StandardServer.java:759)
- locked <0x7fc1c39ebd28> (a [Lorg.apache.catalina.Service;)
at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
 

Re: How fast indexing?

2016-03-20 Thread Erick Erickson
My guess is that Solr isn't doing much and DIH is
taking much more time to get the data from
the database than Solr is using to index it.
DIH can have some tricky bits to get it to be
fast. Among them are:

1> caching certain table results
2> batching the rows returned from the DB
3> ??? I'll leave that part to DIH experts; I
tend to prefer using a JDBC driver with SolrJ,
as in the sketch below...
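(For illustration only: a minimal sketch of the JDBC-plus-SolrJ approach
mentioned in 3>. The Solr URL, JDBC URL, table, and field names are
hypothetical; the point is streaming rows and sending them to Solr in
batches rather than one document at a time.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class JdbcIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/products");
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost;databaseName=catalog", "user", "secret");
             Statement stmt = con.createStatement()) {
            stmt.setFetchSize(1000); // stream rows instead of buffering the whole result
            ResultSet rs = stmt.executeQuery("SELECT id, name, description FROM products");
            List<SolrInputDocument> batch = new ArrayList<>();
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));
                doc.addField("name", rs.getString("name"));
                doc.addField("description", rs.getString("description"));
                batch.add(doc);
                if (batch.size() == 1000) {   // send in batches, not per document
                    solr.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) solr.add(batch);
            solr.commit();
        } finally {
            solr.shutdown();
        }
    }
}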

Best,
Erick

On Sun, Mar 20, 2016 at 5:11 PM, Amit Jha  wrote:

> Hi All,
>
> In my case I am using DIH to index the data, and the query has 2 join
> statements. Indexing 70K documents takes 3-4 hours. Document size
> is around 10-20KB. The DB is MSSQL, and we use Solr 4.2.10 in cloud mode.
>
> Rgds
> AJ
>
> > On 21-Mar-2016, at 05:23, Erick Erickson 
> wrote:
> >
> > In my experience, a majority of the time the bottleneck is in
> > the data acquisition, not the Solr indexing per-se. Take a look
> > at the CPU utilization on Solr, if it's not running very heavy,
> > then you need to look upstream.
> >
> > You haven't told us anything about _how_ you're indexing.
> > SolrJ? DIH? Something from some other party? so it's hard to
> > say much useful.
> >
> > You might review:
> >
> > http://wiki.apache.org/solr/UsingMailingLists
> >
> > Best,
> > Erick
> >
> > On Sun, Mar 20, 2016 at 3:31 PM, Nick Vasilyev  >
> > wrote:
> >
> >> There can be a lot of factors, can you provide a bit of additional
> >> information to get started?
> >>
> >> - How many items are you indexing per second?
> >> - What does the indexing process look like?
> >> - How large is each item?
> >> - What hardware are you using?
> >> - How is your Solr set up? JVM memory, collection layout, etc...
> >> - What is your current commit frequency?
> >> - What is the query volume while you are indexing?
> >>
> >> On Sun, Mar 20, 2016 at 6:25 PM, fabigol 
> >> wrote:
> >>
> >>> hi,
> >>> I have a Solr project where I index from a PostgreSQL database.
> >>> The indexing is very slow.
> >>> How can I speed it up?
> >>> Can I modify autocommit in the solrconfig.xml file?
> >>> Does anyone have ideas? I looked on Google but found little.
> >>> Please help.
> >>>
> >>>
> >>>
> >>>
> >>
>


Grafana Solr Support

2016-03-20 Thread Jon Drews
It would be really neat if Grafana supported Solr. They recently added
Elasticsearch support and I'd hope some of that code could be reused for
Solr support.

If anyone else is interested in this, please voice your support on the
Feature Request here:
https://github.com/grafana/grafana/issues/4422

Here's the code for the Elasticsearch datasource for Grafana:
https://github.com/grafana/grafana/tree/master/public/app/plugins/datasource/elasticsearch

On a side note, if anyone knows of any other Solr visualization tools
please let me know. I've been playing around with Banana/Silk/Kibana. I've
heard of Hue but haven't tried it out.

Thanks!


Indexing using CSV

2016-03-20 Thread Jay Potharaju
Hi,
I am trying to index some data using CSV files. The data contains a
description column, which can include quotes, commas, LF/CR, and other
special characters.

I have it working, but I run into an issue with the following error:

line=5,can't read line: 5 values={NO LINES AVAILABLE}.

What is the best way to debug this issue, and secondly, how do other people
handle indexing data from CSV files? (See the sketch below.)
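(For what it's worth, below is a sketch of one way to post a CSV file with
explicit CSV-handling parameters via SolrJ. The core name and file are
hypothetical, and the parameter values must match how the CSV is actually
quoted.)

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class CsvUpload {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/mycore");
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
        req.addFile(new File("data.csv"), "text/csv;charset=utf-8");
        req.setParam("header", "true");      // first row contains field names
        req.setParam("encapsulator", "\"");  // quotes protect commas and embedded newlines
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(req);
        solr.shutdown();
    }
}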

-- 
Thanks
Jay


Re: How fast indexing?

2016-03-20 Thread Amit Jha
Hi All,

In my case I am using DIH to index the data, and the query has 2 join
statements. Indexing 70K documents takes 3-4 hours. Document size is
around 10-20KB. The DB is MSSQL, and we use Solr 4.2.10 in cloud mode.

Rgds
AJ

> On 21-Mar-2016, at 05:23, Erick Erickson  wrote:
> 
> In my experience, a majority of the time the bottleneck is in
> the data acquisition, not the Solr indexing per-se. Take a look
> at the CPU utilization on Solr, if it's not running very heavy,
> then you need to look upstream.
> 
> You haven't told us anything about _how_ you're indexing.
> SolrJ? DIH? Something from some other party? so it's hard to
> say much useful.
> 
> You might review:
> 
> http://wiki.apache.org/solr/UsingMailingLists
> 
> Best,
> Erick
> 
> On Sun, Mar 20, 2016 at 3:31 PM, Nick Vasilyev 
> wrote:
> 
>> There can be a lot of factors, can you provide a bit of additional
>> information to get started?
>> 
>> - How many items are you indexing per second?
>> - What does the indexing process look like?
>> - How large is each item?
>> - What hardware are you using?
>> - How is your Solr set up? JVM memory, collection layout, etc...
>> - What is your current commit frequency?
>> - What is the query volume while you are indexing?
>> 
>> On Sun, Mar 20, 2016 at 6:25 PM, fabigol 
>> wrote:
>> 
>>> hi,
>>> I have a Solr project where I index from a PostgreSQL database.
>>> The indexing is very slow.
>>> How can I speed it up?
>>> Can I modify autocommit in the solrconfig.xml file?
>>> Does anyone have ideas? I looked on Google but found little.
>>> Please help.
>>> 
>>> 
>>> 
>>> 
>> 


Re: How fast indexing?

2016-03-20 Thread Erick Erickson
In my experience, a majority of the time the bottleneck is in
the data acquisition, not the Solr indexing per-se. Take a look
at the CPU utilization on Solr, if it's not running very heavy,
then you need to look upstream.

You haven't told us anything about _how_ you're indexing.
SolrJ? DIH? Something from some other party? so it's hard to
say much useful.

You might review:

http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Sun, Mar 20, 2016 at 3:31 PM, Nick Vasilyev 
wrote:

> There can be a lot of factors, can you provide a bit of additional
> information to get started?
>
> - How many items are you indexing per second?
> - What does the indexing process look like?
> - How large is each item?
> - What hardware are you using?
> - How is your Solr set up? JVM memory, collection layout, etc...
> - What is your current commit frequency?
> - What is the query volume while you are indexing?
>
> On Sun, Mar 20, 2016 at 6:25 PM, fabigol 
> wrote:
>
> > hi,
> > I have a Solr project where I index from a PostgreSQL database.
> > The indexing is very slow.
> > How can I speed it up?
> > Can I modify autocommit in the solrconfig.xml file?
> > Does anyone have ideas? I looked on Google but found little.
> > Please help.
> >
> >
> >
> >
> >
>


Re: How fast indexing?

2016-03-20 Thread Nick Vasilyev
There can be a lot of factors, can you provide a bit of additional
information to get started?

- How many items are you indexing per second?
- What does the indexing process look like?
- How large is each item?
- What hardware are you using?
- How is your Solr set up? JVM memory, collection layout, etc...
- What is your current commit frequency?
- What is the query volume while you are indexing?

On Sun, Mar 20, 2016 at 6:25 PM, fabigol  wrote:

> hi,
> I have a Solr project where I index from a PostgreSQL database.
> The indexing is very slow.
> How can I speed it up?
> Can I modify autocommit in the solrconfig.xml file?
> Does anyone have ideas? I looked on Google but found little.
> Please help.
>
>
>
>
>


How fast indexing?

2016-03-20 Thread fabigol
hi,
I have a Solr project where I index from a PostgreSQL database.
The indexing is very slow.
How can I speed it up?
Can I modify autocommit in the solrconfig.xml file?
Does anyone have ideas? I looked on Google but found little.
Please help.
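(For reference, the commit settings being asked about live in the
<updateHandler> section of solrconfig.xml and look roughly like the sketch
below; the values are examples only, not recommendations.)

<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit every 60s; flushes to disk -->
  <openSearcher>false</openSearcher> <!-- don't open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>            <!-- soft commit every 5s; makes docs searchable -->
</autoSoftCommit>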






Re: service solr status

2016-03-20 Thread Shawn Heisey
On 3/20/2016 7:20 AM, Adel Mohamed Khalifa wrote:
> Why did I get this failure?!!!

> Jan 03 08:29:11 ubuntutestdev solr[598]: Java not found, or an error was
> enc
>
> Jan 03 08:29:11 ubuntutestdev solr[598]: A working Java 7 or later is
> requir...!
>
> Jan 03 08:29:11 ubuntutestdev solr[598]: Please install Java or fix
> JAVA_HOM
>
> Jan 03 08:29:11 ubuntutestdev solr[598]: Command that we tried: 'java
> -version'

The problem is stated in the error message you pasted. Java could not be
found on your system, or it produced an error when it was started.

You'll need to install Java 7.  If the "java" executable is not in the
normal system path, you will need to set the JAVA_HOME variable in your
solr.in.sh config script.  We recommend the latest version from Oracle,
which at this time is version 7u80 or 8u74.  OpenJDK is also acceptable
-- again, use the latest 7 or 8 version you can.
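(For example, assuming the solr.in.sh location used by the service install
script; the JDK path below is hypothetical and must match your system:)

# In /etc/default/solr.in.sh (or wherever your install keeps solr.in.sh):
JAVA_HOME="/usr/lib/jvm/java-8-oracle"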

Thanks,
Shawn



Re: Shard splitting for immediate performance boost?

2016-03-20 Thread Erick Erickson
Well, I do tend to go on...

As Shawn mentioned memory is usually the most
precious resource and splitting to more shards, assuming
they're in separate JVMs and preferably on separate
machines certainly will relieve some of that pressure.

My only caution there is that splitting to more shards may
be masking some kind of underlying configuration issues
by throwing more hardware at the problem. So you _may_
be able to keep the same topology if you uncover those.

That said, depending on how much your hardware costs,
it may not be worth the engineering effort
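(For reference, the split itself is a Collections API call; a sketch with a
hypothetical host and collection name:)

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1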

Best,
Erick

On Sat, Mar 19, 2016 at 12:55 PM, Robert Brown  wrote:

> Thanks Erick,
>
> I have another index with the same infrastructure setup, but only 10m
> docs, and never see these slow-downs, that's why my first instinct was to
> look at creating more shards.
>
> I'll definitely make a point of investigating further, though, with all the
> things you and Shawn mentioned; time is unfortunately against me.
>
> Cheers,
> Rob
>
>
>
> On 19/03/16 19:11, Erick Erickson wrote:
>
>> Be _very_ cautious when you're looking at these timings. Random
>> spikes are often due to opening a new searcher (assuming
>> you're indexing as you query) and are eminently tunable by
>> autowarming. Obviously you can't fire the same query again and again,
>> but if you collect a set of "bad" queries and, say, measure them after
>> Solr has been running for a while (without any indexing going on) and
>> the times are radically better, then autowarming is where I'd look first.
>>
>> Second, what are you measuring? Time until the client displays the
>> results? There's a bunch of other things going on there, it might be
>> network issues. The QTime is a rough measure of how long Solr is
>> taking, although it doesn't include the time spent assembling the return
>> packet.
>>
>> Third, "randomly requesting facets" is something of a red flag. Make
>> really sure the facets are realistic. Fields with high cardinality make
>> solr work harder. For instance, let's say you have a date field with
>> millisecond resolution. I'd bet that faceting on that field is not
>> something
>> you'll ever support. NOTE: I'm talking about just setting
>> facet.field=datefield
>> here. A range facet on the field is totally reasonable. Really, I'm saying
>> to ensure that your queries are realistic before jumping into sharding.
>>
>> Fourth, garbage collection (especially "stop the world" GCs) won't be
>> helped
>> by just splitting into shards.
>>
>> And the list goes on and on. Really what both Shawn and I are saying is
>> that you really need to identify _what's_ slowing you down before trying
>> a solution like sharding. And you need to be able to quantify that rather
>> than
>> "well, sometimes when I put stuff in it seems slow" or you'll spend a
>> large
>> amount of time chasing the wrong thing (at least I have).
>>
>> 30M docs per shard is well within a reasonable range, although the
>> complexity of your docs may push that number up or down. You haven't told
>> us much about how much memory you have on your machine, how much
>> RAM you're allocating to Solr and the like so it's hard to say much other
>> than generalities
>>
>> Best,
>> Erick
>>
>> On Sat, Mar 19, 2016 at 10:41 AM, Shawn Heisey 
>> wrote:
>>
>> On 3/19/2016 11:12 AM, Robert Brown wrote:
>>>
 I have an index of 60m docs split across 2 shards (each with a replica).

 When load testing queries (picking random keywords I know exist), and
 randomly requesting facets too, 95% of my responses are under 0.5s.

 However, during some random manual tests, sometimes I see searches
 taking between 1-2 seconds.

 Should I expect a simple shard split to assist with the speed
 immediately?  Even with the 2 new shards still being on the original
 servers?

 Will move them to their own dedicated hosts, but just want to
 understand what I should expect during the process.

>>> Maybe.  It depends on why the responses are slow in the first place.
>>>
>>> If your queries are completely CPU-bound, then splitting into more
>>> shards and either putting those shards on additional machines or taking
>>> advantage of idle CPUs will make performance better.  Note that if your
>>> query rate is extremely high, you should only have one shard replica on
>>> each server -- all your CPU power will be needed for handling query
>>> volume, so none of your CPUs will be idle.
>>>
>>> Most of the time, Solr installations are actually I/O bound, because
>>> there's not enough unused RAM to effectively cache the index.  If this
>>> is what's happening and you don't add memory (which you can do by adding
>>> machines and adding/removing replicas to move them), then you'll make
>>> performance worse by splitting into more shards.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>>
>


Re: service solr status

2016-03-20 Thread Binoy Dalal
Like it says, Java was not found.
So either install Java if it isn't installed, or upgrade it.
If the version is correct, check that the JAVA_HOME variable is set to point
to your Java installation.

On Sun, 20 Mar 2016, 18:50 Adel Mohamed Khalifa, 
wrote:

> Why did I get this failure?!!!
>
>
>
> root@ubuntutestdev:/opt# service solr status
>
> ● solr.service - LSB: Controls Apache Solr as a Service
>
>Loaded: loaded (/etc/init.d/solr)
>
>Active: failed (Result: exit-code) since Wed 2029-01-03 08:29:11 EET; 12
> years 9 months left
>
>  Docs: man:systemd-sysv-generator(8)
>
>   Process: 598 ExecStart=/etc/init.d/solr start (code=exited,
> status=1/FAILURE)
>
>
>
> Jan 03 08:29:11 ubuntutestdev solr[598]: Java not found, or an error was
> enc
>
> Jan 03 08:29:11 ubuntutestdev solr[598]: A working Java 7 or later is
> requir...!
>
> Jan 03 08:29:11 ubuntutestdev solr[598]: Please install Java or fix
> JAVA_HOM
>
> Jan 03 08:29:11 ubuntutestdev solr[598]: Command that we tried: 'java
> -version'
>
> Jan 03 08:29:11 ubuntutestdev solr[598]: Active Path:
>
> Jan 03 08:29:11 ubuntutestdev solr[598]:
> /usr/local/bin:/usr/bin:/bin:/usr/l...s
>
> Jan 03 08:29:11 ubuntutestdev systemd[1]: solr.service: Control process
> exit...1
>
> Jan 03 08:29:11 ubuntutestdev systemd[1]: Failed to start LSB: Controls
> Apac
>
> Jan 03 08:29:11 ubuntutestdev systemd[1]: solr.service: Unit entered failed
> 
>
> Jan 03 08:29:11 ubuntutestdev systemd[1]: solr.service: Failed with result
> '
>
> Hint: Some lines were ellipsized, use -l to show in full.
>
>
>
>
>
> Regards,
> Adel
>
>
>
> --
Regards,
Binoy Dalal


Re: publish solr on glassfish server

2016-03-20 Thread Shawn Heisey
On 3/20/2016 5:46 AM, Adel Mohamed Khalifa wrote:
> Thanks Shawn, 
>
> This " 
> Core.solrResourceBundle.getString("//http://172.16.0.1:8983/SearchCore";)" 
> return the solr search core (( http://server:port/core))

Generally speaking, Solr lives on the "/solr" URL context, so most of
the time the correct format of a Solr core URL is this:

http://server:port/solr/core

The HttpSolrServer constructor will not throw an exception with an
invalid but correctly formatted URL, though.  The newest versions of
Solr won't even throw an exception if you send complete gibberish for
the URL.  Older versions would throw a MalformedURLException.

As I have said elsewhere on this problem, the constructor does *not*
contact the server.  This only happens when you send a request, and one
of the ways to send a request is with the SolrServer#query method.
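(To make that concrete, a minimal sketch; the host and core name are taken
from the thread and the query is arbitrary:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FirstContact {
    public static void main(String[] args) throws Exception {
        // Note the /solr context in the URL. No network I/O happens here:
        SolrServer server = new HttpSolrServer("http://172.16.0.1:8983/solr/SearchCore");
        // The first actual connection to Solr happens on the first request:
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
    }
}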

The Solr *server* probably won't run on glassfish, and Solr 5.x has no
official support for third-party containers at all.

https://issues.apache.org/jira/browse/SOLR-5109
https://wiki.apache.org/solr/WhyNoWar

SolrJ programs will probably be fine on glassfish, though.
 
Thanks,
Shawn



Re: Send solr to Production

2016-03-20 Thread Shawn Heisey
On 3/20/2016 5:39 AM, Adel Mohamed Khalifa wrote:
> What do I need to do or configure to send Solr to production?

This is an extremely vague question.  When you say "config", this could
mean the solr config, the core/collection config, the OS config, or
possibly even something else.  Provide a lot more detail, please.

https://wiki.apache.org/solr/UsingMailingLists

There *is* a section of the documentation all about taking Solr to
production:

https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production

If you have already read this and other parts of the documentation, then
please let us know what part you find confusing, or exactly what you
would like to know that is not covered.  The documentation definitely
needs more work, especially for less experienced users.

Thanks,
Shawn



RE: Explain style json? Without using wt=json...

2016-03-20 Thread jimi.hullegard
Forgot to add that we use Solr 4.6.0.

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Wednesday, March 16, 2016 9:39 PM
To: solr-user@lucene.apache.org
Subject: Explain style json? Without using wt=json...

Hi,

We are using Solrj to query our solr server, and it works great. However, it 
uses the binary format wt=javabin, and now when I'm trying to get better debug 
output, I notice a problem with this. The thing is, I want to include the 
explain data for each search result, by adding "[explain]" as a field for the 
fl parameter. And when using [explain style=nl] combined with wt=json, the 
explain output is proper and valid json. However, when I use something other 
than wt=json, the explain output is not proper json.

Is there any way for the explain segment to be proper, valid json, without 
using wt=json? Because Solrj forces wt=javabin, without any option to change 
it, as far as I can see.

And the reason I want the explain segment in proper JSON format is that I want
to turn it into a JSONObject, in order to get proper indentation for easier
reading, because the regular output doesn't have proper indentation.
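(One possible workaround, sketched below but untested here: ask SolrJ for the
raw wt=json body through a per-request response parser. Note that
NoOpResponseParser was added during the 4.x line and may require a SolrJ
newer than 4.6.)

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.NoOpResponseParser;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class RawJsonQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core");
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("q", "test");
        params.set("fl", "id,[explain style=nl]");
        QueryRequest req = new QueryRequest(params);
        req.setResponseParser(new NoOpResponseParser("json")); // raw wt=json body
        NamedList<Object> rsp = server.request(req);
        String json = (String) rsp.get("response"); // the whole response as one JSON string
        System.out.println(json);
    }
}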

Regards
/Jimi


Re: Making managed schema unmutable correctly?

2016-03-20 Thread Alexandre Rafalovitch
Thank you very much Erick,

It does look like I was staring at the Reference Guide for too long
and between the files and APIs and release notes, my mind just played
tricks at 1am :-)

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 17 March 2016 at 04:08, Erick Erickson  wrote:
> I think you're mixing up schema and config? The
> message about not hand-modifying is for
> schema.xml (well, managed-schema). To lock
> it down you need to modify solrconfig.xml...
>
> There shouldn't need to be any need to unload, just
> reload?
>
> And I just skipped the e-mail so maybe I'm way off base.
>
> Best,
> Erick
>
> On Wed, Mar 16, 2016 at 12:14 AM, Alexandre Rafalovitch
>  wrote:
>> So, I am looking at the Solr 5.5 examples with their all-in by-default
>> managed schemas. And I am scratching my head on the workflow users are
>> expected to follow.
>>
>> One example is straight from documentation:
>> "With the above configuration, you can use the Schema API to modify
>> the schema as much as you want, and then later change the value of
>> mutable to false if you wish to "lock" the schema in place and prevent
>> future changes."
>>
>> Which sounds great, except right above the definition in the
>> solrconfig.xml, it says:
>> "Do NOT hand edit the managed schema - external modifications will be
>> ignored and overwritten as a result of schema modification REST API
>> calls."
>>
>> And the Config API does not seem to provide any API to switch that
>> property (schemaFactory.mutable is not an editable property).
>>
>> So, how is one supposed to lock the schema after modifying it? In the
>> default, non-cloud, example!
>>
>> So far, the nearest I get is to unload the core (losing
>> core.properties), manually modify solrconfig.xml in violation of
>> instructions and add the core back. What am I missing?
>>
>> Regards,
>>Alex.
>>
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
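(For reference, the solrconfig.xml snippet under discussion looks like this
sketch, matching the documented default with mutable flipped to false:)

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">false</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>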


Connection refused: no further information

2016-03-20 Thread manohar
Hi,

   I am getting an error on a Windows server. After starting the ZooKeeper server,
I entered this command:
"solr start -cloud -p 8983 -s C:\solr\server\solr\node1\solr -z
16.254.6.88:2181". Then I got this error:

Waiting up to 30 seconds to see Solr running on port 8983

ERROR: Solr at http://localhost:8983/solr did not come online within 30
seconds!

and in the logs file I am getting this error:
"
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
"

I don't know why this is happening.
Can you please tell me how to set up a 2-node SolrCloud with one ZooKeeper on
Windows Server? (I am using Solr 5.4.1 and ZooKeeper 3.4.6.)





service solr status

2016-03-20 Thread Adel Mohamed Khalifa
Why did I get this failure?!!!

 

root@ubuntutestdev:/opt# service solr status

● solr.service - LSB: Controls Apache Solr as a Service

   Loaded: loaded (/etc/init.d/solr)

   Active: failed (Result: exit-code) since Wed 2029-01-03 08:29:11 EET; 12
years 9 months left

 Docs: man:systemd-sysv-generator(8)

  Process: 598 ExecStart=/etc/init.d/solr start (code=exited,
status=1/FAILURE)

 

Jan 03 08:29:11 ubuntutestdev solr[598]: Java not found, or an error was
enc

Jan 03 08:29:11 ubuntutestdev solr[598]: A working Java 7 or later is
requir...!

Jan 03 08:29:11 ubuntutestdev solr[598]: Please install Java or fix
JAVA_HOM

Jan 03 08:29:11 ubuntutestdev solr[598]: Command that we tried: 'java
-version'

Jan 03 08:29:11 ubuntutestdev solr[598]: Active Path:

Jan 03 08:29:11 ubuntutestdev solr[598]:
/usr/local/bin:/usr/bin:/bin:/usr/l...s

Jan 03 08:29:11 ubuntutestdev systemd[1]: solr.service: Control process
exit...1

Jan 03 08:29:11 ubuntutestdev systemd[1]: Failed to start LSB: Controls
Apac

Jan 03 08:29:11 ubuntutestdev systemd[1]: solr.service: Unit entered failed


Jan 03 08:29:11 ubuntutestdev systemd[1]: solr.service: Failed with result
'

Hint: Some lines were ellipsized, use -l to show in full. 

 

 

Regards,
Adel 

 



RE: Solr debug 'explain' values differ from the Solr score

2016-03-20 Thread Rick Sullivan
I still have the problem even without using the phonetic field. 

For example, the following query will result in some exact name matches having 
scores of 4.64, while others get 2.32. All debug info has final values of 4.64.

    &q=( ( (firstName:john~)^0.5 (firstName:john) )^4)

I expect all exact matches to score the same, as the debug response seems to 
indicate they should be. 

I'm not having success reproducing the issue on a small amount of exported data 
indexed using post.jar. The issue still appears when I reduced the data pulled 
by the DIH to only the first 1,000,000 first names, however. Could this be due 
to some indexing issue with the DIH?

Thanks,
-Rick


> Date: Tue, 15 Mar 2016 15:40:18 -0700
> From: hossman_luc...@fucit.org
> To: solr-user@lucene.apache.org
> Subject: RE: Solr debug 'explain' values differ from the Solr score
>
>
> Sounds like a mismatch in the way the BooleanQuery explanation generation
> code is handling situations where there is/isn't a coord factor involved
> in computing the score itself. (the bug is almost certainly in the
> "explain" code, since that is less rigorously tested in most cases, and
> the score itself is probably correct)
>
> I tried to trivially reproduce the symptoms you described using the
> techproducts example and was unable to generate a discrepency using a
> simple boolean query w/a fuzzy clause...
>
> http://localhost:8983/solr/techproducts/query?q=ipod~%20belkin&fl=id,name,score&debug=query&debug=results&debug.explain.structured=true
>
> ...can you distill one of your problematic queries down to a
> shorter/simpler reproducible example, and/or provide us with the field &
> fieldType details for all of the fields used in your example?
>
> (i'm guessing it probably relates to your firstName_phonetic field?)
>
>
>
> : Date: Tue, 15 Mar 2016 13:17:04 -0700
> : From: Rick Sullivan 
> : Reply-To: solr-user@lucene.apache.org
> : To: "solr-user@lucene.apache.org" 
> : Subject: RE: Solr debug 'explain' values differ from the Solr score
> :
> : After some digging and experimentation, here are some more details on the 
> issue I'm seeing.
> :
> :
> : 1. The adjusted documents' scores are always exactly (debug_score/N), where 
> N is the number of OR items in the query.
> :
> : For example, `&q=firstName:gabby~ firstName_phonetic:gabby 
> firstName_tokens:(gabby)` will result in some of the documents with 
> firstName==GABBY receiving a score 1/3 of the score of other GABBY documents, 
> even though the debug explanation shows that they generated the same score.
> :
> :
> : 2. This doesn't appear to be a brand new issue, or an issue with SolrCloud.
> :
> : I've tested the problem using SolrCloud 5.5.0, Solr 5.5.0 (not cloud), and 
> Solr 5.4.1.
> :
> :
> : Anyone have any ideas?
> :
> : Thanks,
> : -Rick
> :
> : From: r...@ricksullivan.net
> : To: solr-user@lucene.apache.org
> : Subject: Solr debug 'explain' values differ from the Solr score
> : Date: Thu, 10 Mar 2016 08:34:30 -0800
> :
> : Hi,
> :
> : I'm seeing behavior in Solr 5.5.0 where the top-level values I see in the 
> debug response don't always correspond with the scores Solr assigns to the 
> matched documents.
> :
> : For example, here is the top-level debug information for two documents 
> matched by a query:
> :
> : 114628: Object
> : description: "sum of:"
> : details: Array[2]
> : match: true
> : value: 20.542768
> :
> : 357547: Object
> : description: "sum of:"
> : details: Array[2]
> : match: true
> : value: 26.517654
> :
> : But they have scores
> :
> : 114628: 20.542767
> : 357547: 13.258826
> :
> : I expect the second document to be the most relevant for my query, and the 
> debug values seem to agree. However, in the final score I receive, that 
> document's score has been adjusted down.
> :
> : The relevant debug response information can be found here: 
> http://apaste.info/mju
> :
> : Does anyone have an idea why the Solr score may differ from the debug value?
> :
> : Thanks,
> : -Rick
>
> -Hoss
> http://www.lucidworks.com/
  

Re: solr index size issue

2016-03-20 Thread Zheng Lin Edwin Yeo
Did you check whether your index still contains 500 docs, or whether there are more?

Regards,
Edwin

On 12 March 2016 at 22:54, Toke Eskildsen  wrote:

> sara hajili  wrote:
> > why does the solr index size become bigger and bigger without adding any new docs?
>
> Solr does not change the index unprovoked. It sounds like your external
> document feeding process is still running.
>
> - Toke Eskildsen
>


Actual (specific) RT Search?

2016-03-20 Thread John Smith
Hi,

The purpose of the project is an actual RT Search, not NRT, but with a
specific condition: when an updated document meets a fixed criteria, it
should be filtered out from future results (no reuse of the document).
This criteria is present in the search query but of course doesn't work
for uncommitted documents.

What I wrote is a combination of the following:
- an UpdateRequestProcessor in the update chain storing the document
unique key in a local cache when the condition is met
- a postCommit listener clearing the cache
- a PostFilter collecting documents that aren't found in the cache,
activated in the search query as a fq parameter

Functionally it does the job; however, for large indexes the filter takes
a hit. The index that poses a problem has 18 million documents in 13GB, and
queries return an average of 25,000 docs in results. The VM has 8 cores
and 20Gb RAM, and uses nimble storage (combination of ssd & hd). Without
the code Solr works like a charm. My guess so far is that the filter has
to fetch the unique key for all documents in results, which consumes a
lot of resources.

What would be your advice?
- Could I use the internal document id instead of a field value? This id
would have to be available both in the UpdateRequestProcessor and
PostFilter: is it the case and how can I access it? I suppose the
SolrInputDocument in the request processor doesn't have it yet anyway.
- If I reduce the autoSoftCommit maxDocs value (how far?), would it be
wise (and feasible) to convert the PostFilter into a plain filter query
such as "*:* NOT (id:1 OR id:2)" or something similar? How could I
implement this and how to estimate the filter cost in order for Solr to
execute it at the right position?
- Maybe I took the wrong path altogether?
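(For concreteness, a sketch of the shape of such a PostFilter. It assumes a
stored "id" unique key and a shared set of blocked keys maintained by the
update processor and cleared by the postCommit listener; it is not the
poster's actual code.)

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class NotBlockedFilter extends ExtendedQueryBase implements PostFilter {
    private final Set<String> blockedKeys; // keys of uncommitted docs meeting the criteria

    public NotBlockedFilter(Set<String> blockedKeys) {
        this.blockedKeys = blockedKeys;
    }

    @Override public boolean getCache() { return false; } // post filters are not cached
    @Override public int getCost() { return 100; }        // cost >= 100 => runs after the query

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // Fetching the stored unique key per candidate doc is the
                // expensive part described above.
                String key = context.reader().document(doc).get("id");
                if (!blockedKeys.contains(key)) {
                    super.collect(doc); // keep the document
                }
            }
        };
    }
}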

Thanks in advance
John



RE: publish solr on galsshfish server

2016-03-20 Thread Adel Mohamed Khalifa
Thanks Shawn, 

This " 
Core.solrResourceBundle.getString("//http://172.16.0.1:8983/SearchCore";)" 
return the solr search core (( http://server:port/core))


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, March 17, 2016 4:09 PM
To: solr-user@lucene.apache.org
Subject: Re: publish solr on galsshfish server

On 3/17/2016 4:20 AM, Adel Mohamed Khalifa wrote:
> And on servlet I coded :-
> Try{
> SolrServer server = new
> HttpSolrServer(Core.solrResourceBundle.getString("//http://172.16.0.1:8983/S
> earchCore"));
> ModifiableSolrParams params = new ModifiableSolrParams();
> String qqq = new
> String(request.getParameter("qu").getBytes("ISO-8859-1"), "UTF-8");
> params.set("q", new
> String(request.getParameter("qu").getBytes("ISO-8859-1"), "UTF-8"));
> params.set("rows", "3");
> params.set("spellcheck", "on");
> params.set("spellcheck.build", "true");
> params.set("wt", "json");
> params.set("indent", "true");
> 
> QueryResponse res = server.query(params);
> SolrDocumentList results = res.getResults();
> Iterator i = results.iterator();
> request.setAttribute("size", results.size());
> String json = new Gson().toJson(res.getResults());
> out.write(json);
> } catch (SolrServerException ex) {
> oout.write(ex.toString());
> }
> }
>
> It's break on SolrServer server = new
> HttpSolrServer(Core.solrResourceBundle.getString("//http://172.16.0.1:8983/S
> earchCore")); without any exception just stop.

This code looks different than the last code you said you were using,
when you posted to the dev list.  Have you included the real code this
time?  I can see at least two typos that would make the code refuse to
compile, so I'm guessing you're still changing it, not showing the
actual code.

What exactly does the following code return?

Core.solrResourceBundle.getString("//http://172.16.0.1:8983/SearchCore")

What version of SolrJ are you using?  I'm not asking about the Solr
server version here although that would be good to know as well. I'm
guessing a 4.x version, or I think you'd be using HttpSolrClient, not
HttpSolrServer.  The latter will work on 5.x, but it is deprecated.

Thanks, Shawn




Send solr to Production

2016-03-20 Thread Adel Mohamed Khalifa
Hello All,

 

What do I need to do or configure to send Solr to production?

 

Regards,
Adel Khalifa

 



Re: Boosts for relevancy (shopping products)

2016-03-20 Thread Robert Brown
It's also worth mentioning that our platform contains shopping products 
in every single category, and will be searched by absolutely anyone, via 
an API made available to various websites, some niche, some not.


If those websites are category-specific, e.g. electrical goods, then we
could boost on certain categories for a given website, but if they're
also broad, is this even possible?


I guess we could track individual users and build up search histories to
try and guide us, but I don't see many hits being made by repeat users.


Recording clicks on products could also be used to boost individual
products for specific keywords - I'm beginning to think this is actually
our best hope? e.g. a multi-valued field containing keywords that
resulted in a click on that product. (A sketch of this follows below.)
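(As a sketch of the knobs being discussed, with hypothetical field names and
weights: edismax's qf and tie, plus a boost query on a click-derived keyword
field:)

import org.apache.solr.client.solrj.SolrQuery;

public class EdismaxExample {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("lord of the rings");
        q.set("defType", "edismax");
        q.set("qf", "name^10 brand^4 category^2 description");
        q.set("tie", "0.1"); // blend in the non-maximum matching fields
        // Hypothetical multi-valued field of keywords that led to clicks:
        q.set("bq", "clicked_keywords:(lord rings)^5");
        System.out.println(q); // prints the encoded request parameters
    }
}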





On 03/18/2016 04:14 PM, Robert Brown wrote:

That does sound rather useful!

We currently have it set to 0.1



On 03/18/2016 04:13 PM, Nick Vasilyev wrote:
Tie does quite a bit; without it, only the highest-weighted field that has
the term will be included in the relevance score. Tie lets you include the
other fields that match as well.
On Mar 18, 2016 10:40 AM, "Robert Brown"  wrote:


Thanks for the added input.

I'll certainly look into the machine learning aspect; it will be good to put
some basic knowledge I have into practice.

I'd been led to believe the tie parameter didn't actually do a lot. :-/



On 03/18/2016 12:07 PM, Nick Vasilyev wrote:

I work with a similar catalog, except our data is especially bad. We've
found that several things helped:
found that several things helped:

- Item level grouping (group the same item sold by multiple vendors). Rank
items with more vendors a bit higher.
- Include a boost function for other attributes, such as an original image
of the product.
- Rank items a bit higher if they have data from an external catalog like
IceCat.
- For relevance and performance, we have several fields that we copy data
into. High-value fields get copied into a high-weighted field, while
lower-value fields like description get copied into a lower-weighted field.
These fields are the backbone of our qf parameter, with other fields adding
additional boost.
- Play around with the tie parameter for edismax; we found that it makes
quite a big difference.

Hope this helps.

On Fri, Mar 18, 2016 at 6:19 AM, Alessandro Benedetti <abenede...@apache.org> wrote:
In a relevancy problem I would repeat what my colleagues already pointed out:
Data is key. We need to understand our data first of all, before we can
understand what is relevant and what is not.
Once we specify a ground floor that makes sense (and your basic approach +
proper schema configuration as suggested + a properly configured request
handler seems a good start to me):

At this point, if you are still not happy with the relevancy (i.e. you are
not happy with the different boosts you assigned), my strongest suggestion
at this time is to move to machine learning.
You need a good amount of data to feed the learner (and make it your Super
Business Expert).
I have recently been working with the Learn To Rank Bloomberg Plugin [1].
In my opinion it will be key for all the businesses that have many features
in the game that can help to evaluate a proper ranking.
For that you need to be able to collect and process signals, and you need
to carefully tune the features of your interest.
But the results could be surprising.

[1] https://issues.apache.org/jira/browse/SOLR-8542
[2] Learning to Rank in Solr <https://www.youtube.com/watch?v=M7BKwJoh96s>

Cheers

On Thu, Mar 17, 2016 at 10:15 AM, Robert Brown  wrote:

Thanks Scott and John,
As luck would have it, I've got a PhD graduate coming for an interview
today, who just happened to do her research thesis on information retrieval
with quantum theory and machine learning :)

John, it sounds like you're describing my system! Shopping products from
multiple sources. (De-duplication is going to be fun soon.)

I already copy fields like merchant, brand, and category to string fields
to use them as facets/filters. I was contemplating removing the description
due to the spammy issue you mentioned; I didn't know about the
RemoveDuplicatesTokenFilterFactory, so I'm sure that's going to be a huge
help.

Thanks a lot,
Rob



On 03/17/2016 10:01 AM, John Smith wrote:

Hi,

For once I might be of some help: I've had a similar configuration
(a large set of products from various sources). It's very difficult to
find the right balance between all the parameters, and it requires a lot
of tweaking, most often in the dark unfortunately.

What I've found is that omitNorms=true is a real breakthrough: without
it, results tend to favor small texts, which is not what's wanted for
product names. I also added a RemoveDuplicatesTokenFilterFactory for the
name, as it's a common practice for spammers to repeat some key words in
order to be better placed in results. Stemming and custom stop words
(e.g. "cheap", "sale", ...) are other potential i

Re: publish solr on glassfish server

2016-03-20 Thread Shawn Heisey
On 3/17/2016 4:20 AM, Adel Mohamed Khalifa wrote:
> And on servlet I coded :-
> Try{
> SolrServer server = new
> HttpSolrServer(Core.solrResourceBundle.getString("//http://172.16.0.1:8983/S
> earchCore"));
> ModifiableSolrParams params = new ModifiableSolrParams();
> String qqq = new
> String(request.getParameter("qu").getBytes("ISO-8859-1"), "UTF-8");
> params.set("q", new
> String(request.getParameter("qu").getBytes("ISO-8859-1"), "UTF-8"));
> params.set("rows", "3");
> params.set("spellcheck", "on");
> params.set("spellcheck.build", "true");
> params.set("wt", "json");
> params.set("indent", "true");
> 
> QueryResponse res = server.query(params);
> SolrDocumentList results = res.getResults();
> Iterator i = results.iterator();
> request.setAttribute("size", results.size());
> String json = new Gson().toJson(res.getResults());
> out.write(json);
> } catch (SolrServerException ex) {
> oout.write(ex.toString());
> }
> }
>
> It's break on SolrServer server = new
> HttpSolrServer(Core.solrResourceBundle.getString("//http://172.16.0.1:8983/S
> earchCore")); without any exception just stop.

This code looks different than the last code you said you were using,
when you posted to the dev list.  Have you included the real code this
time?  I can see at least two typos that would make the code refuse to
compile, so I'm guessing you're still changing it, not showing the
actual code.

What exactly does the following code return?

Core.solrResourceBundle.getString("//http://172.16.0.1:8983/SearchCore")

What version of SolrJ are you using?  I'm not asking about the Solr
server version here although that would be good to know as well. I'm
guessing a 4.x version, or I think you'd be using HttpSolrClient, not
HttpSolrServer.  The latter will work on 5.x, but it is deprecated.

Thanks, Shawn



Re: Making managed schema unmutable correctly?

2016-03-20 Thread Jay Potharaju
Thanks  appreciate the feedback.

On Wed, Mar 16, 2016 at 8:23 PM, Shawn Heisey  wrote:

> On 3/16/2016 7:51 PM, Jay Potharaju wrote:
> > Does using schema API mean that no upconfig to zookeeper and no reloading
> > of all the nodes in my solrcloud? In which scenario should I not use
> schema
> > API, if any?
>
> The documentation says that a reload occurs automatically after the
> schema modification.  You will not need to upconfig and reload.
>
> https://cwiki.apache.org/confluence/display/solr/Schema+API
>
> I can't really tell you when you should or shouldn't use the API.
> That's something you'll have to decide.  If the API will do everything
> you need with regard to schema changes, then you could use it
> exclusively.  Or you could never use it, and the only thing that would
> change is the name of the file that you upload -- managed-schema instead
> of schema.xml.
>
> You can also reconfigure Solr to use the classic schema instead of the
> managed schema, and rename managed-schema to schema.xml.
>
> Thanks,
> Shawn
>
>


-- 
Thanks
Jay Potharaju


Re: how to update billions of docs

2016-03-20 Thread Ishan Chattopadhyaya
Hi Mohsin,
There's some work in progress for in-place updates to docValued fields,
https://issues.apache.org/jira/browse/SOLR-5944. Can you try the latest
patch there (or ping me if you need a git branch)?
It would be nice to know how fast the updates go for your usecase with that
patch. Please note that for that patch, both the version field and the
updated field needs to have stored=false, indexed=false, docValues=true.
Regards,
Ishan
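(For context, a minimal sketch of the atomic-update style whose throughput
is being measured; the core, field, and value are hypothetical. Note that a
plain atomic update rewrites the whole document, which is why in-place
docValues updates are interesting here.)

import java.util.Collections;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicSet {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/core1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-123");
        // "set" replaces the field value; Solr re-indexes the whole document.
        doc.addField("my_field", Collections.singletonMap("set", "newValue"));
        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}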


On Thu, Mar 17, 2016 at 10:55 PM, Jack Krupansky 
wrote:

> It would be nice to have a wiki/doc for "Bulk Field Update" that listed all
> of these techniques and tricks.
>
> And, of course, it would be so much better to have an explicit Lucene
> feature for this. It could work in the background like merge and process
> one segment at a time as efficiently as possible.
>
> Have several modes:
>
> 1. Set a field of all documents to explicit value.
> 2. Set a field of query documents to an explicit value.
> 3. Increment by n.
> 4. Add new field to all document, or maybe by query.
> 5. Delete existing field for all documents.
> 6. Delete field value for all documents or a specified query.
>
>
> -- Jack Krupansky
>
> On Thu, Mar 17, 2016 at 12:31 PM, Ken Krugler  >
> wrote:
>
> > As others noted, currently updating a field means deleting and inserting
> > the entire document.
> >
> > Depending on how you use the field, you might be able to create another
> > core/container with that one field (plus the key field), and use join
> > support.
> >
> > Note that https://issues.apache.org/jira/browse/LUCENE-6352 is an
> > improvement, which looks like it's in the 5.x code line, though I don't
> see
> > a fix version.
> >
> > -- Ken
> >
> > > From: Mohsin Beg Beg
> > > Sent: March 16, 2016 3:52:47pm PDT
> > > To: solr-user@lucene.apache.org
> > > Subject: how to update billions of docs
> > >
> > > Hi,
> > >
> > > I have a requirement to replace a value of a field in 100B's of docs in
> > 100's of cores.
> > > The field is multiValued=false docValues=true type=StrField stored=true
> > indexed=true.
> > >
> > > Atomic Updates performance is on the order of 5K docs per sec per core
> > in solr 5.3 (other fields are quite big).
> > >
> > > Any suggestions ?
> > >
> > > -Mohsin
> >
> >
> > --
> > Ken Krugler
> > +1 530-210-6378
> > http://www.scaleunlimited.com
> > custom big data solutions & training
> > Hadoop, Cascading, Cassandra & Solr
> >
> >
> >
> >
> >
> >
>


Re: High Cpu sys usage

2016-03-20 Thread Otis Gospodnetić
Hi,

On Wed, Mar 16, 2016 at 10:59 AM, Patrick Plaatje 
wrote:

> Hi,
>
> From the sar output you supplied, it looks like you might have a memory
> issue on your hosts. The memory usage just before your crash seems to be
> *very* close to 100%. Even the slightest increase (Solr itself, or possibly
> by a system service) could caused the system crash. What are the
> specifications of your hosts and how much memory are you allocating?


That's normal actually - http://www.linuxatemyram.com/

You *want* Linux to be using all your memory - you paid for it :)

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/




>


>
>
> On 16/03/2016, 14:52, "YouPeng Yang"  wrote:
>
> >Hi
> > It happened again, and the worse thing is that my system crashed; we
> >can't even connect to it with SSH.
> > I used the sar command to capture statistics about it. Here
> >are my details:
> >
> >
> >[1]cpu(by using sar -u),we have to restart our system just as the red font
> >LINUX RESTART in the logs.
>
> >--
> >03:00:01 PM all  7.61  0.00   0.92  0.07  0.00  91.40
> >03:10:01 PM all  7.71  0.00   1.29  0.06  0.00  90.94
> >03:20:01 PM all  7.62  0.00   1.98  0.06  0.00  90.34
> >03:30:35 PM all  5.65  0.00  31.08  0.04  0.00  63.23
> >03:42:40 PM all 47.58  0.00  52.25  0.00  0.00   0.16
> >Average:    all  8.21  0.00   1.57  0.05  0.00  90.17
> >
> >04:42:04 PM   LINUX RESTART
> >
> >04:50:01 PM CPU %user %nice %system %iowait %steal  %idle
> >05:00:01 PM all  3.49  0.00    0.62    0.15   0.00  95.75
> >05:10:01 PM all  9.03  0.00    0.92    0.28   0.00  89.77
> >05:20:01 PM all  7.06  0.00    0.78    0.05   0.00  92.11
> >05:30:01 PM all  6.67  0.00    0.79    0.06   0.00  92.48
> >05:40:01 PM all  6.26  0.00    0.76    0.05   0.00  92.93
> >05:50:01 PM all  5.49  0.00    0.71    0.05   0.00  93.75
>
> >--
> >
> >[2]mem(by using sar -r)
>
> >--
> >03:00:01 PM   1519272 196633272  99.23  361112  76364340 143574212  47.77
> >03:10:01 PM   1451764 196700780  99.27  361196  76336340 143581608  47.77
> >03:20:01 PM   1453400 196699144  99.27  361448  76248584 143551128  47.76
> >03:30:35 PM   1513844 196638700  99.24  361648  76022016 143828244  47.85
> >03:42:40 PM   1481108 196671436  99.25  361676  75718320 144478784  48.07
> >Average:      5051607 193100937  97.45  362421  81775777 142758861  47.50
> >
> >04:42:04 PM   LINUX RESTART
> >
> >04:50:01 PM kbmemfree kbmemused %memused kbbuffers  kbcached  kbcommit %commit
> >05:00:01 PM 154357132  43795412    22.10     92012  18648644 134950460   44.90
> >05:10:01 PM 136468244  61684300    31.13    219572  31709216 134966548   44.91
> >05:20:01 PM 135092452  63060092    31.82    221488  32162324 134949788   44.90
> >05:30:01 PM 133410464  64742080    32.67    233848  32793848 134976828   44.91
> >05:40:01 PM 132022052  66130492    33.37    235812  33278908 135007268   44.92
> >05:50:01 PM 130630408  67522136    34.08    237140  33900912 135099764   44.95
> >Average:    136996792  61155752    30.86    206645  30415642 134991776   44.91
>
> >--
> >
> >
> >As the blue font parts show, my hardware crashed starting at 03:30:35. It
> >was hung until I restarted it manually at 04:42:04.
> >All the above information just snapshots the performance when it crashed,
> >while nothing in it covers the reason. I have also
> >checked /var/log/messages and found nothing useful.
> >
> >Note that I ran the command sar -v. It shows something abnormal:
>
> >
> >02:50:01 PM  11542262  9216 76446   258
> >03:00:01 PM  11645526  9536 76421   258
> >03:10:01 PM  11748690  9216 76451   258
> >03:20:01 PM  11850191  9152 76331   258
> >03:30:35 PM  11972313 10112 132625   258
> >03:42:40 PM  12177319 13760 340227   258
> >Average:  8293601  8950 68187   161
> >
> >04:42:04 PM   LINUX RESTART
> >
> >04:50:01 PM dentunusd   file-nr  inode-nrpty-nr
> >05:00:01 PM 35410  7616 35223 4
> >05:10:01 PM 137320  7296 42632 6
> >05:20:01 PM 247010  7296 

Re: Why is multiplicative boost preferred over additive?

2016-03-20 Thread Walter Underwood
I used a popularity score based on the DVD being in people’s queues and the 
streaming views. The Peter Jackson films were DVD only. They were in about 100 
subscriber queues. The first Twilight film was in 1.25 million queues.

Now think about the query “twilight zone”. How do you make “Twilight” not be 
the first hit for that?
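(For reference, in edismax terms the distinction looks like this, with a
hypothetical popularity field:)

    bf=log(sum(popularity,1))       additive: final score = relevance + boost
    boost=log(sum(popularity,1))    multiplicative: final score = relevance * boost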

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 18, 2016, at 8:48 AM,  
>  wrote:
> 
> On Friday, March 18, 2016 4:25 PM, wun...@wunderwood.org wrote:
>> 
>> That works fine if you have a query that matches things with a wide range of 
>> popularities. But that is the easy case.
>> 
>> What about the query "twilight", which matches all the Twilight movies, all 
>> of which are popular (millions of views).
> 
> Well, like I said, I focused on our use case. And we deal with articles, not 
> movies. And the raw popularity value is basically just "the number of page 
> views the last N days". We want to boost documents that many people have 
> visited recently, but don't really care about the exact search result 
> position when comparing documents with roughly the same popularity. So if all 
> the matched documents have *roughly* the same popularity, then we basically 
> don't want the popularity to influence the score much at all.
> 
>> Or "Lord of the Rings" which only matches movies with hundreds of views? 
>> People really will notice when 
>> the 1978 animated version shows up before the Peter Jackson films.
> 
> Well, doesn't the Peter Jackson "Lord of the Rings" films have more than just 
> a few hundred views?
> 
> /Jimi



RE: SolrCloud App Unit Testing

2016-03-20 Thread Davis, Daniel (NIH/NLM) [C]
MiniSolrCloudCluster is intended for building unit tests for cloud commands 
within Solr itself.

What most people do to test applications based on Solr (and their Solr 
configurations) is to start Solr either on their CI server or in the cloud 
(more likely the latter), and then point their application at that Solr instance 
through configuration for the unit tests.   They may also have separate tests 
to test the Solr collection/core configuration itself.
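(A minimal sketch of that pattern: the system property name and URL are
hypothetical; the test code just points SolrJ at whatever instance the CI
environment provides.)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SolrTestSupport {
    public static SolrServer testServer() {
        // CI supplies -Dsolr.test.url=...; a local default keeps dev runs easy.
        String url = System.getProperty("solr.test.url",
                "http://localhost:8983/solr/test_collection");
        return new HttpSolrServer(url);
    }

    public static void main(String[] args) throws Exception {
        // A cheap smoke test that the configured instance is reachable:
        System.out.println(testServer().ping().getStatus());
    }
}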

You can have your CI tool (Travis/etc.) or unit test scripts start up Solr 
locally, or in the cloud, using various tools and concoctions.   Part of the 
core of that is the solr command-line tool in SOLR_HOME/bin, the post tool in 
SOLR_HOME/bin, and zkcli in SOLR_HOME/server/scripts/cloud-scripts.

To start Solr in the cloud, you should look towards something that exists:
https://github.com/lucidworks/solr-scale-tk 
https://github.com/vkhatri/chef-solrcloud

Hope this helps,

-Dan

-Original Message-
From: Madhire, Naveen [mailto:naveen.madh...@capitalone.com] 
Sent: Thursday, March 17, 2016 11:24 AM
To: solr-user@lucene.apache.org
Subject: FW: SolrCloud App Unit Testing


Hi,

I am writing a Solr application; can anyone please let me know how to unit test 
the application?

I see we have MiniSolrCloudCluster class available in Solr, but I am confused 
about how to use that for Unit testing.

How should I create a embedded server for unit testing?



Thanks,
Naveen


The information contained in this e-mail is confidential and/or proprietary to 
Capital One and/or its affiliates and may only be used solely in performance of 
work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.