Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Nitin Solanki
Thanks, Erick, for your suggestion. I will remove commit=true, use Solr
5.2, and then get back to you for further help. Thanks.

On Sat, Aug 8, 2015 at 4:07 AM Erick Erickson 
wrote:

> bq: So, what is the minimum number of concurrent threads I should run?
>
> I really can't answer that in the abstract; you'll simply have to
> test.
>
> I'd prefer SolrJ to post.jar. If you're not going to use SolrJ, I'd
> imagine that moving from Python to post.jar isn't all that useful.
>
> But before you do anything, see what really happens when you remove the
> commit=true. That's likely way more important than the rest.
>
> Best,
> Erick
>
> On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki 
> wrote:
> > Hi Erick,
> > posting files to Solr via curl =>
> > Rather than posting files via curl, which is better: SolrJ or post.jar? I
> > don't use either. I wrote a Python script for indexing, using
> > urllib and urllib2 to send data via HTTP. I don't have any option to
> > use SolrJ right now. How can I do the same thing via post.jar from
> > Python? Any help, please.
> >
> > indexing with 100 threads is going to eat up a lot of CPU cycles
> > => So, what is the minimum number of concurrent threads I should run?
> > And I also need concurrent searching. How many threads for that?
> >
> > And thanks for the Solr 5.2 link; I will go through it. Thanks for the
> > reply. Please help me.
> >
> > On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson 
> > wrote:
> >
> >> bq: What limitations does Solr have related to indexing and searching
> >> simultaneously? That is, how many simultaneous calls can I make for
> >> searching and indexing at once?
> >>
> >> None a priori. It all depends on the hardware you're throwing at it.
> >> Obviously
> >> indexing with 100 threads is going to eat up a lot of CPU cycles that
> >> can't then
> >> be devoted to satisfying queries. You need to strike a balance. Do
> >> seriously
> >> consider using some other method than posting files to Solr via curl
> >> or the like,
> >> that's rarely a robust solution for production.
> >>
> >> As for adding the commit=true, this shouldn't be affecting the index
> >> size; I suspect you were misled by something else happening.
> >>
> >> Really, remove it or you'll beat up your system hugely. As for the soft
> >> commit
> >> interval, that's totally irrelevant when you're committing every
> >> document. But do
> >> lengthen it as much as you can. Most of the time when people say "real
> >> time",
> >> it turns out that 10 seconds is OK. Or 60 seconds is OK.  You have to
> check
> >> what the _real_ requirement is, it's often not what's stated.
> >>
> >> bq: I am using Solr 5.0. Is 5.0 similar to 5.2 regarding
> >> indexing and searching data?
> >>
> >> Did you read the link I provided? With replicas, 5.2 will index almost
> >> twice as
> >> fast. That means (roughly) half the work on the followers is being done,
> >> freeing up cycles for performing queries.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >> On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki 
> >> wrote:
> >> > Hi Erick,
> >> >   You said that soft commit should be more than 3000 ms.
> >> > Actually, I need real-time searching, and that's why I need a fast
> >> > soft commit.
> >> >
> >> > commit=true => I set commit=true because it reduces my indexed data
> >> > size from 1.5GB to 500MB on *each shard*. When I used commit=false, my
> >> > indexed data size was 1.5GB. After changing it to commit=true, the size
> >> > reduced to only 500MB. I don't understand why that is.
> >> >
> >> > I am using Solr 5.0. Is 5.0 similar to 5.2 regarding
> >> > indexing and searching data?
> >> >
> >> > What limitations does Solr have related to indexing and searching
> >> > simultaneously? That is, how many simultaneous calls can I make for
> >> > searching and indexing at once?
> >> >
> >> >
> >> > On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson <
> erickerick...@gmail.com>
> >> > wrote:
> >> >
> >> >> Your soft commit time of 3 seconds is quite aggressive;
> >> >> I'd lengthen it as much as possible.
> >> >>
> >> >> Ugh, looked at your query more closely. Adding commit=true to every
> >> >> update request is horrible performance-wise. Letting your autocommit
> >> >> process handle the commits is the first thing I'd do. Second, I'd try
> >> >> going to SolrJ and batching up documents (I usually start with 1,000)
> >> >> or using the post.jar tool rather than sending them via a raw URL.
> >> >>
> >> >> I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
> >> >> version of Solr?
> >> >> There was a 2x speedup in Solr 5.2, see:
> >> >>
> >>
> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> >> >>
> >> >> One symptom was that the followers were doing way more work than the
> >> >> leader (BTW, using master/slave when talking SolrCloud is a bit
> >> >> confusing...) which will affect query response rates.

Re: New Solr installation fails to create core

2015-08-07 Thread Erick Erickson
There's not nearly enough information here to help.
What was the error? What was the command? What did the logs show?

You might review:

http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Fri, Aug 7, 2015 at 5:07 PM, Scott Derrick  wrote:
> I upgraded to Java 1.8.51; no change.
>
> I downgraded Solr to 5.1.0; no change.
>
> It must have something to do with the OS; this server is CentOS 6.
>
> Scott
>
> --
> Even if you're on the right track, you'll get run over if you just sit
> there.
> Will Rogers
>


Re: New Solr installation fails to create core

2015-08-07 Thread Scott Derrick

I upgraded to Java 1.8.51; no change.

I downgraded Solr to 5.1.0; no change.

It must have something to do with the OS; this server is CentOS 6.

Scott

--
Even if you're on the right track, you'll get run over if you just sit there.
Will Rogers



Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Erick Erickson
bq: So, what is the minimum number of concurrent threads I should run?

I really can't answer that in the abstract; you'll simply have to
test.

I'd prefer SolrJ to post.jar. If you're not going to use SolrJ, I'd imagine
that moving from Python to post.jar isn't all that useful.

But before you do anything, see what really happens when you remove the
commit=true. That's likely way more important than the rest.

Best,
Erick

On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki  wrote:
> Hi Erick,
> posting files to Solr via curl =>
> Rather than posting files via curl, which is better: SolrJ or post.jar? I
> don't use either. I wrote a Python script for indexing, using
> urllib and urllib2 to send data via HTTP. I don't have any option to
> use SolrJ right now. How can I do the same thing via post.jar from Python?
> Any help, please.
>
> indexing with 100 threads is going to eat up a lot of CPU cycles
> => So, what is the minimum number of concurrent threads I should run? And
> I also need concurrent searching. How many threads for that?
>
> And thanks for the Solr 5.2 link; I will go through it. Thanks for the
> reply. Please help me.
>
> On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson 
> wrote:
>
>> bq: What limitations does Solr have related to indexing and searching
>> simultaneously? That is, how many simultaneous calls can I make for
>> searching and indexing at once?
>>
>> None a priori. It all depends on the hardware you're throwing at it.
>> Obviously
>> indexing with 100 threads is going to eat up a lot of CPU cycles that
>> can't then
>> be devoted to satisfying queries. You need to strike a balance. Do
>> seriously
>> consider using some other method than posting files to Solr via curl
>> or the like,
>> that's rarely a robust solution for production.
>>
>> As for adding the commit=true, this shouldn't be affecting the index
>> size; I suspect you were misled by something else happening.
>>
>> Really, remove it or you'll beat up your system hugely. As for the soft
>> commit
>> interval, that's totally irrelevant when you're committing every
>> document. But do
>> lengthen it as much as you can. Most of the time when people say "real
>> time",
>> it turns out that 10 seconds is OK. Or 60 seconds is OK.  You have to check
>> what the _real_ requirement is, it's often not what's stated.
>>
>> bq: I am using Solr 5.0. Is 5.0 similar to 5.2 regarding
>> indexing and searching data?
>>
>> Did you read the link I provided? With replicas, 5.2 will index almost
>> twice as
>> fast. That means (roughly) half the work on the followers is being done,
>> freeing up cycles for performing queries.
>>
>> Best,
>> Erick
>>
>>
>> On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki 
>> wrote:
>> > Hi Erick,
>> >   You said that soft commit should be more than 3000 ms.
>> > Actually, I need real-time searching, and that's why I need a fast
>> > soft commit.
>> >
>> > commit=true => I set commit=true because it reduces my indexed data
>> > size from 1.5GB to 500MB on *each shard*. When I used commit=false, my
>> > indexed data size was 1.5GB. After changing it to commit=true, the size
>> > reduced to only 500MB. I don't understand why that is.
>> >
>> > I am using Solr 5.0. Is 5.0 similar to 5.2 regarding
>> > indexing and searching data?
>> >
>> > What limitations does Solr have related to indexing and searching
>> > simultaneously? That is, how many simultaneous calls can I make for
>> > searching and indexing at once?
>> >
>> >
>> > On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson 
>> > wrote:
>> >
>> >> Your soft commit time of 3 seconds is quite aggressive;
>> >> I'd lengthen it as much as possible.
>> >>
>> >> Ugh, looked at your query more closely. Adding commit=true to every
>> >> update request is horrible performance-wise. Letting your autocommit
>> >> process handle the commits is the first thing I'd do. Second, I'd try
>> >> going to SolrJ and batching up documents (I usually start with 1,000)
>> >> or using the post.jar tool rather than sending them via a raw URL.
>> >>
>> >> I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
>> >> version of Solr?
>> >> There was a 2x speedup in Solr 5.2, see:
>> >>
>> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>> >>
>> >> One symptom was that the followers were doing way more work than the
>> >> leader
>> >> (BTW, using master/slave when talking SolrCloud is a bit confusing...)
>> >> which will
>> >> affect query response rates.
>> >>
>> >> Basically, if query response is paramount, you really need to throttle
>> >> your indexing; there's just a whole lot of work going on here.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Fri, Aug 7, 2015 at 11:23 AM, Upayavira  wrote:
>> >> > How many CPUs do you have? 100 concurrent indexing calls seems like
>> >> > rather a lot. You're gonna end up doing a lot of context switching,
>> >> > hence degraded performance. Dunno what others would say, but I'd aim
>> >> > for approx one indexing thread per CPU.

Re: docValues

2015-08-07 Thread Shawn Heisey
On 8/7/2015 11:47 AM, naga sharathrayapati wrote:
> JVM-Memory has gone up from 3% to 17.1%

In my experience, a healthy Java application (after the heap size has
stabilized) will have a heap utilization graph where the low points are
between 50 and 75 percent.  If the low points in heap utilization are
consistently below 25 percent, you would be better off reducing the heap
size and allowing the OS to use that memory instead.

If you want to track heap utilization, JVM-Memory in the Solr dashboard
is a very poor tool.  Use tools like visualvm or jconsole.

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
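
If you want to check the numbers programmatically rather than through a
GUI, a minimal sketch using the standard JMX memory bean (illustrative
only, not Solr-specific) would be:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        // Heap usage of the current JVM; attach over JMX to inspect a
        // remote Solr process instead.
        MemoryUsage heap =
            ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long usedMb = heap.getUsed() / (1024 * 1024);
        long maxMb = heap.getMax() / (1024 * 1024);
        System.out.printf("heap used: %d MB of %d MB (%.0f%%)%n",
            usedMb, maxMb, 100.0 * heap.getUsed() / heap.getMax());
    }
}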

I need to add what I said about very low heap utilization to that wiki page.

Thanks,
Shawn



Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Nitin Solanki
Hi Erick,
posting files to Solr via curl =>
Rather than posting files via curl, which is better: SolrJ or post.jar? I
don't use either. I wrote a Python script for indexing, using
urllib and urllib2 to send data via HTTP. I don't have any option to
use SolrJ right now. How can I do the same thing via post.jar from Python?
Any help, please.

indexing with 100 threads is going to eat up a lot of CPU cycles
=> So, what is the minimum number of concurrent threads I should run? And I
also need concurrent searching. How many threads for that?

And thanks for the Solr 5.2 link; I will go through it. Thanks for the
reply. Please help me.

On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson 
wrote:

> bq: What limitations does Solr have related to indexing and searching
> simultaneously? That is, how many simultaneous calls can I make for
> searching and indexing at once?
>
> None a priori. It all depends on the hardware you're throwing at it.
> Obviously
> indexing with 100 threads is going to eat up a lot of CPU cycles that
> can't then
> be devoted to satisfying queries. You need to strike a balance. Do
> seriously
> consider using some other method than posting files to Solr via curl
> or the like,
> that's rarely a robust solution for production.
>
> As for adding the commit=true, this shouldn't be affecting the index
> size; I suspect you were misled by something else happening.
>
> Really, remove it or you'll beat up your system hugely. As for the soft
> commit
> interval, that's totally irrelevant when you're committing every
> document. But do
> lengthen it as much as you can. Most of the time when people say "real
> time",
> it turns out that 10 seconds is OK. Or 60 seconds is OK.  You have to check
> what the _real_ requirement is, it's often not what's stated.
>
> bq: I am using Solr 5.0. Is 5.0 similar to 5.2 regarding
> indexing and searching data?
>
> Did you read the link I provided? With replicas, 5.2 will index almost
> twice as
> fast. That means (roughly) half the work on the followers is being done,
> freeing up cycles for performing queries.
>
> Best,
> Erick
>
>
> On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki 
> wrote:
> > Hi Erick,
> >   You said that soft commit should be more than 3000 ms.
> > Actually, I need real-time searching, and that's why I need a fast
> > soft commit.
> >
> > commit=true => I set commit=true because it reduces my indexed data
> > size from 1.5GB to 500MB on *each shard*. When I used commit=false, my
> > indexed data size was 1.5GB. After changing it to commit=true, the size
> > reduced to only 500MB. I don't understand why that is.
> >
> > I am using Solr 5.0. Is 5.0 similar to 5.2 regarding
> > indexing and searching data?
> >
> > What limitations does Solr have related to indexing and searching
> > simultaneously? That is, how many simultaneous calls can I make for
> > searching and indexing at once?
> >
> >
> > On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson 
> > wrote:
> >
> >> Your soft commit time of 3 seconds is quite aggressive;
> >> I'd lengthen it as much as possible.
> >>
> >> Ugh, looked at your query more closely. Adding commit=true to every
> >> update request is horrible performance-wise. Letting your autocommit
> >> process handle the commits is the first thing I'd do. Second, I'd try
> >> going to SolrJ and batching up documents (I usually start with 1,000)
> >> or using the post.jar tool rather than sending them via a raw URL.
> >>
> >> I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
> >> version of Solr?
> >> There was a 2x speedup in Solr 5.2, see:
> >>
> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> >>
> >> One symptom was that the followers were doing way more work than the
> >> leader
> >> (BTW, using master/slave when talking SolrCloud is a bit confusing...)
> >> which will
> >> affect query response rates.
> >>
> >> Basically, if query response is paramount, you really need to throttle
> >> your indexing; there's just a whole lot of work going on here.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Aug 7, 2015 at 11:23 AM, Upayavira  wrote:
> >> > How many CPUs do you have? 100 concurrent indexing calls seems like
> >> > rather a lot. You're gonna end up doing a lot of context switching,
> >> > hence degraded performance. Dunno what others would say, but I'd aim
> >> > for approx one indexing thread per CPU.
> >> >
> >> > Upayavira
> >> >
> >> > On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
> >> >> Hello Everyone,
> >> >>   I have indexed 16 million documents in Solr
> >> >> Cloud. Created 4 nodes and 8 shards with a single replica.
> >> >> I am trying to make concurrent indexing and searching on those
> indexed
> >> >> documents. Trying to make 100 concurrent indexing calls along with
> 100
> >> >> concurrent searching calls.
> >> >> It *degrades* both searching and indexing performance.
> >> >>
> >> 

Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Erick Erickson
bq: What limitations does Solr have related to indexing and searching
simultaneously? That is, how many simultaneous calls can I make for
searching and indexing at once?

None a priori. It all depends on the hardware you're throwing at it. Obviously
indexing with 100 threads is going to eat up a lot of CPU cycles that can't then
be devoted to satisfying queries. You need to strike a balance. Do seriously
consider using some other method than posting files to Solr via curl
or the like,
that's rarely a robust solution for production.

As for adding the commit=true, this shouldn't be affecting the index size; I
suspect you were misled by something else happening.

Really, remove it or you'll beat up your system hugely. As for the soft commit
interval, that's totally irrelevant when you're committing every
document. But do
lengthen it as much as you can. Most of the time when people say "real time",
it turns out that 10 seconds is OK. Or 60 seconds is OK.  You have to check
what the _real_ requirement is, it's often not what's stated.
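
For reference, a 10-second soft commit with a less frequent hard commit
would look something like this in solrconfig.xml (the values are
illustrative; pick whatever the real requirement allows):

<autoCommit>
  <!-- hard commit: flushes to disk, doesn't open a new searcher -->
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: controls how quickly new documents become searchable -->
  <maxTime>10000</maxTime>
</autoSoftCommit>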

bq: I am using Solr 5.0. Is 5.0 similar to 5.2 regarding
indexing and searching data?

Did you read the link I provided? With replicas, 5.2 will index almost twice as
fast. That means (roughly) half the work on the followers is being done,
freeing up cycles for performing queries.

Best,
Erick


On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki  wrote:
> Hi Erick,
>   You said that soft commit should be more than 3000 ms.
> Actually, I need real-time searching, and that's why I need a fast soft commit.
>
> commit=true => I set commit=true because it reduces my indexed data size
> from 1.5GB to 500MB on *each shard*. When I used commit=false, my
> indexed data size was 1.5GB. After changing it to commit=true, the size
> reduced to only 500MB. I don't understand why that is.
>
> I am using Solr 5.0. Is 5.0 similar to 5.2 regarding
> indexing and searching data?
>
> What limitations does Solr have related to indexing and searching
> simultaneously? That is, how many simultaneous calls can I make for
> searching and indexing at once?
>
>
> On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson 
> wrote:
>
>> Your soft commit time of 3 seconds is quite aggressive;
>> I'd lengthen it as much as possible.
>>
>> Ugh, looked at your query more closely. Adding commit=true to every update
>> request is horrible performance-wise. Letting your autocommit process
>> handle the commits is the first thing I'd do. Second, I'd try going to
>> SolrJ and batching up documents (I usually start with 1,000) or using the
>> post.jar tool rather than sending them via a raw URL.
>>
>> I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
>> version of Solr?
>> There was a 2x speedup in Solr 5.2, see:
>> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>>
>> One symptom was that the followers were doing way more work than the
>> leader
>> (BTW, using master/slave when talking SolrCloud is a bit confusing...)
>> which will
>> affect query response rates.
>>
>> Basically, if query response is paramount, you really need to throttle
>> your indexing; there's just a whole lot of work going on here.
>>
>> Best,
>> Erick
>>
>> On Fri, Aug 7, 2015 at 11:23 AM, Upayavira  wrote:
>> > How many CPUs do you have? 100 concurrent indexing calls seems like
>> > rather a lot. You're gonna end up doing a lot of context switching,
>> > hence degraded performance. Dunno what others would say, but I'd aim for
>> > approx one indexing thread per CPU.
>> >
>> > Upayavira
>> >
>> > On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
>> >> Hello Everyone,
>> >>   I have indexed 16 million documents in Solr
>> >> Cloud. Created 4 nodes and 8 shards with a single replica.
>> >> I am trying to make concurrent indexing and searching on those indexed
>> >> documents. Trying to make 100 concurrent indexing calls along with 100
>> >> concurrent searching calls.
>> >> It *degrades* both searching and indexing performance.
>> >>
>> >> Configuration :
>> >>
>> >>   "commitWithin":{"softCommit":true},
>> >>   "autoCommit":{
>> >> "maxDocs":-1,
>> >> "maxTime":6,
>> >> "openSearcher":false},
>> >>   "autoSoftCommit":{
>> >> "maxDocs":-1,
>> >> "maxTime":3000}},
>> >>
>> >>   "indexConfig":{
>> >>   "maxBufferedDocs":-1,
>> >>   "maxMergeDocs":-1,
>> >>   "maxIndexingThreads":8,
>> >>   "mergeFactor":-1,
>> >>   "ramBufferSizeMB":100.0,
>> >>   "writeLockTimeout":-1,
>> >>   "lockType":"native"}}}
>> >>
>> >> AND  2
>> >>
>> >> I don't know how master and slave work. Normally, I created 8
>> >> shards and indexed documents using:
>> >>
>> >> http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
>> >> 'Content-type:application/json' -d ' [ JSON_Document ]'

Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Nitin Solanki
Hi Erick,
  You said that soft commit should be more than 3000 ms.
Actually, I need real-time searching, and that's why I need a fast soft commit.

commit=true => I set commit=true because it reduces my indexed data size
from 1.5GB to 500MB on *each shard*. When I used commit=false, my
indexed data size was 1.5GB. After changing it to commit=true, the size
reduced to only 500MB. I don't understand why that is.

I am using Solr 5.0. Is 5.0 similar to 5.2 regarding
indexing and searching data?

What limitations does Solr have related to indexing and searching
simultaneously? That is, how many simultaneous calls can I make for
searching and indexing at once?


On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson 
wrote:

> Your soft commit time of 3 seconds is quite aggressive;
> I'd lengthen it as much as possible.
>
> Ugh, looked at your query more closely. Adding commit=true to every update
> request is horrible performance-wise. Letting your autocommit process
> handle the commits is the first thing I'd do. Second, I'd try going to
> SolrJ and batching up documents (I usually start with 1,000) or using the
> post.jar tool rather than sending them via a raw URL.
>
> I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
> version of Solr?
> There was a 2x speedup in Solr 5.2, see:
> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>
> One symptom was that the followers were doing way more work than the
> leader
> (BTW, using master/slave when talking SolrCloud is a bit confusing...)
> which will
> affect query response rates.
>
> Basically, if query response is paramount, you really need to throttle
> your indexing; there's just a whole lot of work going on here.
>
> Best,
> Erick
>
> On Fri, Aug 7, 2015 at 11:23 AM, Upayavira  wrote:
> > How many CPUs do you have? 100 concurrent indexing calls seems like
> > rather a lot. You're gonna end up doing a lot of context switching,
> > hence degraded performance. Dunno what others would say, but I'd aim for
> > approx one indexing thread per CPU.
> >
> > Upayavira
> >
> > On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
> >> Hello Everyone,
> >>   I have indexed 16 million documents in Solr
> >> Cloud. Created 4 nodes and 8 shards with a single replica.
> >> I am trying to make concurrent indexing and searching on those indexed
> >> documents. Trying to make 100 concurrent indexing calls along with 100
> >> concurrent searching calls.
> >> It *degrades* both searching and indexing performance.
> >>
> >> Configuration :
> >>
> >>   "commitWithin":{"softCommit":true},
> >>   "autoCommit":{
> >> "maxDocs":-1,
> >> "maxTime":6,
> >> "openSearcher":false},
> >>   "autoSoftCommit":{
> >> "maxDocs":-1,
> >> "maxTime":3000}},
> >>
> >>   "indexConfig":{
> >>   "maxBufferedDocs":-1,
> >>   "maxMergeDocs":-1,
> >>   "maxIndexingThreads":8,
> >>   "mergeFactor":-1,
> >>   "ramBufferSizeMB":100.0,
> >>   "writeLockTimeout":-1,
> >>   "lockType":"native"}}}
> >>
> >> AND  2
> >>
> >> I don't know how master and slave work. Normally, I created 8
> >> shards and indexed documents using:
> >>
> >> http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
> >> 'Content-type:application/json' -d ' [ JSON_Document ]'
> >>
> >> And searching using:
> >> http://localhost:8983/solr/test_commit_fast/select?q=<field_name>:<search_string>
> >>
> >> Please help me with this, to make searching and indexing fast
> >> concurrently. Thanks.
> >>
> >>
> >> Regards,
> >> Nitin
>


Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Nitin Solanki
Hi, Upayavira

RAM = 28GB
CPU = 4 processors.


On Fri, Aug 7, 2015 at 8:53 PM Upayavira  wrote:

> How many CPUs do you have? 100 concurrent indexing calls seems like
> rather a lot. You're gonna end up doing a lot of context switching,
> hence degraded performance. Dunno what others would say, but I'd aim for
> approx one indexing thread per CPU.
>
> Upayavira
>
> On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
> > Hello Everyone,
> >   I have indexed 16 million documents in Solr
> > Cloud. Created 4 nodes and 8 shards with a single replica.
> > I am trying to make concurrent indexing and searching on those indexed
> > documents. Trying to make 100 concurrent indexing calls along with 100
> > concurrent searching calls.
> > It *degrades* both searching and indexing performance.
> >
> > Configuration :
> >
> >   "commitWithin":{"softCommit":true},
> >   "autoCommit":{
> > "maxDocs":-1,
> > "maxTime":6,
> > "openSearcher":false},
> >   "autoSoftCommit":{
> > "maxDocs":-1,
> > "maxTime":3000}},
> >
> >   "indexConfig":{
> >   "maxBufferedDocs":-1,
> >   "maxMergeDocs":-1,
> >   "maxIndexingThreads":8,
> >   "mergeFactor":-1,
> >   "ramBufferSizeMB":100.0,
> >   "writeLockTimeout":-1,
> >   "lockType":"native"}}}
> >
> > AND  2
> >
> > I don't know how master and slave work. Normally, I created 8
> > shards and indexed documents using:
> >
> > http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
> > 'Content-type:application/json' -d ' [ JSON_Document ]'
> >
> > And searching using:
> > http://localhost:8983/solr/test_commit_fast/select?q=<field_name>:<search_string>
> >
> > Please help me with this, to make searching and indexing fast
> > concurrently. Thanks.
> >
> >
> > Regards,
> > Nitin
>


Re: docValues

2015-08-07 Thread naga sharathrayapati
JVM-Memory has gone up from 3% to 17.1%

On Fri, Aug 7, 2015 at 12:10 PM, Shawn Heisey  wrote:

> On 8/7/2015 10:25 AM, naga sharathrayapati wrote:
> > I have added docValues="true" to my existing schema and I have seen
> > exponential increase in the size of the index.
> >
> > The reason for going with docValues is to improve the faceting query time.
> >
> > schema:
> > <field ... multiValued="false" docValues="true"/>
> >
> > Is this OK? Am I doing anything wrong in the schema?
>
> An exponential increase would REALLY surprise me.  I have seen indexes
> nearly double in size with the addition of docValues on the primary
> search field.  You are storing another complete copy of the original
> value sent to Solr.  It will be compressed if your Solr version is at
> least 4.2, just like the stored value is in 4.1 and later.
>
> If you are only adding docValues to a date field, I would not expect
> that to affect your index size very much at all, but if you are adding
> them to large text fields, it would.
>
> Performance tweaks are nearly always a trade -- typically your space and
> memory utilization goes up, and it runs faster.  If you don't have
> enough memory, then either performance will go down or the system will
> stop working entirely.
>
> Thanks,
> Shawn
>
>


Re: docValues

2015-08-07 Thread Shawn Heisey
On 8/7/2015 10:25 AM, naga sharathrayapati wrote:
> I have added docValues="true" to my existing schema and I have seen
> exponential increase in the size of the index.
>
> The reason for going with docValues is to improve the faceting query time.
>
> schema:
> <field ... multiValued="false" docValues="true"/>
>
> Is this OK? Am I doing anything wrong in the schema?

An exponential increase would REALLY surprise me.  I have seen indexes
nearly double in size with the addition of docValues on the primary
search field.  You are storing another complete copy of the original
value sent to Solr.  It will be compressed if your Solr version is at
least 4.2, just like the stored value is in 4.1 and later.

If you are only adding docValues to a date field, I would not expect
that to affect your index size very much at all, but if you are adding
them to large text fields, it would.

Performance tweaks are nearly always a trade -- typically your space and
memory utilization goes up, and it runs faster.  If you don't have
enough memory, then either performance will go down or the system will
stop working entirely.

Thanks,
Shawn



Re: docValues

2015-08-07 Thread Erick Erickson
My crude approximation is that docValues increases the disk size,
but that's mostly due to serializing data structures that would
otherwise be built in memory. So while docValues increase
disk size, the memory requirements aren't increased. AFAIK, the
in-memory size is a bit smaller and (I think) some of the data can be
in MMapDirectory space which decreases pressure on the JVM heap,
which is a good thing.

So the measure of whether anything's wrong is really whether your JVM
memory goes up or down afterwards.

I'm surprised by the word "exponential" here, if you're just measuring with
a few documents the growth rate is probably deceptive.

And be sure to re-index the entire corpus after adding docValues, I usually
remove the entire data directory when I change the schema.

Best,
Erick

On Fri, Aug 7, 2015 at 12:25 PM, naga sharathrayapati
 wrote:
> I have added docValues="true" to my existing schema and I have seen
> exponential increase in the size of the index.
>
> The reason for going with docValues is to improve the faceting query time.
>
> schema:
> <field ... multiValued="false" docValues="true"/>
>
> Is this OK? Am I doing anything wrong in the schema?


docValues

2015-08-07 Thread naga sharathrayapati
I have added docValues="true" to my existing schema and I have seen
exponential increase in the size of the index.

The reason for going with docValues is to improve the faceting query time.

schema:
<field ... multiValued="false" docValues="true"/>

Is this OK? Am I doing anything wrong in the schema?


Re: Suggester needed for returning suggestions when term is not start of field value

2015-08-07 Thread Erick Erickson
You might consider different implementations. The FST-based suggesters
only work in order by design.

AnalyzingInfixSuggester or perhaps FreeTextSuggester are possibilities.

Here's something you might find useful:

http://lucidworks.com/blog/solr-suggester/
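
As a starting point, a minimal AnalyzingInfixSuggester configuration for
solrconfig.xml might look like this (the field and analyzer names are
placeholders for your schema):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>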

Best,
Erick

On Fri, Aug 7, 2015 at 5:15 AM, Thomas Michael Engelke
 wrote:
>  Hey,
>
> I'm playing around with the suggester component, and it works perfectly
> as described: Suggestions for 'logitech mouse' include 'logitech mouse
> g500' and 'logitech mouse gaming'.
>
> However, when the words in the record supplying the suggester do not
> follow each other as in the search terms, nothing is returned.
> Suggestions for 'logitech mouse' do not include 'logitech g500 mouse'.
>
> Is there a suggester implementation that can suggest records that way?
>
> Best wishes.


RE: Only indexing changed documents

2015-08-07 Thread Davis, Daniel (NIH/NLM) [C]
On 8/7/2015 11:48 AM, Shawn Heisey  wrote:
>  On 8/7/2015 8:56 AM, Davis, Daniel (NIH/NLM) [C] wrote:
> > ... snip... 
> > Each document has an id I wish to use as the unique ID, but I also want to 
> > compute a signature.   Is there some way I can use an
> > updateRequestProcessorChain to throw away a document if its signature and 
> > document id match based on real-time get?
> 
> My main Solr indexes are each generated from a MySQL database.  One contains 
> over 100 million rows, another over 200 million.  
> A third contains about 18 million.  Here's how we handle the requirement you 
> asked about:
> 
> The main table has a delete id column that is its primary key.  This is an 
> autoincrement column.  There is another unique index 
> on another column in that table, which is the canonical unique identifier, 
> used as Solr's uniqueKey.
> 
> The main table also has triggers for DELETE and UPDATE which add records to 
> the idx_delete table (contains delete id values) 
> and idx_reinsert table (contains unique key values).  These extra tables each 
> have a primary key on an autoincrement column.
> The build program (written in Java using SolrJ) tracks three values for every 
> update -- the last did value in the main table, and 
> the last id value in idx_delete and idx_reinsert.

Thanks, Shawn - this is a better solution, and I've used something similar with 
PostgreSQL in the past.   
I don't control the schema, but I can make the suggestion.


Re: Solr 5.2 index time field boost not working as expected

2015-08-07 Thread Erick Erickson
Add &debug=all&debug.explain.structured=true to your query.
What you're looking for is evidence that your boost is being used.
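
For example (the collection name here is hypothetical; the field and value
come from the quoted code below):

curl "http://localhost:8983/solr/collection1/select?q=name:XX5&debug=all&debug.explain.structured=true"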

Likely your problem is that the score is indeed taking into account
your boost, but other scoring factors are keeping your specific
document from coming out at the top.

If that doesn't help, can you post the results of the debug=true?

Best,
Erick

On Fri, Aug 7, 2015 at 9:30 AM, dinesh naik  wrote:
> Hi all,
>
> We need to boost a field in a document if field matches certain criteria.
>
> For example:
>
> if title contains "Secrete", then we want to boost the field to 100.
>
> For this we have the below code using the solrj api while indexing the
> document:
>
>
> Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
>
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("title", "Secrete" , 100.0f); // Field Boost
> doc.addField("id", 11);
> doc.addField("modelnumber", "AK10005");
> doc.addField("name", "XX5");
>
> docs.add(doc);
>
> Also, we made omitNorms="false" for this field in schema.xml:
>
> <field name="title" ... required="true" omitNorms="false" />
>
> But still we do not see this document coming at the top. Is there any other
> setting which has to be done for index time boosting?
>
>
> Best Regards,
> Dinesh Naik
>
>
> --
> Best Regards,
> Dinesh Naik


Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Erick Erickson
Your soft commit time of 3 seconds is quite aggressive;
I'd lengthen it as much as possible.

Ugh, looked at your query more closely. Adding commit=true to every update
request is horrible performance-wise. Letting your autocommit process
handle the commits is the first thing I'd do. Second, I'd try going to SolrJ
and batching up documents (I usually start with 1,000) or using the post.jar
tool rather than sending them via a raw URL.
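
A minimal SolrJ sketch of that batching approach (the URL, collection, and
field names are illustrative; the API shown is SolrJ 5.x):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient("http://localhost:8983/solr/test_commit_fast");
        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("field_name", "value " + i);
            batch.add(doc);
            if (batch.size() == 1000) { // send 1,000 at a time, no commit=true
                client.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            client.add(batch);
        }
        client.close(); // let autocommit handle the commits
    }
}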

I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
version of Solr?
There was a 2x speedup in Solr 5.2, see:
http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/

One symptom was that the followers were doing way more work than the leader
(BTW, using master/slave when talking SolrCloud is a bit confusing...),
which will affect query response rates.

Basically, if query response is paramount, you really need to throttle
your indexing; there's just a whole lot of work going on here.

Best,
Erick

On Fri, Aug 7, 2015 at 11:23 AM, Upayavira  wrote:
> How many CPUs do you have? 100 concurrent indexing calls seems like
> rather a lot. You're gonna end up doing a lot of context switching,
> hence degraded performance. Dunno what others would say, but I'd aim for
> approx one indexing thread per CPU.
>
> Upayavira
>
> On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
>> Hello Everyone,
>>   I have indexed 16 million documents in Solr
>> Cloud. Created 4 nodes and 8 shards with a single replica.
>> I am trying to make concurrent indexing and searching on those indexed
>> documents. Trying to make 100 concurrent indexing calls along with 100
>> concurrent searching calls.
>> It *degrades* both searching and indexing performance.
>>
>> Configuration :
>>
>>   "commitWithin":{"softCommit":true},
>>   "autoCommit":{
>> "maxDocs":-1,
>> "maxTime":6,
>> "openSearcher":false},
>>   "autoSoftCommit":{
>> "maxDocs":-1,
>> "maxTime":3000}},
>>
>>   "indexConfig":{
>>   "maxBufferedDocs":-1,
>>   "maxMergeDocs":-1,
>>   "maxIndexingThreads":8,
>>   "mergeFactor":-1,
>>   "ramBufferSizeMB":100.0,
>>   "writeLockTimeout":-1,
>>   "lockType":"native"}}}
>>
>> AND  2
>>
>> I don't know how master and slave work. Normally, I created 8
>> shards and indexed documents using:
>>
>> http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
>> 'Content-type:application/json' -d ' [ JSON_Document ]'
>>
>> And searching using:
>> http://localhost:8983/solr/test_commit_fast/select?q=<field_name>:<search_string>
>>
>> Please help me with this, to make searching and indexing fast
>> concurrently. Thanks.
>>
>>
>> Regards,
>> Nitin


Re: Only indexing changed documents

2015-08-07 Thread Shawn Heisey
On 8/7/2015 8:56 AM, Davis, Daniel (NIH/NLM) [C] wrote:
> I have an application that knows enough to tell me that a document has been 
> updated, but not which document has been updated.There aren't that many 
> documents in this core/collection - just a couple of 1000.   So far I've just 
> been pumping them all to the update handler every week, but the business folk 
> really want the database and the index to be synchronized when the back-end 
> staff make an update. As is typical in indexing, updates are more frequent 
> than searches (or at least are expected to be once things pick-up - we may 
> even reach a whopping 10k documents at some point :))
> 
> Each document has an id I wish to use as the unique ID, but I also want to 
> compute a signature.   Is there some way I can use an 
> updateRequestProcessorChain to throw away a document if its signature and 
> document id match based on real-time get?
> 
> My apologies if this is a duplicate of a prior question - solr-user is fairly 
> high traffic.

My main Solr indexes are each generated from a MySQL database.  One
contains over 100 million rows, another over 200 million.  A third
contains about 18 million.  Here's how we handle the requirement you
asked about:

The main table has a delete id column that is its primary key.  This is
an autoincrement column.  There is another unique index on another
column in that table, which is the canonical unique identifier, used as
Solr's uniqueKey.

The main table also has triggers for DELETE and UPDATE which add records
to the idx_delete table (contains delete id values) and idx_reinsert
table (contains unique key values).  These extra tables each have a
primary key on an autoincrement column.

The build program (written in Java using SolrJ) tracks three values for
every update -- the last did value in the main table, and the last id
value in idx_delete and idx_reinsert.

An update cycle (which we run at least once a minute) consists of
reading new records in the idx tables, doing the deletes and reinserts
using the main table identifiers found there, and then indexing new
records from the main table.  In each of those three tables, new records
are identified by looking for rows with a primary key value that's
higher than the last-recorded number.

The build program has a "full rebuild" capability that leverages the
dataimport handler on a set of build cores, which are swapped with the
live cores when the rebuild completes.  If the destination is SolrCloud,
then core swapping won't work, but SolrCloud has the collection alias
feature which can work much the same as core swapping.

This works very well.  There are many additional details to the
implementation, but that's a high-level description of one way to keep a
Solr index in sync with a database.
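
A skeletal SolrJ sketch of one such update cycle (table, column, and helper
names here are hypothetical, deletes are simplified to go by uniqueKey, and
error handling is omitted):

import java.sql.Connection;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

abstract class IncrementalUpdater {
    // High-water marks, persisted between runs.
    long lastDeleteId, lastReinsertId, lastDid;

    void updateCycle(Connection db, SolrClient solr) throws Exception {
        // 1. Remove documents whose rows were deleted since the last cycle.
        List<String> deletedKeys = keysSince(db, "idx_delete", lastDeleteId);
        if (!deletedKeys.isEmpty()) solr.deleteById(deletedKeys);

        // 2. Re-index rows that the UPDATE trigger flagged in idx_reinsert.
        List<SolrInputDocument> changed =
            docsSince(db, "idx_reinsert", lastReinsertId);
        if (!changed.isEmpty()) solr.add(changed);

        // 3. Index brand-new rows (main-table primary key above the mark).
        List<SolrInputDocument> fresh = newMainRowsSince(db, lastDid);
        if (!fresh.isEmpty()) solr.add(fresh);

        // 4. Advance and persist the three high-water marks.
    }

    // Hypothetical data-access helpers; the real SQL lives here.
    abstract List<String> keysSince(Connection db, String table, long mark);
    abstract List<SolrInputDocument> docsSince(Connection db, String table, long mark);
    abstract List<SolrInputDocument> newMainRowsSince(Connection db, long mark);
}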

I don't think I'd bother with the signature requirement you mentioned.
As long as your uniqueKey is properly set up, indexing the same document
again will just replace it in the index, and you won't need to worry
about whether it is exactly the same as the previous version.  If you
actually want to do this, it looks like you were given a method by
Upayavira.

Thanks,
Shawn



RE: Only indexing changed documents

2015-08-07 Thread Davis, Daniel (NIH/NLM) [C]
Thanks - key is that signature field will not be id, and overwriteDupes will be 
false:

  <bool name="overwriteDupes">false</bool>
  <str name="signatureField">sig</str>
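
For context, those parameters live inside a SignatureUpdateProcessorFactory
chain; a minimal sketch (the fields list is a placeholder for whatever
fields should feed the signature):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">sig</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">field_a,field_b</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>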

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Friday, August 07, 2015 11:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Only indexing changed documents

Use the DedupUpdateProcessor, which can compute a signature based upon the 
specified fields.

Upayavira

On Fri, Aug 7, 2015, at 03:56 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> I have an application that knows enough to tell me that a document has
> been updated, but not which document has been updated.There aren't
> that many documents in this core/collection - just a couple of 1000.   So
> far I've just been pumping them all to the update handler every week, 
> but the business folk really want the database and the index to be
> synchronized when the back-end staff make an update. As is typical in
> indexing, updates are more frequent than searches (or at least are 
> expected to be once things pick-up - we may even reach a whopping 10k 
> documents at some point :))
> 
> Each document has an id I wish to use as the unique ID, but I also want
> to compute a signature.   Is there some way I can use an
> updateRequestProcessorChain to throw away a document if its signature 
> and document id match based on real-time get?
> 
> My apologies if this is a duplicate of a prior question - solr-user is 
> fairly high traffic.
> 
> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> Computer and Communications Systems, National Library of Medicine, NIH
> 


Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Upayavira
How many CPUs do you have? 100 concurrent indexing calls seems like
rather a lot. You're gonna end up doing a lot of context switching,
hence degraded performance. Dunno what others would say, but I'd aim for
approx one indexing thread per CPU.
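
A trivial sketch of that sizing rule (illustrative only):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IndexerPool {
    public static void main(String[] args) {
        // One indexing thread per CPU instead of a fixed 100.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService indexingPool = Executors.newFixedThreadPool(threads);
        System.out.println("indexing threads: " + threads);
        indexingPool.shutdown();
    }
}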

Upayavira

On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
> Hello Everyone,
>   I have indexed 16 million documents in Solr
> Cloud. Created 4 nodes and 8 shards with a single replica.
> I am trying to make concurrent indexing and searching on those indexed
> documents. Trying to make 100 concurrent indexing calls along with 100
> concurrent searching calls.
> It *degrades* both searching and indexing performance.
> 
> Configuration :
> 
>   "commitWithin":{"softCommit":true},
>   "autoCommit":{
> "maxDocs":-1,
> "maxTime":6,
> "openSearcher":false},
>   "autoSoftCommit":{
> "maxDocs":-1,
> "maxTime":3000}},
> 
>   "indexConfig":{
>   "maxBufferedDocs":-1,
>   "maxMergeDocs":-1,
>   "maxIndexingThreads":8,
>   "mergeFactor":-1,
>   "ramBufferSizeMB":100.0,
>   "writeLockTimeout":-1,
>   "lockType":"native"}}}
> 
> AND  2
> 
> I don't know how master and slave work. Normally, I created 8
> shards and indexed documents using:
>
> http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
> 'Content-type:application/json' -d ' [ JSON_Document ]'
>
> And searching using:
> http://localhost:8983/solr/test_commit_fast/select?q=<field_name>:<search_string>
> 
> Please help me with this, to make searching and indexing fast
> concurrently. Thanks.
> 
> 
> Regards,
> Nitin


Re: Only indexing changed documents

2015-08-07 Thread Upayavira
Use the DedupUpdateProcessor, which can compute a signature based upon
the specified fields.

Upayavira

On Fri, Aug 7, 2015, at 03:56 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> I have an application that knows enough to tell me that a document has
> been updated, but not which document has been updated.There aren't
> that many documents in this core/collection - just a couple of 1000.   So
> far I've just been pumping them all to the update handler every week, but
> the business folk really want the database and the index to be
> synchronized when the back-end staff make an update. As is typical in
> indexing, updates are more frequent than searches (or at least are
> expected to be once things pick-up - we may even reach a whopping 10k
> documents at some point :))
> 
> Each document has an id I wish to use as the unique ID, but I also want
> to compute a signature.   Is there some way I can use an
> updateRequestProcessorChain to throw away a document if its signature and
> document id match based on real-time get?
> 
> My apologies if this is a duplicate of a prior question - solr-user is
> fairly high traffic.
> 
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
> 


Only indexing changed documents

2015-08-07 Thread Davis, Daniel (NIH/NLM) [C]
I have an application that knows enough to tell me that a document has been 
updated, but not which document has been updated.There aren't that many 
documents in this core/collection - just a couple of 1000.   So far I've just 
been pumping them all to the update handler every week, but the business folk 
really want the database and the index to be synchronized when the back-end 
staff make an update. As is typical in indexing, updates are more frequent 
than searches (or at least are expected to be once things pick-up - we may 
even reach a whopping 10k documents at some point :))

Each document has an id I wish to use as the unique ID, but I also want to 
compute a signature.   Is there some way I can use an 
updateRequestProcessorChain to throw away a document if its signature and 
document id match based on real-time get?

My apologies if this is a duplicate of a prior question - solr-user is fairly 
high traffic.

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



Re: how to extend JavaBinCodec and make it available in solrj api

2015-08-07 Thread Shalin Shekhar Mangar
The thing is that you are trying to introduce custom xml tags which
require changing the response writers. Instead, if you just used
nested maps/lists or SimpleOrderedMap/NamedList then every response
writer should be able to just directly write the output. Nesting is
not a problem.
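
As a rough illustration of that suggestion, the nested structure from the
XML below could be built out of plain NamedList/SimpleOrderedMap values
(the names here are made up), and every response writer can then serialize
it as-is:

import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;

public class CustomHighlightStructure {
    public static NamedList<Object> build() {
        SimpleOrderedMap<Object> snippet = new SimpleOrderedMap<>();
        snippet.add("text", "Snippet text goes here");

        SimpleOrderedMap<Object> perDoc = new SimpleOrderedMap<>();
        perDoc.add("id", "id1");
        perDoc.add("snippet", snippet);

        NamedList<Object> highlighting = new NamedList<>();
        highlighting.add("id1", perDoc);
        return highlighting; // e.g. rsp.add("custom_highlighting", highlighting);
    }
}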

On Fri, Aug 7, 2015 at 6:09 PM, Dmitry Kan  wrote:
> Shawn:
>
> thanks, we found an intermediate solution by serializing our data structure
> using string representation, perhaps less optimal than using binary format
> directly.
>
> In our original approach with JavaBinCodec, we found that
> BinaryResponseWriter should also be extended. But the following method is
> static and does not allow extending:
>
> public static NamedList<Object> getParsedResponse(SolrQueryRequest
> req, SolrQueryResponse rsp) {
>   try {
> Resolver resolver = new Resolver(req, rsp.getReturnFields());
>
> ByteArrayOutputStream out = new ByteArrayOutputStream();
> new JavaBinCodec(resolver).marshal(rsp.getValues(), out);
>
> InputStream in = new ByteArrayInputStream(out.toByteArray());
> return (NamedList<Object>) new JavaBinCodec(resolver).unmarshal(in);
>   }
>   catch (Exception ex) {
> throw new RuntimeException(ex);
>   }
> }
>
>
>
> Shalin:
>
> We needed new data structure in highlighter with more nested levels,
> than just one. Something like this (in xml representation):
>
> <response>
>   <lst name="highlighting">
>     <lst name="doc">
>       <str name="id">id1</str>
>       <arr name="snippets">
>         <str>Snippet text goes here</str>
>       </arr>
>     </lst>
>   </lst>
> </response>
>
> Can this be modelled with existing types?
>
>
> On Thu, Aug 6, 2015 at 9:47 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> What do you mean by a custom format? As long as your custom component
>> is writing primitives or NamedList/SimpleOrderedMap or collections
>> such as List/Map, any response writer should be able to handle them.
>>
>> On Wed, Aug 5, 2015 at 5:08 PM, Dmitry Kan  wrote:
>> > Hello,
>> >
>> > Solr: 5.2.1
>> > class: org.apache.solr.common.util.JavaBinCodec
>> >
>> > I'm working on a custom data structure for the highlighter. The data
>> > structure is ready in JSON and XML formats. I also need the JavaBin format.
>> The
>> > data structure is already made serializable by extending the
>> WritableValue
>> > class (methods write and resolve).
>> >
>> > To receive the custom format on the client via solrj api, the data
>> > structure needs to be parseable by JavaBinCodec. Is this a correct
>> > assumption? Can we introduce the custom data structure consumer on the
>> > solrj api without a complete overhaul of the api? Is there a plugin
>> > framework such that JavaBinCodec is extended and used for the new data
>> > structure?
>> >
>> >
>> >
>> > --
>> > Dmitry Kan
>> > Luke Toolbox: http://github.com/DmitryKey/luke
>> > Blog: http://dmitrykan.blogspot.com
>> > Twitter: http://twitter.com/dmitrykan
>> > SemanticAnalyzer: www.semanticanalyzer.info
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info



-- 
Regards,
Shalin Shekhar Mangar.


Re: Filtering documents using payloads

2015-08-07 Thread Jamie Johnson
Looks like my issue is that my nextDoc call is consuming the first
position, and then on the call to nextPosition it's moving past where I
want it to be.  I believe that I have this working properly now by checking
if the current position should or shouldn't be incremented.
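
Reading between the lines, the fix presumably buffers the position that
nextDoc() already consumed during its access check. A sketch of that shape,
inside the quoted class (the flag is an assumed addition, and hasAccess()
is assumed to advance the underlying enum and record the current position):

// Set to true by nextDoc() right after its successful hasAccess() call.
private boolean positionBuffered = false;

@Override
public int nextPosition() throws IOException {
    if (positionBuffered) {
        // nextDoc() already consumed the first position; hand it back
        // instead of advancing past it.
        positionBuffered = false;
        return currentPosition;
    }
    while (!hasAccess()) {
        // skip positions whose payload fails the access check
    }
    return currentPosition; // not currentPosition + 1
}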

On Thu, Aug 6, 2015 at 7:35 PM, Jamie Johnson  wrote:

> I am attempting to put together a DocsAndPositionsEnum that can hide terms
> given the payload on the term.  The idea is that if a term has a particular
> access control and the user does not I don't want it to be visible.  I have
> based this off of
>
>
> https://github.com/roshanp/lucure-core/blob/master/src/main/java/com/lucure/core/codec/AccessFilteredDocsAndPositionsEnum.java
>
> with some modifications to try and preserve the position information that
> is consumed as part of the hasAccess method.  The current iteration that I
> am working with seems to be providing wrong positions to the
> ExactPhraseScorer.phraseFreq() method in the ChunkState's that are
> calculated.
>
> Below is my current iteration on this, but I can't narrow down why exactly
> the position information isn't what I expect.  Does anything jump out?
>
> package com.lucure.core.codec;
>
> import com.lucure.core.AuthorizationsHolder;
> import com.lucure.core.security.Authorizations;
> import com.lucure.core.security.FieldVisibility;
> import com.lucure.core.security.VisibilityParseException;
> import org.apache.lucene.index.DocsAndPositionsEnum;
> import org.apache.lucene.util.AttributeSource;
> import org.apache.lucene.util.BytesRef;
>
> import java.io.IOException;
> import java.util.Arrays;
>
> import static com.lucure.core.codec.AccessFilteredDocsAndPositionsEnum
>   .AllAuthorizationsHolder.ALLAUTHSHOLDER;
>
> /**
>  * Enum to read and restrict access to a document based on the payload
> which
>  * is expected to store the visibility
>  */
> public class AccessFilteredDocsAndPositionsEnum extends
> DocsAndPositionsEnum {
>
> /**
>  * This placeholder allows for lucene specific operations such as
>  * merge to read data with all authorizations enabled. This should
> never
>  * be used outside of the Codec.
>  */
> static class AllAuthorizationsHolder extends AuthorizationsHolder {
>
> static final AllAuthorizationsHolder ALLAUTHSHOLDER = new
> AllAuthorizationsHolder();
>
> private AllAuthorizationsHolder() {
> super(Authorizations.EMPTY);
> }
> }
>
> static void enableMergeAuthorizations() {
> AuthorizationsHolder.threadAuthorizations.set(ALLAUTHSHOLDER);
> }
>
> static void disableMergeAuthorizations() {
> AuthorizationsHolder.threadAuthorizations.remove();
> }
>
> private final DocsAndPositionsEnum docsAndPositionsEnum;
> private final AuthorizationsHolder authorizationsHolder;
>
> public AccessFilteredDocsAndPositionsEnum(
>   DocsAndPositionsEnum docsAndPositionsEnum) {
> this(docsAndPositionsEnum,
> AuthorizationsHolder.threadAuthorizations.get());
> }
>
> public AccessFilteredDocsAndPositionsEnum(
>   DocsAndPositionsEnum docsAndPositionsEnum,
>   AuthorizationsHolder authorizationsHolder) {
> this.docsAndPositionsEnum = docsAndPositionsEnum;
> this.authorizationsHolder = authorizationsHolder;
> }
>
> long cost;
> int endOffset, startOffset, currentPosition, freq, docId;
> BytesRef payload;
>
> @Override
> public int nextPosition() throws IOException {
>
> while (!hasAccess()) {
>
> }
>
> return currentPosition + 1;
> }
>
> @Override
> public int startOffset() throws IOException {
> return startOffset;
> }
>
> @Override
> public int endOffset() throws IOException {
> return endOffset;
> }
>
> @Override
> public BytesRef getPayload() throws IOException {
> return payload;
> }
>
> @Override
> public int freq() throws IOException {
> return docsAndPositionsEnum.freq();
> }
>
> @Override
> public int docID() {
> return docsAndPositionsEnum.docID();
> }
>
> @Override
> public int nextDoc() throws IOException {
>
> while (docsAndPositionsEnum.nextDoc() != NO_MORE_DOCS) {
>
> if (hasAccess()) {
> return docID();
> }
> }
> return NO_MORE_DOCS;
>
> }
>
> @Override
> public int advance(int target) throws IOException {
> int advance = docsAndPositionsEnum.advance(target);
> if (advance != NO_MORE_DOCS) {
> if (hasAccess()) {
> return docID();
> } else {
> //seek to next available
> int doc;
> while ((doc = nextDoc()) < target) {
> }
> return doc;
> }
> }
> return NO_MORE_DOCS;
> }
>
> @Override
>   

Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Nitin Solanki
Hello Everyone,
  I have indexed 16 million documents in Solr
Cloud. Created 4 nodes and 8 shards with a single replica.
I am trying to make concurrent indexing and searching on those indexed
documents. Trying to make 100 concurrent indexing calls along with 100
concurrent searching calls.
It *degrades* both searching and indexing performance.

Configuration :

  "commitWithin":{"softCommit":true},
  "autoCommit":{
"maxDocs":-1,
"maxTime":6,
"openSearcher":false},
  "autoSoftCommit":{
"maxDocs":-1,
"maxTime":3000}},

  "indexConfig":{
  "maxBufferedDocs":-1,
  "maxMergeDocs":-1,
  "maxIndexingThreads":8,
  "mergeFactor":-1,
  "ramBufferSizeMB":100.0,
  "writeLockTimeout":-1,
  "lockType":"native"}}}

AND  2

I don't know how master and slave work. Normally, I created 8
shards and indexed documents using:

http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
'Content-type:application/json' -d ' [ JSON_Document ]'

And searching using:
http://localhost:8983/solr/test_commit_fast/select?q=<field_name>:<search_string>

Please help me with this, to make searching and indexing fast concurrently.
Thanks.


Regards,
Nitin


Solr 5.2 index time field boost not working as expected

2015-08-07 Thread dinesh naik
Hi all,

We need to boost a field in a document if field matches certain criteria.

For example:

if title contains "Secrete", then we want to boost the field to 100.

For this we have the below code using the solrj api while indexing the document:


Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();

SolrInputDocument doc = new SolrInputDocument();
doc.addField("title", "Secrete" , 100.0f); // Field Boost
doc.addField("id", 11);
doc.addField("modelnumber", "AK10005");
doc.addField("name", "XX5");

docs.add(doc);

Also, we made omitNorms="false" for this field in schema.xml:

<field name="title" ... required="true" omitNorms="false" />

But still we do not see this document coming at the top. Is there any other
setting which has to be done for index time boosting?


Best Regards,
Dinesh Naik


-- 
Best Regards,
Dinesh Naik


Re: Streaming API running a simple query

2015-08-07 Thread Joel Bernstein
Hi,

There is a new error handling framework in trunk (SOLR-7441) for the
Streaming API, Streaming Expressions.

So if you're purely in testing mode, it will be much easier to work in
trunk than Solr 5.2.

If you run into errors in trunk that are still confusing please continue to
report them so we can get all the error messages covered.

Thanks,

Joel


Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Aug 7, 2015 at 6:19 AM, Selvam  wrote:

> Hi,
>
> Sorry, it is working now.
>
> curl --data-urlencode
> 'stream=search(gettingstarted,q="*:*",fl="id",sort="id asc")'
> http://localhost:8983/solr/gettingstarted/stream
>
> I missed *'asc'* in sort :)
>
> Thanks for the help Shawn Heisey.
>
> On Fri, Aug 7, 2015 at 3:46 PM, Selvam  wrote:
>
> > Hi,
> >
> > Thanks for your update, yes, I was missing the cloud mode, I am new to
> the
> > world of Solr cloud. Now I have enabled a single node (with two shards &
> > replicas) that runs on 8983 port along with zookeeper running on 9983
> port.
> > When I run,
> >
> >  curl --data-urlencode
> > 'stream=search(gettingstarted,q="*:*",fl="id",sort="id")'
> > http://localhost:8983/solr/gettingstarted/stream
> >
> > Again, I get
> >
> > "Unable to construct instance of
> > org.apache.solr.client.solrj.io.stream.CloudSolrStream
> > .
> > .
> >
> > Caused by: java.lang.reflect.InvocationTargetException
> > .
> > .
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1"
> >
> > I tried different port, 9983 as well, which returns "Empty reply from
> > server". I think I miss some obvious configuration.
> >
> >
> >
> >
> > On Fri, Aug 7, 2015 at 2:04 PM, Shawn Heisey 
> wrote:
> >
> >> On 8/7/2015 1:37 AM, Selvam wrote:
> >> >
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> >> >
> >> > I tried this from my linux terminal,
> >> > 1)   curl --data-urlencode
> >> > 'stream=search(gettingstarted,q="*:*",fl="id",sort="id")'
> >> > http://localhost:8983/solr/gettingstarted/stream
> >> >
> >> > Threw zkHost error. Then tried with,
> >> >
> >> > 2)   curl --data-urlencode
> >> >
> >>
> 'stream=search(gettingstarted,zkHost="localhost:8983",q="*:*",fl="id",sort="id")'
> >> > http://localhost:8983/solr/gettingstarted/stream
> >> >
> >> > It throws me "java.lang.ArrayIndexOutOfBoundsException: 1\n\tat
> >> >
> >>
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.parseComp(CloudSolrStream.java:260)"
> >>
> >> The documentation page you linked seems to indicate that this is a
> >> feature that only works in SolrCloud.  Your inclusion of
> >> "localhost:8983" as the zkHost suggests that either you are NOT running
> >> in cloud mode, or that you do not understand what zkHost means.
> >>
> >> Zookeeper runs on a different port than Solr.  8983 is Solr's port.  If
> >> you are running a 5.x cloud with the embedded zookeeper, it is most
> >> likely running on port 9983.  If you are running in cloud mode with a
> >> properly configured external zookeeper, then your zkHost parameter will
> >> probably have three hosts in it with port 2181.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >
> >
> > --
> > Regards,
> > Selvam
> > KnackForge 
> > Acquia Service Partner
> > No. 1, 12th Line, K.K. Road, Venkatapuram,
> > Ambattur, Chennai,
> > Tamil Nadu, India.
> > PIN - 600 053.
> >
>
>
>
> --
> Regards,
> Selvam
> KnackForge 
> Acquia Service Partner
> No. 1, 12th Line, K.K. Road, Venkatapuram,
> Ambattur, Chennai,
> Tamil Nadu, India.
> PIN - 600 053.
>


Re: how to extend JavaBinCodec and make it available in solrj api

2015-08-07 Thread Dmitry Kan
Shawn:

thanks, we found an intermediate solution: serializing our data structure
using a string representation, which is perhaps less optimal than using the
binary format directly.

In the original route with JavaBinCodec we found that BinaryResponseWriter
would also have to be extended. But the following method is static and does
not allow extending:

public static NamedList<Object> getParsedResponse(SolrQueryRequest req, SolrQueryResponse rsp) {
  try {
    Resolver resolver = new Resolver(req, rsp.getReturnFields());

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    new JavaBinCodec(resolver).marshal(rsp.getValues(), out);

    InputStream in = new ByteArrayInputStream(out.toByteArray());
    return (NamedList<Object>) new JavaBinCodec(resolver).unmarshal(in);
  }
  catch (Exception ex) {
    throw new RuntimeException(ex);
  }
}
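
For completeness, a minimal sketch of hooking a custom type into JavaBinCodec through its ObjectResolver callback instead of subclassing; MySnippetTree and toNamedList() are hypothetical stand-ins for the custom structure:

import java.io.IOException;
import org.apache.solr.common.util.JavaBinCodec;

JavaBinCodec codec = new JavaBinCodec(new JavaBinCodec.ObjectResolver() {
  @Override
  public Object resolve(Object o, JavaBinCodec c) throws IOException {
    if (o instanceof MySnippetTree) {
      // flatten the custom structure into a NamedList the codec already knows
      return ((MySnippetTree) o).toNamedList();
    }
    return o; // let the codec handle all built-in types
  }
});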



Shalin:

We needed a new data structure in the highlighter with more than one level of
nesting; something like this in XML representation (the tags were stripped by
the archive, so this is a best-guess reconstruction):

<highlighting>
  <docs>
    <doc>
      <id>id1</id>
      <snippets>
        <snippet>Snippet text goes here</snippet>
      </snippets>
    </doc>
  </docs>
</highlighting>

Can this be modelled with existing types?


On Thu, Aug 6, 2015 at 9:47 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> What do you mean by a custom format? As long as your custom component
> is writing primitives or NamedList/SimpleOrderedMap or collections
> such as List/Map, any response writer should be able to handle them.
>
> On Wed, Aug 5, 2015 at 5:08 PM, Dmitry Kan  wrote:
> > Hello,
> >
> > Solr: 5.2.1
> > class: org.apache.solr.common.util.JavaBinCodec
> >
> > I'm working on a custom data structure for the highlighter. The data
> > structure is ready in JSON and XML formats. I need also JavaBin format.
> The
> > data structure is already made serializable by extending the
> WritableValue
> > class (methods write and resolve).
> >
> > To receive the custom format on the client via solrj api, the data
> > structure needs to be parseable by JavaBinCodec. Is this correct
> > assumption? Can we introduce the custom data structure consumer on the
> > solrj api without complete overhaul of the api? Is there plugin framework
> > such that JavaBinCodec is extended and used for the new data structure?
> >
> >
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: How to do sorting instead of using bq

2015-08-07 Thread rachun
Hi Upayavira,

Yes, the boost values come from outside, and I can't index them into the doc
because the sort needs to be real time.

What can I do? Is there another solution for the sort in this case?

Thank you very much,
Chun.
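
One technique that may fit externally supplied, frequently changing sort values is ExternalFileField: the values live in a file beside the index and are exposed to function queries, so they can be updated without reindexing documents. A minimal sketch, with field, type, and file names as assumptions:

<!-- schema.xml: values are read from external_rank in the index data directory -->
<fieldType name="externalRank" class="solr.ExternalFileField" keyField="id" defVal="0" valType="float"/>
<field name="rank" type="externalRank" indexed="false" stored="false"/>

<!-- solrconfig.xml: reload the file whenever a new searcher opens -->
<listener event="newSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>

Sorting can then use a function query, e.g. sort=field(rank) desc.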





Re: Search for All CAPS words

2015-08-07 Thread rks_lucene
Took me a while, but I tried it and it works perfectly. Thanks a lot!!
Ritesh





Re: Streaming API running a simple query

2015-08-07 Thread Selvam
Hi,

Sorry, it is working now.

curl --data-urlencode
'stream=search(gettingstarted,q="*:*",fl="id",sort="id asc")'
http://localhost:8983/solr/gettingstarted/stream

I missed *'asc'* in sort :)

Thanks for the help, Shawn Heisey.

On Fri, Aug 7, 2015 at 3:46 PM, Selvam  wrote:

> Hi,
>
> Thanks for your update, yes, I was missing the cloud mode, I am new to the
> world of Solr cloud. Now I have enabled a single node (with two shards &
> replicas) that runs on 8983 port along with zookeeper running on 9983 port.
> When I run,
>
>  curl --data-urlencode
> 'stream=search(gettingstarted,q="*:*",fl="id",sort="id")'
> http://localhost:8983/solr/gettingstarted/stream
>
> Again, I get
>
> "Unable to construct instance of
> org.apache.solr.client.solrj.io.stream.CloudSolrStream
> .
> .
>
> Caused by: java.lang.reflect.InvocationTargetException
> .
> .
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1"
>
> I tried different port, 9983 as well, which returns "Empty reply from
> server". I think I miss some obvious configuration.
>
>
>
>
> On Fri, Aug 7, 2015 at 2:04 PM, Shawn Heisey  wrote:
>
>> On 8/7/2015 1:37 AM, Selvam wrote:
>> > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>> >
>> > I tried this from my linux terminal,
>> > 1)   curl --data-urlencode
>> > 'stream=search(gettingstarted,q="*:*",fl="id",sort="id")'
>> > http://localhost:8983/solr/gettingstarted/stream
>> >
>> > Threw zkHost error. Then tried with,
>> >
>> > 2)   curl --data-urlencode
>> >
>> 'stream=search(gettingstarted,zkHost="localhost:8983",q="*:*",fl="id",sort="id")'
>> > http://localhost:8983/solr/gettingstarted/stream
>> >
>> > It throws me "java.lang.ArrayIndexOutOfBoundsException: 1\n\tat
>> >
>> org.apache.solr.client.solrj.io.stream.CloudSolrStream.parseComp(CloudSolrStream.java:260)"
>>
>> The documentation page you linked seems to indicate that this is a
>> feature that only works in SolrCloud.  Your inclusion of
>> "localhost:8983" as the zkHost suggests that either you are NOT running
>> in cloud mode, or that you do not understand what zkHost means.
>>
>> Zookeeper runs on a different port than Solr.  8983 is Solr's port.  If
>> you are running a 5.x cloud with the embedded zookeeper, it is most
>> likely running on port 9983.  If you are running in cloud mode with a
>> properly configured external zookeeper, then your zkHost parameter will
>> probably have three hosts in it with port 2181.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Regards,
> Selvam
> KnackForge 
> Acquia Service Partner
> No. 1, 12th Line, K.K. Road, Venkatapuram,
> Ambattur, Chennai,
> Tamil Nadu, India.
> PIN - 600 053.
>



-- 
Regards,
Selvam
KnackForge 
Acquia Service Partner
No. 1, 12th Line, K.K. Road, Venkatapuram,
Ambattur, Chennai,
Tamil Nadu, India.
PIN - 600 053.


Re: are facets or MatchAllDocsQuery not cached?

2015-08-07 Thread Bernd Fehling
Hi Mikhail, you are my hero!

I added your patch to 4.10.4, recompiled, tested, and pushed it from the
testing stage to the online system. What a difference!

Right after restart, the performance increase for faceting is 10x: qtime for
MatchAllDocsQuery (*:*) with docValues faceting went down from around 35
seconds to 3.5 seconds. After an hour under load, qtime still shows somewhere
around a 15 percent performance increase for faceting.

This patch is a must have!

Regards
Bernd


Am 07.08.2015 um 08:45 schrieb Bernd Fehling:
> 
> 
> Am 06.08.2015 um 17:48 schrieb Mikhail Khludnev:
>> On Thu, Aug 6, 2015 at 3:56 PM, Bernd Fehling <
>> bernd.fehl...@uni-bielefeld.de> wrote:
>>
>>>
>>>
>>> Am 06.08.2015 um 14:33 schrieb Upayavira:
 Typically such performance issues with faceting are to do with the time
 spend uninverting the index before calculating the facet counts.

 If you indexed the fields with docValues enabled, perhaps you could then
 use them for faceting, which might improve performance.
>>>
>>> Well, this is against my observations. When I used uninverted fields
>>> without
>>> docValues I had a much better 99 percentile qtime but a very high heap
>>> consumption.
>>> Now with docValues the heap usage went down, but the 99 percentile
>>> qtime for MatchAllDocsQuery(*:*) went up to above 33 seconds.
>>>
>>
>>  Note about performance optimization for DocValues faceting in forthcoming
>> 5.3
>>
> 
> Thanks for pointing this out.
> Do you think that it is possible to merge this patch into 4.10.4 or
> is 5.3 to far away from 4.10.4 in this area of code?
> 
> 
>>
>>>

 If you are using a non-docValues field, and the second query is faster,
 then you could add the query to your static warming, look for
 newSearcher in your solrconfig.xml. That will execute your query,
 warming the caches used by faceting, before a new searcher is made
 available for searches.
>>>
>>> The q=*.* with sorting and facetting is always the first query I'm doing
>>> at static warming and it helped until switching to docValues :-(
>>>
>>> Bernd
>>>

 Upayavira

 On Thu, Aug 6, 2015, at 12:38 PM, Toke Eskildsen wrote:
> On Thu, 2015-08-06 at 13:00 +0200, Bernd Fehling wrote:
>> Single Index Solr 4.10.4, optimized Index, 76M docs, 235GB index size.
>>
>> I was analysing my solr logs and it turned out that I have some queries
>> which are above 30 seconds qtime while normally the qtime is below 1
>>> second.
>> Looking closer about the queries it turned out that this is for
>>> MatchAllDocsQuery(*:*).
>> Next was to turn debugQuery on and see where the bottleneck is.
>> The result was that the facetting is consuming most of the qtime.
>>
>> So the question is, are facets or is facetting not cached?
>
> As far as I know it is not. 35 seconds for a match-all faceting sounds
> fairly on par with what we are seeing (250M docs, 900GB shard).
>
> Of course response time is very depending on the field itself. If you
> have very few unique values in your facet field(s), you might try
> facet.method=enum. If that is not the case, your best bet would probably
> be to cache the match-all outside of Solr.
>
>> My assumption is that the queryResultCache is catching such a
>>> MatchAllDocsQuery(*:*).
>
> It only stores the docIDs.
>
> I don't know why there is is no all_parameters -> complete_response
> cache in Solr.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>>>
> 
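
For reference, the static warming mentioned above is configured in solrconfig.xml; a minimal sketch (the facet field name is a placeholder):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">id asc</str>
      <str name="facet">true</str>
      <str name="facet.field">subject</str>
    </lst>
  </arr>
</listener>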

-- 
*************************************************************
Bernd Fehling                 Bielefeld University Library
Dipl.-Inform. (FH)            LibTec - Library Technology
Universitätsstr. 25           and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060         bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************


Re: Streaming API running a simple query

2015-08-07 Thread Selvam
Hi,

Thanks for your update. Yes, I was missing the cloud mode; I am new to the
world of SolrCloud. Now I have enabled a single node (with two shards and
replicas) that runs on port 8983, along with ZooKeeper running on port 9983.
When I run,

 curl --data-urlencode
'stream=search(gettingstarted,q="*:*",fl="id",sort="id")'
http://localhost:8983/solr/gettingstarted/stream

Again, I get

"Unable to construct instance of
org.apache.solr.client.solrj.io.stream.CloudSolrStream
.
.

Caused by: java.lang.reflect.InvocationTargetException
.
.
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1"

I tried a different port, 9983, as well, which returns "Empty reply from
server". I think I am missing some obvious configuration.




On Fri, Aug 7, 2015 at 2:04 PM, Shawn Heisey  wrote:

> On 8/7/2015 1:37 AM, Selvam wrote:
> > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> >
> > I tried this from my linux terminal,
> > 1)   curl --data-urlencode
> > 'stream=search(gettingstarted,q="*:*",fl="id",sort="id")'
> > http://localhost:8983/solr/gettingstarted/stream
> >
> > Threw zkHost error. Then tried with,
> >
> > 2)   curl --data-urlencode
> >
> 'stream=search(gettingstarted,zkHost="localhost:8983",q="*:*",fl="id",sort="id")'
> > http://localhost:8983/solr/gettingstarted/stream
> >
> > It throws me "java.lang.ArrayIndexOutOfBoundsException: 1\n\tat
> >
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.parseComp(CloudSolrStream.java:260)"
>
> The documentation page you linked seems to indicate that this is a
> feature that only works in SolrCloud.  Your inclusion of
> "localhost:8983" as the zkHost suggests that either you are NOT running
> in cloud mode, or that you do not understand what zkHost means.
>
> Zookeeper runs on a different port than Solr.  8983 is Solr's port.  If
> you are running a 5.x cloud with the embedded zookeeper, it is most
> likely running on port 9983.  If you are running in cloud mode with a
> properly configured external zookeeper, then your zkHost parameter will
> probably have three hosts in it with port 2181.
>
> Thanks,
> Shawn
>
>


-- 
Regards,
Selvam
KnackForge 
Acquia Service Partner
No. 1, 12th Line, K.K. Road, Venkatapuram,
Ambattur, Chennai,
Tamil Nadu, India.
PIN - 600 053.


Suggester needed for returning suggestions when term is not start of field value

2015-08-07 Thread Thomas Michael Engelke
Hey,

I'm playing around with the suggester component, and it works perfectly
as described: Suggestions for 'logitech mouse' include 'logitech mouse
g500' and 'logitech mouse gaming'.

However, when the words in the record feeding the suggester do not appear in
the same order as in the search terms, nothing is returned. Suggestions for
'logitech mouse' do not include 'logitech g500 mouse'.

Is there a suggester implementation that can suggest records that way?

Best wishes. 
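
One implementation worth trying is AnalyzingInfixLookupFactory, which matches query terms anywhere in the suggestion instead of only as a prefix of the whole field; a minimal sketch for solrconfig.xml (field and type names are assumptions):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>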

Re: Streaming API running a simple query

2015-08-07 Thread Shawn Heisey
On 8/7/2015 1:37 AM, Selvam wrote:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> 
> I tried this from my linux terminal,
> 1)   curl --data-urlencode
> 'stream=search(gettingstarted,q="*:*",fl="id",sort="id")'
> http://localhost:8983/solr/gettingstarted/stream
> 
> Threw zkHost error. Then tried with,
> 
> 2)   curl --data-urlencode
> 'stream=search(gettingstarted,zkHost="localhost:8983",q="*:*",fl="id",sort="id")'
> http://localhost:8983/solr/gettingstarted/stream
> 
> It throws me "java.lang.ArrayIndexOutOfBoundsException: 1\n\tat
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.parseComp(CloudSolrStream.java:260)"

The documentation page you linked seems to indicate that this is a
feature that only works in SolrCloud.  Your inclusion of
"localhost:8983" as the zkHost suggests that either you are NOT running
in cloud mode, or that you do not understand what zkHost means.

Zookeeper runs on a different port than Solr.  8983 is Solr's port.  If
you are running a 5.x cloud with the embedded zookeeper, it is most
likely running on port 9983.  If you are running in cloud mode with a
properly configured external zookeeper, then your zkHost parameter will
probably have three hosts in it with port 2181.
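
For example, with the embedded zookeeper the request would be (the 9983 port is an assumption, and sort includes the direction that later messages in this thread show is required):

curl --data-urlencode 'stream=search(gettingstarted,zkHost="localhost:9983",q="*:*",fl="id",sort="id asc")' http://localhost:8983/solr/gettingstarted/stream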

Thanks,
Shawn



Streaming API running a simple query

2015-08-07 Thread Selvam
Hi All,

I am trying to use the Streaming API in Solr 5.2.

For example, as per
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

I tried this from my Linux terminal:
1)   curl --data-urlencode
'stream=search(gettingstarted,q="*:*",fl="id",sort="id")'
http://localhost:8983/solr/gettingstarted/stream

That threw a zkHost error. Then I tried:

2)   curl --data-urlencode
'stream=search(gettingstarted,zkHost="localhost:8983",q="*:*",fl="id",sort="id")'
http://localhost:8983/solr/gettingstarted/stream

It throws "java.lang.ArrayIndexOutOfBoundsException: 1\n\tat
org.apache.solr.client.solrj.io.stream.CloudSolrStream.parseComp(CloudSolrStream.java:260)"

Kindly let me know if you have worked with the Streaming API and know a way
to fix this issue.

-- 
Regards,
Selvam
KnackForge 


Re: Limits in individual filter sub queries

2015-08-07 Thread Selvam
Hi All,

I think Solr 5.1+ supports the Streaming API, which might cover my need,
though it is not working for me right now. I will send another email about
that.





On Thu, Aug 6, 2015 at 3:08 PM, Selvam  wrote:

> Dear Toke,
>
> Thanks for your input. Infact my scenario is much more complex, let me
> give you an example,
>
> q=*.*&fq=(country:india AND age:[25 TO 40] AND sex:male) OR (country:iran AND
> income:[5 TO 9])
>
>
> You can see each subquery has different parameters, I may want to limit
> the first subquery count to 60 while second one to 40. Yes, I need to give
> different limit for each subquery. Could you suggest me a way to do this?
>
> Thanks again.
>
>
>
> On Thu, Aug 6, 2015 at 2:55 PM, Toke Eskildsen 
> wrote:
>
>> On Thu, 2015-08-06 at 12:32 +0530, Selvam wrote:
>> > Good day, I wanted to run a filter query (fq), say, I need to run
>> >
>> > q=*.*&fq=(country:india) OR (country:iran)&limit=100
>> >
>> > Now it may return me 100 records that might contain 70 Indians & 30 Iran
>> > records. Now how can I force to fetch 50 Indian & 50 Iran records using
>> a
>> > single SOLR query?
>>
>> q=*.*&fq=(country:india) OR (country:iran)
>> &group=true&group.field=country&group.limit=50
>>
>> https://cwiki.apache.org/confluence/display/solr/Result+Grouping
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>>
>>
>
>
> --
> Regards,
> Selvam
>
>
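
In case it helps: the two subqueries can be expressed directly with group.query instead of group.field, though group.limit is shared by all groups, so truly different per-group limits (60 and 40) would still need separate requests. A sketch:

q=*:*&group=true
  &group.query=(country:india AND age:[25 TO 40] AND sex:male)
  &group.query=(country:iran AND income:[5 TO 9])
  &group.limit=60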


-- 
Regards,
Selvam
KnackForge 
Acquia Service Partner
No. 1, 12th Line, K.K. Road, Venkatapuram,
Ambattur, Chennai,
Tamil Nadu, India.
PIN - 600 053.