master slave replication taking time

2017-06-27 Thread Midas A
Hi,

We have around 2000 documents, and our master-to-slave replication is
taking up to 20 seconds.

What should I check?
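
A quick place to look (assuming the stock ReplicationHandler on both sides) is
the replication details command on the slave core, which reports index
version/generation, the poll interval, and timing for the last replication
cycle; host and core names below are placeholders:

curl "http://slave-host:8983/solr/mycore/replication?command=details"

Comparing the generation reported by master and slave, and the size of the
files fetched per cycle, usually shows whether the 20 seconds goes into
polling, copying segment files, or opening the new searcher.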


Re: Using of Streaming to join between shards

2017-06-27 Thread mganeshs
Hi Joel,

Thanks for confirming that Streaming would be too costly for high qps loads.

Regards,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-of-Streaming-to-join-between-shards-tp4342563p4343104.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tlogs not being deleted/truncated

2017-06-27 Thread Webster Homer
We also have the same collections in our development and QA environments.
In our Dev environment, which is not using CDCR replication but does have
autoCommit set, we have 440 tlog files. The only difference in the
configuration is that dev doesn't have the cdcr request handler configured.
It does have the solr.CdcrUpdateLog set:


  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>



On Tue, Jun 27, 2017 at 3:11 PM, Webster Homer 
wrote:

> It appears right how that we are  not seeing an issue with the target
> collections, we definitely see a problem with the source collection.
> numRecordsToKeep and maxNumLogsToKeep are set to the default values of
> 100 and 10 respectively. We probably don't need 10 tlog files around.
>
>
> On Tue, Jun 27, 2017 at 11:32 AM, Webster Homer 
> wrote:
>
>> Commits were definitely not happening. We ran out of filesystem space.
>> The admins deleted old tlogs and restartd. The collection in question was
>> missing a lot of data. We reloaded it, and then we saw some commits. In
>> Solrcloud they look like this:
>> 2017-06-23 17:28:06.441 INFO  (commitScheduler-56-thread-1)
>> [c:sial-content-citations s:shard1 r:core_node2
>> x:sial-content-citations_shard1_replica1] o.a.s.u.DirectUpdateHandler2
>> start commit{,optimize=false,openSearcher=true,waitSearcher=true,e
>> xpungeDeletes=false,softCommit=true,prepareCommit=false}
>> 2017-06-23 17:28:07.823 INFO  (commitScheduler-56-thread-1)
>> [c:sial-content-citations s:shard1 r:core_node2
>> x:sial-content-citations_shard1_replica1] o.a.s.s.SolrIndexSearcher
>> Opening [Searcher@1c6a3bf1[sial-content-citations_shard1_replica1] main]
>> 2017-06-23 17:28:07.824 INFO  (commitScheduler-56-thread-1)
>> [c:sial-content-citations s:shard1 r:core_node2
>> x:sial-content-citations_shard1_replica1] o.a.s.u.DirectUpdateHandler2
>> end_commit_flush
>> 2017-06-23 17:28:49.665 INFO  (commitScheduler-66-thread-1)
>> [c:ehs-catalog-qmdoc s:shard2 r:core_node2 
>> x:ehs-catalog-qmdoc_shard2_replica1]
>> o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSea
>> rcher=false,waitSearcher=true,expungeDeletes=false,softCommi
>> t=false,prepareCommit=false}
>> 2017-06-23 17:28:49.742 INFO  (commitScheduler-66-thread-1)
>> [c:ehs-catalog-qmdoc s:shard2 r:core_node2 
>> x:ehs-catalog-qmdoc_shard2_replica1]
>> o.a.s.c.SolrDeletionPolicy SolrDeletionPolicy.onCommit: commits: num=2
>> commit{dir=NRTCachingDirectory(MMapDirectory@/var/solr/data/
>> ehs-catalog-qmdoc_shard2_replica1/data/index
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@597830aa;
>> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_2gb,generation=3179}
>>
>> I have been busy and couldn't get back to this issue until now. The
>> problem started happening again. I manually sent a commit and that seemed
>> to help for a time. Unfortunately I don't have access to our Production
>> solrs. we use logstash for the logs, but not all logs were being captured,
>> the commit messages above were not.
>>
>>
>> On Tue, Jun 20, 2017 at 5:34 PM, Erick Erickson 
>> wrote:
>>
>>> bq: Neither in our source collection nor in our target collections.
>>>
>>> Hmmm. You should see messages similar to the following which I just
>>> generated on Solr 6.2 (stand-alone I admit but that code should be the
>>> same):
>>>
>>> INFO  - 2017-06-20 21:11:55.424; [   x:techproducts]
>>> org.apache.solr.update.DirectUpdateHandler2; start
>>> commit{,optimize=false,openSearcher=false,waitSearcher=true,
>>> expungeDeletes=false,softCommit=false,prepareCommit=false}
>>>
>>> INFO  - 2017-06-20 21:11:55.425; [   x:techproducts]
>>> org.apache.solr.update.SolrIndexWriter; Calling setCommitData with
>>> IW:org.apache.solr.update.SolrIndexWriter@4862d97c
>>>
>>> INFO  - 2017-06-20 21:11:55.663; [   x:techproducts]
>>> org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit:
>>> commits: num=2
>>>
>>> commit{dir=NRTCachingDirectory(MMapDirectory@/Users/Erick/ap
>>> ache/solrVersions/playspace/solr/example/techproducts/solr/t
>>> echproducts/data/index
>>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d8e7c06;
>>> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_c,generation=12}
>>>
>>> commit{dir=NRTCachingDirectory(MMapDirectory@/Users/Erick/ap
>>> ache/solrVersions/playspace/solr/example/techproducts/solr/t
>>> echproducts/data/index
>>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d8e7c06;
>>> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_d,generation=13}
>>>
>>> INFO  - 2017-06-20 21:11:55.663; [   x:techproducts]
>>> org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 13
>>>
>>> INFO  - 2017-06-20 21:11:55.668; [   x:techproducts]
>>> org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>>>
>>> whenever commits kick in.
>>>
>>> So how sure are you of your autocommit settings? Is there any chance
>>> the sysvar "solr.autoCommit.maxTime" is somehow being set to -1?
>>> Unlikely frankly since you have multiple tlogs.
>>>
>>> Another possibility is that somehow 

Re: SolrJ 6.6.0 Connection pool shutdown

2017-06-27 Thread Shawn Heisey
On 6/27/2017 6:50 AM, Markus Jelsma wrote:
> We have a proces checking presence of many documents in a collection, just a 
> simple client.getById(id). It sometimes begins throwing lots of  these 
> exceptions in a row:
>
> org.apache.solr.client.solrj.SolrServerException: 
> java.lang.IllegalStateException: Connection pool shut down
>
> Then, as suddenly as it appeared, it's gone again a no longer a problem. I 
> would expect SolrJ not to throw this but to wait until it the connection 
> pool, or whatever mechanism is there, to recover.
>
> Did i miss a magic parameter for SolrJ?\

That error message will be much longer than what you've provided here. 
It will have a java stacktrace that's typically a dozen or so lines
long.  There may also be one or more "Caused by" sections after the
stacktrace, each with a stacktrace of its own.  Can you share the full
error message?  Is the server also running 6.6.0, or a different version?

It would also be helpful if you can share the SolrJ code you've written,
cleanly redacted to remove anything sensitive.

That particular message ("Connection pool shut down") sounds like it
probably came from HttpClient, which SolrJ uses ... and I would expect
that to only happen if you close/shutdown the HttpClient or the
SolrClient.  After closing, a client can't be used any more.  Normally
the only time you should close a client is right before exiting the
program, and since closing is generally unnecessary at that point anyway,
in my opinion the client likely never needs to be closed at all for
*most* usages.
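
As a rough sketch of that lifecycle (SolrJ 6.x class names; the URL and id
below are placeholders), the idea is to build the client once, reuse it for
every request, and close it only at shutdown:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class SolrLookup {
    // One client for the whole application, reused for every request.
    private static final HttpSolrClient CLIENT =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

    public static SolrDocument lookup(String id) throws Exception {
        // Reuses the client's underlying HttpClient connection pool.
        return CLIENT.getById(id);
    }

    public static void shutdown() throws Exception {
        // Close at most once, right before the JVM exits.
        CLIENT.close();
    }
}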

Thanks,
Shawn



Re: Tlogs not being deleted/truncated

2017-06-27 Thread Webster Homer
It appears right now that we are not seeing an issue with the target
collections, but we definitely see a problem with the source collection.
numRecordsToKeep and maxNumLogsToKeep are set to the default values of 100
and 10 respectively. We probably don't need 10 tlog files around.
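
For reference, both settings live in the updateLog section of solrconfig.xml;
a sketch with the defaults mentioned above would be:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">100</int>
  <int name="maxNumLogsToKeep">10</int>
</updateLog>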


On Tue, Jun 27, 2017 at 11:32 AM, Webster Homer 
wrote:

> Commits were definitely not happening. We ran out of filesystem space. The
> admins deleted old tlogs and restartd. The collection in question was
> missing a lot of data. We reloaded it, and then we saw some commits. In
> Solrcloud they look like this:
> 2017-06-23 17:28:06.441 INFO  (commitScheduler-56-thread-1)
> [c:sial-content-citations s:shard1 r:core_node2
> x:sial-content-citations_shard1_replica1] o.a.s.u.DirectUpdateHandler2
> start commit{,optimize=false,openSearcher=true,waitSearcher=true,e
> xpungeDeletes=false,softCommit=true,prepareCommit=false}
> 2017-06-23 17:28:07.823 INFO  (commitScheduler-56-thread-1)
> [c:sial-content-citations s:shard1 r:core_node2
> x:sial-content-citations_shard1_replica1] o.a.s.s.SolrIndexSearcher
> Opening [Searcher@1c6a3bf1[sial-content-citations_shard1_replica1] main]
> 2017-06-23 17:28:07.824 INFO  (commitScheduler-56-thread-1)
> [c:sial-content-citations s:shard1 r:core_node2
> x:sial-content-citations_shard1_replica1] o.a.s.u.DirectUpdateHandler2
> end_commit_flush
> 2017-06-23 17:28:49.665 INFO  (commitScheduler-66-thread-1)
> [c:ehs-catalog-qmdoc s:shard2 r:core_node2 
> x:ehs-catalog-qmdoc_shard2_replica1]
> o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSea
> rcher=false,waitSearcher=true,expungeDeletes=false,softCommi
> t=false,prepareCommit=false}
> 2017-06-23 17:28:49.742 INFO  (commitScheduler-66-thread-1)
> [c:ehs-catalog-qmdoc s:shard2 r:core_node2 
> x:ehs-catalog-qmdoc_shard2_replica1]
> o.a.s.c.SolrDeletionPolicy SolrDeletionPolicy.onCommit: commits: num=2
> commit{dir=NRTCachingDirectory(MMapDirectory@/var/solr/data/
> ehs-catalog-qmdoc_shard2_replica1/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@597830aa;
> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_2gb,generation=3179}
>
> I have been busy and couldn't get back to this issue until now. The
> problem started happening again. I manually sent a commit and that seemed
> to help for a time. Unfortunately I don't have access to our Production
> solrs. we use logstash for the logs, but not all logs were being captured,
> the commit messages above were not.
>
>
> On Tue, Jun 20, 2017 at 5:34 PM, Erick Erickson 
> wrote:
>
>> bq: Neither in our source collection nor in our target collections.
>>
>> Hmmm. You should see messages similar to the following which I just
>> generated on Solr 6.2 (stand-alone I admit but that code should be the
>> same):
>>
>> INFO  - 2017-06-20 21:11:55.424; [   x:techproducts]
>> org.apache.solr.update.DirectUpdateHandler2; start
>> commit{,optimize=false,openSearcher=false,waitSearcher=true,
>> expungeDeletes=false,softCommit=false,prepareCommit=false}
>>
>> INFO  - 2017-06-20 21:11:55.425; [   x:techproducts]
>> org.apache.solr.update.SolrIndexWriter; Calling setCommitData with
>> IW:org.apache.solr.update.SolrIndexWriter@4862d97c
>>
>> INFO  - 2017-06-20 21:11:55.663; [   x:techproducts]
>> org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit:
>> commits: num=2
>>
>> commit{dir=NRTCachingDirectory(MMapDirectory@/Users/Erick/ap
>> ache/solrVersions/playspace/solr/example/techproducts/solr/
>> techproducts/data/index
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d8e7c06;
>> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_c,generation=12}
>>
>> commit{dir=NRTCachingDirectory(MMapDirectory@/Users/Erick/ap
>> ache/solrVersions/playspace/solr/example/techproducts/solr/
>> techproducts/data/index
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d8e7c06;
>> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_d,generation=13}
>>
>> INFO  - 2017-06-20 21:11:55.663; [   x:techproducts]
>> org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 13
>>
>> INFO  - 2017-06-20 21:11:55.668; [   x:techproducts]
>> org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>>
>> whenever commits kick in.
>>
>> So how sure are you of your autocommit settings? Is there any chance
>> the sysvar "solr.autoCommit.maxTime" is somehow being set to -1?
>> Unlikely frankly since you have multiple tlogs.
>>
>> Another possibility is that somehow you've changed the number of docs
>> to keep in the transaction logs and the number of transaction logs
>> through:
>> numRecordsToKeep
>> and
>> maxNumLogsToKeep
>>
>> And you'll note that the tlogs stat at 00082, which means many
>> have been deleted. Does the minimum number keep going up? If that's
>> the case tlogs _are_ being rolled over, just not very often. Why is a
>> mystery of course.
>>
>> So what happens if you issue manual commits on both source and target?
>>
>> It's unlikely that autocommit is totally 

Re: Dynamic fields vs parent child

2017-06-27 Thread Susheel Kumar
Do you have a rough count of the maximum number of dynamic fields you may
have (1k, 2k, 3k etc.)? In one of our indexes we have a total of around 2K
dynamic fields across all documents.

My suggestion would be to try out dynamic fields for the use case you are
describing and do some real performance tests.

Thanks,
Susheel

On Tue, Jun 27, 2017 at 3:01 PM, Saurabh Sethi 
wrote:

> We have key-value pairs that need to be searchable. We are looking for best
> approach, both in terms of indexing (fast as well as space efficient) as
> well as retrieval (fast search).
>
> Right now, the two approaches that we have are: Nested docs or dynamic
> fields (myField_*_time:some date)
>
> The number of dynamic fields would definitely be > 1k.
>
> We wanted to get an idea which of these approaches work best or if there a
> third approach which is better than nested and dynamic fields.
>
> On Tue, Jun 27, 2017 at 5:39 AM, Susheel Kumar 
> wrote:
>
> > Can you describe your use case in terms of what business functionality
> you
> > are looking to achieve.
> >
> > Thanks,
> > Susheel
> >
> > On Mon, Jun 26, 2017 at 4:26 PM, Saurabh Sethi <
> saurabh.se...@sendgrid.com
> > >
> > wrote:
> >
> > > Number of dynamic fields will be in thousands (millions of users +
> > > thousands of events shared between subsets of users).
> > >
> > > We also thought about indexing in one field with value being
> > > fieldname_fieldvalue. Since we support range queries for dates and
> > numbers,
> > > it won't work out of box.
> > >
> > > On Mon, Jun 26, 2017 at 1:05 PM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > >
> > > > How many distinct fields do you expect across _all_ documents? That
> > > > is, if doc1 has 10 dynamic fields and doc2 has 10 dynamic fields,
> will
> > > > there be exactly 10 fields total or more than 10 when you consider
> > > > both documents?
> > > >
> > > > 100s of fields total across all documents is a tractable problem.
> > > > thousands of dynamic fields total is going to be a problem.
> > > >
> > > > One technique that people do use is to index one field with a prefix
> > > > rather than N dynamic fields. So you have something like
> > > > dyn1_val1
> > > > dyn1_val2
> > > > dyn4_val67
> > > >
> > > > Only really works with string fields of course.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Mon, Jun 26, 2017 at 10:11 AM, Saurabh Sethi
> > > >  wrote:
> > > > > We have two requirements:
> > > > >
> > > > > 1. Indexing and storing event id and its timestamp.
> > > > > 2. Indexing and storing custom field name and value. The fields can
> > be
> > > of
> > > > > any type, but for now lets say they are of types string, date and
> > > number.
> > > > >
> > > > > The events and custom fields for any solr document can easily be in
> > > > > hundreds.
> > > > >
> > > > > We are looking at two different approaches to handle these
> scenarios:
> > > > >
> > > > > 1. *Dynamic fields* - Have the fields name start with a particular
> > > > pattern
> > > > > like for string, the pattern could be like str_* and for event
> could
> > be
> > > > > eventid_*
> > > > > 2. *Parent/child fields* - This seems to be an overkill for our use
> > > case
> > > > > since it's more for hierarchical data. Also, the parent and all its
> > > > > children need to be reindexed on update which defeats the purpose -
> > we
> > > > are
> > > > > now reindexing multiple docs instead of one with dynamic fields.
> But
> > it
> > > > > allows us to store custom field name along with its value unlike
> > > dynamic
> > > > > fields where we will have to map user supplied custom field to some
> > > other
> > > > > name based on type.
> > > > >
> > > > > Has anyone handled similar scenarios with Solr? If so, which
> approach
> > > > would
> > > > > you recommend based on your experience?
> > > > >
> > > > > We are using solr 6.6
> > > > >
> > > > > Thanks,
> > > > > Saurabh
> > > >
> > >
> > >
> > >
> > > --
> > > Saurabh Sethi
> > > Principal Engineer I | Engineering
> > >
> >
>
>
>
> --
> Saurabh Sethi
> Principal Engineer I | Engineering
>


Sharding of index data takes long time.

2017-06-27 Thread chandrushanmugasundaram
I am just trying to shard my index data of size 22GB (1.7M documents) into
three shards.

The split takes about 7 hours in total.

I used the same query that is mentioned in the Solr Collections API.

Is there any way to do that quicker?

Can I use the REBALANCE API? Is it secure?

Is there any benchmark available for sharding the index?
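
For reference, a SPLITSHARD call through the Collections API looks like this
(collection and shard names are placeholders); the async parameter keeps the
HTTP request itself from timing out on long-running splits:

http://host:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=split-shard1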



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sharding-of-index-data-takes-long-time-tp4343029.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamic fields vs parent child

2017-06-27 Thread Saurabh Sethi
We have key-value pairs that need to be searchable. We are looking for the
best approach, both in terms of indexing (fast as well as space efficient)
and retrieval (fast search).

Right now, the two approaches that we have are: nested docs or dynamic
fields (myField_*_time:some date).

The number of dynamic fields would definitely be > 1k.

We wanted to get an idea of which of these approaches works best, or whether
there is a third approach that is better than nested and dynamic fields.
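
As a sketch of the dynamic-field option (field-name patterns and type names
here are only examples and depend on the schema), there would be one pattern
per data type:

<dynamicField name="*_time" type="tdate"  indexed="true" stored="true"/>
<dynamicField name="*_str"  type="string" indexed="true" stored="true"/>
<dynamicField name="*_num"  type="tlong"  indexed="true" stored="true"/>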

On Tue, Jun 27, 2017 at 5:39 AM, Susheel Kumar 
wrote:

> Can you describe your use case in terms of what business functionality you
> are looking to achieve.
>
> Thanks,
> Susheel
>
> On Mon, Jun 26, 2017 at 4:26 PM, Saurabh Sethi  >
> wrote:
>
> > Number of dynamic fields will be in thousands (millions of users +
> > thousands of events shared between subsets of users).
> >
> > We also thought about indexing in one field with value being
> > fieldname_fieldvalue. Since we support range queries for dates and
> numbers,
> > it won't work out of box.
> >
> > On Mon, Jun 26, 2017 at 1:05 PM, Erick Erickson  >
> > wrote:
> >
> > > How many distinct fields do you expect across _all_ documents? That
> > > is, if doc1 has 10 dynamic fields and doc2 has 10 dynamic fields, will
> > > there be exactly 10 fields total or more than 10 when you consider
> > > both documents?
> > >
> > > 100s of fields total across all documents is a tractable problem.
> > > thousands of dynamic fields total is going to be a problem.
> > >
> > > One technique that people do use is to index one field with a prefix
> > > rather than N dynamic fields. So you have something like
> > > dyn1_val1
> > > dyn1_val2
> > > dyn4_val67
> > >
> > > Only really works with string fields of course.
> > >
> > > Best,
> > > Erick
> > >
> > > On Mon, Jun 26, 2017 at 10:11 AM, Saurabh Sethi
> > >  wrote:
> > > > We have two requirements:
> > > >
> > > > 1. Indexing and storing event id and its timestamp.
> > > > 2. Indexing and storing custom field name and value. The fields can
> be
> > of
> > > > any type, but for now lets say they are of types string, date and
> > number.
> > > >
> > > > The events and custom fields for any solr document can easily be in
> > > > hundreds.
> > > >
> > > > We are looking at two different approaches to handle these scenarios:
> > > >
> > > > 1. *Dynamic fields* - Have the fields name start with a particular
> > > pattern
> > > > like for string, the pattern could be like str_* and for event could
> be
> > > > eventid_*
> > > > 2. *Parent/child fields* - This seems to be an overkill for our use
> > case
> > > > since it's more for hierarchical data. Also, the parent and all its
> > > > children need to be reindexed on update which defeats the purpose -
> we
> > > are
> > > > now reindexing multiple docs instead of one with dynamic fields. But
> it
> > > > allows us to store custom field name along with its value unlike
> > dynamic
> > > > fields where we will have to map user supplied custom field to some
> > other
> > > > name based on type.
> > > >
> > > > Has anyone handled similar scenarios with Solr? If so, which approach
> > > would
> > > > you recommend based on your experience?
> > > >
> > > > We are using solr 6.6
> > > >
> > > > Thanks,
> > > > Saurabh
> > >
> >
> >
> >
> > --
> > Saurabh Sethi
> > Principal Engineer I | Engineering
> >
>



-- 
Saurabh Sethi
Principal Engineer I | Engineering


Re: Solr PDF parsing failing with java error

2017-06-27 Thread Erick Erickson
Take a look at the Solr logs; they'll give you a more explicit message.

My guess: Someone went into the Solr admin UI, clicked "core admin"
and then said "I wonder what this 'new core' button does?". The
default name is, you guessed it, "new_core". And if you don't have the
underlying directories set up already, there's no conf directory. And
so there's no solrconfig.xml to parse. And...

They can get past this by going into /var/solr/data and 'rm -rf new_core'.

Or, safer is to go in to

/var/solr/data/new_core/

and rename core.properties to anything else. Solr won't try to load
the core then. This assumes a relatively modern Solr, 4.x and above.
Or at least one that does not have cores defined in solr.xml.
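
In shell terms, that safer second option is just something like:

cd /var/solr/data/new_core
mv core.properties core.properties.disabled   # core discovery skips this directory on the next start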

Best,
Erick

On Tue, Jun 27, 2017 at 9:00 AM, MatthewMeredith
 wrote:
> Erick Erickson wrote
>> Sure, someone changed the system variable "solr.install.dir" (i.e.
>> -Dsolr.install.dir=some other place). Or removed it. Or changed the
>> startup script. Or
>>
>> I've gotten very skeptical of "we didn't change anything but suddenly
>> it stopped working". Usually it's something someone's changed
>> unbeknownst to the person you're interacting with.
>>
>> The solr log usually shows the paths where everything gets loaded
>> from. You should be able to track where Solr is looking for all its
>> resources.
>>
>> It's also possible one of the jars was corrupted on disk (disks do go
>> bad).
>>
>> So you can also inspect the jars to see if that class. Here's a way to
>> look for one:
>>
>> find . -name '*jar' -exec bash -c 'jar tvf {} | grep
>> ParserDiscoveryAnnotation' \; -print
>>
>> where ParserDiscoveryAnnotation is the class you're not finding.
>>
>> Best,
>> Erick
>
> Erick,
>
> Don't worry, I'm equally as sceptical of the situation... But my client
> doesn't have access to the server and I haven't been on in months... So
> unless my web host went tinkering :P Could an update have caused an issue?
>
> If I type in:
>
> cd $SOLR_INSTALL
>
> as per the README file, I'm taken to /root. This doesn't seem right, does
> it? In my Solr Admin, the CWD is listed as /opt/solr-6.0.1/server and my
> core instance is at /var/solr/data/comox_core
>
> I tried going to the contrib/extraction/lib folder and running that find
> command, but I just got:
>
> bash: jar: command not found
>
> a bunch of times (once per .jar file, I assume).
>
> Another interesting thing is that when I opened my Solr Admin this morning,
> I was shown the following error:
>
> SolrCore Initialization Failures
> Hacked:
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Could not load conf for core Hacked: Error loading solr config from
> /var/solr/data/new_core/conf/solrconfig.xml
>
> I have no idea where this "new_core" bit is coming from... I've only ever
> had one core (comox_core).
>
> I really appreciate any help you can give!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-PDF-parsing-failing-with-java-error-tp4342909p4343053.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr PDF parsing failing with java error

2017-06-27 Thread MatthewMeredith
Erick Erickson wrote
> Sure, someone changed the system variable "solr.install.dir" (i.e.
> -Dsolr.install.dir=some other place). Or removed it. Or changed the
> startup script. Or
> 
> I've gotten very skeptical of "we didn't change anything but suddenly
> it stopped working". Usually it's something someone's changed
> unbeknownst to the person you're interacting with.
> 
> The solr log usually shows the paths where everything gets loaded
> from. You should be able to track where Solr is looking for all its
> resources.
> 
> It's also possible one of the jars was corrupted on disk (disks do go
> bad).
> 
> So you can also inspect the jars to see if that class. Here's a way to
> look for one:
> 
> find . -name '*jar' -exec bash -c 'jar tvf {} | grep
> ParserDiscoveryAnnotation' \; -print
> 
> where ParserDiscoveryAnnotation is the class you're not finding.
> 
> Best,
> Erick

Erick,

Don't worry, I'm equally as sceptical of the situation... But my client
doesn't have access to the server and I haven't been on in months... So
unless my web host went tinkering :P Could an update have caused an issue? 

If I type in:

cd $SOLR_INSTALL

as per the README file, I'm taken to /root. This doesn't seem right, does
it? In my Solr Admin, the CWD is listed as /opt/solr-6.0.1/server and my
core instance is at /var/solr/data/comox_core

I tried going to the contrib/extraction/lib folder and running that find
command, but I just got:

bash: jar: command not found

a bunch of times (once per .jar file, I assume).
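
If the JDK's jar tool isn't on the PATH, roughly the same check can usually be
done with unzip, which most Linux hosts have (class name as in the example
above):

find . -name '*.jar' -exec sh -c 'unzip -l "$1" | grep -q ParserDiscoveryAnnotation && echo "$1"' _ {} \;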

Another interesting thing is that when I opened my Solr Admin this morning,
I was shown the following error:

SolrCore Initialization Failures
Hacked:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Could not load conf for core Hacked: Error loading solr config from
/var/solr/data/new_core/conf/solrconfig.xml

I have no idea where this "new_core" bit is coming from... I've only ever
had one core (comox_core).

I really appreciate any help you can give!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-PDF-parsing-failing-with-java-error-tp4342909p4343053.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-27 Thread Joel Bernstein
Ok, I'll take a look. Thanks!

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jun 27, 2017 at 10:01 AM, Susheel Kumar 
wrote:

> Hi Joel,
>
> I have submitted a patch to handle this.  Please review.
>
> https://issues.apache.org/jira/secure/attachment/12874681/SOLR-10944.patch
>
> Thanks,
> Susheel
>
> On Fri, Jun 23, 2017 at 12:32 PM, Susheel Kumar 
> wrote:
>
> > Thanks for confirming.  Here is the JIRA
> >
> > https://issues.apache.org/jira/browse/SOLR-10944
> >
> > On Fri, Jun 23, 2017 at 11:20 AM, Joel Bernstein 
> > wrote:
> >
> >> yeah, this looks like a bug in the get expression.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Fri, Jun 23, 2017 at 11:07 AM, Susheel Kumar 
> >> wrote:
> >>
> >> > Hi Joel,
> >> >
> >> > As i am getting deeper, it doesn't look like a problem due to hashJoin
> >> etc.
> >> >
> >> >
> >> > Below is a simple let expr where if search would not find a match and
> >> > return 0 result.  In that case, I would expect get(a) to show a EOF
> >> tuple
> >> > while it is throwing exception. It looks like something wrong/bug in
> the
> >> > code.  Please suggest
> >> >
> >> > ===
> >> > let(a=search(collection1,
> >> > q=id:9,
> >> > fl="id,business_email",
> >> > sort="business_email asc"),
> >> > get(a)
> >> > )
> >> >
> >> >
> >> > {
> >> >   "result-set": {
> >> > "docs": [
> >> >   {
> >> > "EXCEPTION": "Index: 0, Size: 0",
> >> > "EOF": true,
> >> > "RESPONSE_TIME": 8
> >> >   }
> >> > ]
> >> >   }
> >> > }
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Jun 23, 2017 at 7:44 AM, Joel Bernstein 
> >> > wrote:
> >> >
> >> > > Ok, I hadn't anticipated some of the scenarios that you've been
> trying
> >> > out.
> >> > > Particularly reading streams into variables and performing joins
> >> etc...
> >> > >
> >> > > The main idea with variables was to use them with the new
> statistical
> >> > > evaluators. So you perform retrievals (search, random, nodes, knn
> >> etc...)
> >> > > set the results to variables and then perform statistical analysis.
> >> > >
> >> > > The problem with joining variables is that is doesn't scale very
> well
> >> > > because all the records are read into memory. Also the parallel
> stream
> >> > > won't work over variables.
> >> > >
> >> > > Joel Bernstein
> >> > > http://joelsolr.blogspot.com/
> >> > >
> >> > > On Thu, Jun 22, 2017 at 3:50 PM, Susheel Kumar <
> susheel2...@gmail.com
> >> >
> >> > > wrote:
> >> > >
> >> > > > Hi Joel,
> >> > > >
> >> > > > I am able to reproduce this in a simple way.  Looks like Let
> Stream
> >> is
> >> > > > having some issues.  Below complement function works fine if I
> >> execute
> >> > > > outside let and returns an EOF:true tuple but if a tuple with
> >> EOF:true
> >> > > > assigned to let variable, it gets changed to EXCEPTION "Index 0,
> >> Size
> >> > 0"
> >> > > > etc.
> >> > > >
> >> > > > So let stream not able to handle the stream/results which has only
> >> EOF
> >> > > > tuple and breaks the whole let expression block
> >> > > >
> >> > > >
> >> > > > ===Complement inside let
> >> > > > let(
> >> > > > a=echo(Hello),
> >> > > > b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id
> >> > > asc,email
> >> > > > asc"),
> >> > > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
> asc"),
> >> > > > on="id,email"),
> >> > > > c=get(b),
> >> > > > get(a)
> >> > > > )
> >> > > >
> >> > > > Result
> >> > > > ===
> >> > > > {
> >> > > >   "result-set": {
> >> > > > "docs": [
> >> > > >   {
> >> > > > "EXCEPTION": "Index: 0, Size: 0",
> >> > > > "EOF": true,
> >> > > > "RESPONSE_TIME": 1
> >> > > >   }
> >> > > > ]
> >> > > >   }
> >> > > > }
> >> > > >
> >> > > > ===Complement outside let
> >> > > >
> >> > > > complement(sort(select(tuple(id=1,email="A"),id,email),by="id
> >> > asc,email
> >> > > > asc"),
> >> > > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
> asc"),
> >> > > > on="id,email")
> >> > > >
> >> > > > Result
> >> > > > ===
> >> > > > { "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 0 } ]
> }
> >> }
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Thu, Jun 22, 2017 at 11:55 AM, Susheel Kumar <
> >> susheel2...@gmail.com
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > > > Sorry for typo
> >> > > > >
> >> > > > > Facing a weird behavior when using hashJoin / innerJoin etc. The
> >> > below
> >> > > > > expression display tuples from variable a shown below
> >> > > > >
> >> > > > >
> >> > > > > let(a=fetch(SMS,having(rollup(over=email,
> >> > > > >  count(email),
> >> > > > > select(search(SMS,
> >> > > > > q=*:*,
> >> > > > > fl="id,dv_sv_business_email",
> >> > > > > sort="dv_sv_business_email a

Re: solr /export handler - behavior during close()

2017-06-27 Thread Joel Bernstein
Ok, I'll fix the ParallelStream to set the stream context when creating
the SolrStreams, though. Thanks for pointing this out.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jun 27, 2017 at 1:46 PM, Susmit Shukla 
wrote:

> Hi Joel,
>
> I was on solr 6.3 branch. I see HttpClient deprecated methods are all fixed
> in master.
> I had forgot to mention that I used a custom SolrClientCache to have higher
> limits for maxConnectionPerHost settings thats why I saw difference in
> behavior. SolrClientCache also looks configurable with a new constructor on
> master branch.
>
> I guess it is all good going forward on master.
>
> Thanks,
> Susmit
>
> On Tue, Jun 27, 2017 at 10:14 AM, Joel Bernstein 
> wrote:
>
> > Ok, I see where it's not set the stream context. This needs to be fixed.
> >
> > I'm curious about where you're seeing deprecated methods in the
> > HttpClientUtil? I was reviewing the master version of HttpClientUtil and
> > didn't see any deprecations in my IDE.
> >
> > I'm wondering if you're using an older version of HttpClientUtil then I
> > used when I was testing SOLR-10698?
> >
> > You also mentioned that the SolrStream and the SolrClientCache were using
> > the same approach to create the client. In that case changing the
> > ParallelStream to set the streamContext shouldn't have any effect on the
> > close() issue.
> >
> >
> >
> >
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sun, Jun 25, 2017 at 10:48 AM, Susmit Shukla  >
> > wrote:
> >
> > > Hi Joel,
> > >
> > > Looked at the fix for SOLR-10698, there could be 2 potential issues
> > >
> > > - Parallel Stream does not set stream context on newly created
> > SolrStreams
> > > in open() method.
> > >
> > > - This results in creation of new uncached HttpSolrClient in open()
> > method
> > > of SolrStream. This client is created using deprecated methods of http
> > > client library (HttpClientUtil.createClient) and behaves differently on
> > > close() than the one created using HttpClientBuilder API.
> SolrClientCache
> > > too uses the same deprecated API
> > >
> > > This test case shows the problem
> > >
> > > ParallelStream ps = new parallelStream(tupleStream,...)
> > >
> > > while(true){
> > >
> > > read();
> > >
> > > break after 2 iterations
> > >
> > > }
> > >
> > > ps.close()
> > >
> > > //close() reads through the end of tupleStream.
> > >
> > > I tried with HttpClient created by *org**.**apache**.**http**.**
> impl**.*
> > > *client**.HttpClientBuilder.create()* and close() is working for that.
> > >
> > >
> > > Thanks,
> > >
> > > Susmit
> > >
> > > On Wed, May 17, 2017 at 7:33 AM, Susmit Shukla <
> shukla.sus...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Joel, will try that.
> > > > Binary response would be more performant.
> > > > I observed the server sends responses in 32 kb chunks and the client
> > > reads
> > > > it with 8 kb buffer on inputstream. I don't know if changing that can
> > > > impact anything on performance. Even if buffer size is increased on
> > > > httpclient, it can't override the hardcoded 8kb buffer on
> > > > sun.nio.cs.StreamDecoder
> > > >
> > > > Thanks,
> > > > Susmit
> > > >
> > > > On Wed, May 17, 2017 at 5:49 AM, Joel Bernstein 
> > > > wrote:
> > > >
> > > >> Susmit,
> > > >>
> > > >> You could wrap a LimitStream around the outside of all the
> relational
> > > >> algebra. For example:
> > > >>
> > > >> parallel(limit((intersect(intersect(search, search), union(search,
> > > >> search)
> > > >>
> > > >> In this scenario the limit would happen on the workers.
> > > >>
> > > >> As far as the worker/replica ratio. This will depend on how heavy
> the
> > > >> export is. If it's a light export, small number of fields, mostly
> > > numeric,
> > > >> simple sort params, then I've seen a ratio of 5 (workers) to 1
> > (replica)
> > > >> work well. This will basically saturate the CPU on the replica. But
> > > >> heavier
> > > >> exports will saturate the replicas with fewer workers.
> > > >>
> > > >> Also I tend to use Direct DocValues to get the best performance. I'm
> > not
> > > >> sure how much difference this makes, but it should eliminate the
> > > >> compression overhead fetching the data from the DocValues.
> > > >>
> > > >> Varun's suggestion of using the binary transport will provide a nice
> > > >> performance increase as well. But you'll need to upgrade. You may
> need
> > > to
> > > >> do that anyway as the fix on the early stream close will be on a
> later
> > > >> version that was refactored to support the binary transport.
> > > >>
> > > >> Joel Bernstein
> > > >> http://joelsolr.blogspot.com/
> > > >>
> > > >> On Tue, May 16, 2017 at 8:03 PM, Joel Bernstein  >
> > > >> wrote:
> > > >>
> > > >> > Yep, saw it. I'll comment on the ticket for what I believe needs
> to
> > be
> > > >> > done.
> > > >> >
> > > >> > Joel Bernstein
> > > >> > http://joelsolr.blogspot.com/
> > > >> >
> > > >> > On Tue, May 16, 2017 at 8:00 PM, Varun Thacker  >
> > > >> wrote:

Re: Using of Streaming to join between shards

2017-06-27 Thread Joel Bernstein
I don't think the distributed joins are going to work for you in the ACL
use case you describe. I think the overhead of streaming the documents will
be too costly in this scenario. The distributed joins were designed more
for OLAP data warehousing use cases rather than high QPS loads.
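
For reference, the kind of distributed join being discussed is something like
the following (collection, field and filter names are only placeholders);
both sides have to be sorted on the join key and are typically read from the
/export handler, which is where the per-request cost comes from:

innerJoin(
  search(documents, q="*:*", fl="docId,title", sort="docId asc", qt="/export"),
  search(acls, q="userId:u123", fl="docId,userId", sort="docId asc", qt="/export"),
  on="docId"
)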

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jun 27, 2017 at 7:51 AM, mganeshs  wrote:

> Hi Susheel,
>
> Thanks for your reply and as you suggested we will start with innerJoin.
>
> But what I want know is that, Is Streaming can be used instead of normal
> default Join ?
>
> For ex. currently we fire request for every user clicks on menu in the page
> to show list of his documents with default JOIN and it works well without
> any issues with 100 concurrent users as well or even more than that
> concurrency.
>
> Can we do same for streaming join as well ? I just want to know whether
> concurrent streaming request will create heavy load to solr server or it's
> same as default join. What would be penalty of using streaming concurrently
> instead of default join ?
>
> Kindly throw some light on this topic.
>
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Using-of-Streaming-to-join-between-shards-
> tp4342563p4343005.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: solr /export handler - behavior during close()

2017-06-27 Thread Susmit Shukla
Hi Joel,

I was on the solr 6.3 branch. I see the deprecated HttpClient methods are all
fixed in master.
I had forgotten to mention that I used a custom SolrClientCache to get higher
limits for the maxConnectionPerHost setting; that's why I saw a difference in
behavior. SolrClientCache also looks configurable with a new constructor on
the master branch.

I guess it is all good going forward on master.

Thanks,
Susmit

On Tue, Jun 27, 2017 at 10:14 AM, Joel Bernstein  wrote:

> Ok, I see where it's not set the stream context. This needs to be fixed.
>
> I'm curious about where you're seeing deprecated methods in the
> HttpClientUtil? I was reviewing the master version of HttpClientUtil and
> didn't see any deprecations in my IDE.
>
> I'm wondering if you're using an older version of HttpClientUtil then I
> used when I was testing SOLR-10698?
>
> You also mentioned that the SolrStream and the SolrClientCache were using
> the same approach to create the client. In that case changing the
> ParallelStream to set the streamContext shouldn't have any effect on the
> close() issue.
>
>
>
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, Jun 25, 2017 at 10:48 AM, Susmit Shukla 
> wrote:
>
> > Hi Joel,
> >
> > Looked at the fix for SOLR-10698, there could be 2 potential issues
> >
> > - Parallel Stream does not set stream context on newly created
> SolrStreams
> > in open() method.
> >
> > - This results in creation of new uncached HttpSolrClient in open()
> method
> > of SolrStream. This client is created using deprecated methods of http
> > client library (HttpClientUtil.createClient) and behaves differently on
> > close() than the one created using HttpClientBuilder API. SolrClientCache
> > too uses the same deprecated API
> >
> > This test case shows the problem
> >
> > ParallelStream ps = new parallelStream(tupleStream,...)
> >
> > while(true){
> >
> > read();
> >
> > break after 2 iterations
> >
> > }
> >
> > ps.close()
> >
> > //close() reads through the end of tupleStream.
> >
> > I tried with HttpClient created by *org**.**apache**.**http**.**impl**.*
> > *client**.HttpClientBuilder.create()* and close() is working for that.
> >
> >
> > Thanks,
> >
> > Susmit
> >
> > On Wed, May 17, 2017 at 7:33 AM, Susmit Shukla 
> > wrote:
> >
> > > Thanks Joel, will try that.
> > > Binary response would be more performant.
> > > I observed the server sends responses in 32 kb chunks and the client
> > reads
> > > it with 8 kb buffer on inputstream. I don't know if changing that can
> > > impact anything on performance. Even if buffer size is increased on
> > > httpclient, it can't override the hardcoded 8kb buffer on
> > > sun.nio.cs.StreamDecoder
> > >
> > > Thanks,
> > > Susmit
> > >
> > > On Wed, May 17, 2017 at 5:49 AM, Joel Bernstein 
> > > wrote:
> > >
> > >> Susmit,
> > >>
> > >> You could wrap a LimitStream around the outside of all the relational
> > >> algebra. For example:
> > >>
> > >> parallel(limit((intersect(intersect(search, search), union(search,
> > >> search)
> > >>
> > >> In this scenario the limit would happen on the workers.
> > >>
> > >> As far as the worker/replica ratio. This will depend on how heavy the
> > >> export is. If it's a light export, small number of fields, mostly
> > numeric,
> > >> simple sort params, then I've seen a ratio of 5 (workers) to 1
> (replica)
> > >> work well. This will basically saturate the CPU on the replica. But
> > >> heavier
> > >> exports will saturate the replicas with fewer workers.
> > >>
> > >> Also I tend to use Direct DocValues to get the best performance. I'm
> not
> > >> sure how much difference this makes, but it should eliminate the
> > >> compression overhead fetching the data from the DocValues.
> > >>
> > >> Varun's suggestion of using the binary transport will provide a nice
> > >> performance increase as well. But you'll need to upgrade. You may need
> > to
> > >> do that anyway as the fix on the early stream close will be on a later
> > >> version that was refactored to support the binary transport.
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Tue, May 16, 2017 at 8:03 PM, Joel Bernstein 
> > >> wrote:
> > >>
> > >> > Yep, saw it. I'll comment on the ticket for what I believe needs to
> be
> > >> > done.
> > >> >
> > >> > Joel Bernstein
> > >> > http://joelsolr.blogspot.com/
> > >> >
> > >> > On Tue, May 16, 2017 at 8:00 PM, Varun Thacker 
> > >> wrote:
> > >> >
> > >> >> Hi Joel,Susmit
> > >> >>
> > >> >> I created https://issues.apache.org/jira/browse/SOLR-10698 to
> track
> > >> the
> > >> >> issue
> > >> >>
> > >> >> @Susmit looking at the stack trace I see the expression is using
> > >> >> JSONTupleStream
> > >> >> . I wonder if you tried using JavabinTupleStreamParser could it
> help
> > >> >> improve performance ?
> > >> >>
> > >> >> On Tue, May 16, 2017 at 9:39 AM, Susmit Shukla <
> > >> shukla.sus...@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >> > Hi Joel,
> > >> >> >
> > >> >> > queries can be a

Re: solr /export handler - behavior during close()

2017-06-27 Thread Joel Bernstein
Ok, I see where it's not setting the stream context. This needs to be fixed.

I'm curious about where you're seeing deprecated methods in
HttpClientUtil? I was reviewing the master version of HttpClientUtil and
didn't see any deprecations in my IDE.

I'm wondering if you're using an older version of HttpClientUtil than I
used when I was testing SOLR-10698?

You also mentioned that the SolrStream and the SolrClientCache were using
the same approach to create the client. In that case changing the
ParallelStream to set the streamContext shouldn't have any effect on the
close() issue.
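
For context, setting the stream context before open() is what lets a stream
pick up the shared SolrClientCache; a minimal sketch (the tupleStream and
solrClientCache variables are assumed to exist already):

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.client.solrj.io.stream.TupleStream;

StreamContext context = new StreamContext();
context.setSolrClientCache(solrClientCache);  // reuse cached clients instead of creating new ones
tupleStream.setStreamContext(context);        // must be set before open()
tupleStream.open();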








Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jun 25, 2017 at 10:48 AM, Susmit Shukla 
wrote:

> Hi Joel,
>
> Looked at the fix for SOLR-10698, there could be 2 potential issues
>
> - Parallel Stream does not set stream context on newly created SolrStreams
> in open() method.
>
> - This results in creation of new uncached HttpSolrClient in open() method
> of SolrStream. This client is created using deprecated methods of http
> client library (HttpClientUtil.createClient) and behaves differently on
> close() than the one created using HttpClientBuilder API. SolrClientCache
> too uses the same deprecated API
>
> This test case shows the problem
>
> ParallelStream ps = new parallelStream(tupleStream,...)
>
> while(true){
>
> read();
>
> break after 2 iterations
>
> }
>
> ps.close()
>
> //close() reads through the end of tupleStream.
>
> I tried with HttpClient created by *org**.**apache**.**http**.**impl**.*
> *client**.HttpClientBuilder.create()* and close() is working for that.
>
>
> Thanks,
>
> Susmit
>
> On Wed, May 17, 2017 at 7:33 AM, Susmit Shukla 
> wrote:
>
> > Thanks Joel, will try that.
> > Binary response would be more performant.
> > I observed the server sends responses in 32 kb chunks and the client
> reads
> > it with 8 kb buffer on inputstream. I don't know if changing that can
> > impact anything on performance. Even if buffer size is increased on
> > httpclient, it can't override the hardcoded 8kb buffer on
> > sun.nio.cs.StreamDecoder
> >
> > Thanks,
> > Susmit
> >
> > On Wed, May 17, 2017 at 5:49 AM, Joel Bernstein 
> > wrote:
> >
> >> Susmit,
> >>
> >> You could wrap a LimitStream around the outside of all the relational
> >> algebra. For example:
> >>
> >> parallel(limit((intersect(intersect(search, search), union(search,
> >> search)
> >>
> >> In this scenario the limit would happen on the workers.
> >>
> >> As far as the worker/replica ratio. This will depend on how heavy the
> >> export is. If it's a light export, small number of fields, mostly
> numeric,
> >> simple sort params, then I've seen a ratio of 5 (workers) to 1 (replica)
> >> work well. This will basically saturate the CPU on the replica. But
> >> heavier
> >> exports will saturate the replicas with fewer workers.
> >>
> >> Also I tend to use Direct DocValues to get the best performance. I'm not
> >> sure how much difference this makes, but it should eliminate the
> >> compression overhead fetching the data from the DocValues.
> >>
> >> Varun's suggestion of using the binary transport will provide a nice
> >> performance increase as well. But you'll need to upgrade. You may need
> to
> >> do that anyway as the fix on the early stream close will be on a later
> >> version that was refactored to support the binary transport.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Tue, May 16, 2017 at 8:03 PM, Joel Bernstein 
> >> wrote:
> >>
> >> > Yep, saw it. I'll comment on the ticket for what I believe needs to be
> >> > done.
> >> >
> >> > Joel Bernstein
> >> > http://joelsolr.blogspot.com/
> >> >
> >> > On Tue, May 16, 2017 at 8:00 PM, Varun Thacker 
> >> wrote:
> >> >
> >> >> Hi Joel,Susmit
> >> >>
> >> >> I created https://issues.apache.org/jira/browse/SOLR-10698 to track
> >> the
> >> >> issue
> >> >>
> >> >> @Susmit looking at the stack trace I see the expression is using
> >> >> JSONTupleStream
> >> >> . I wonder if you tried using JavabinTupleStreamParser could it help
> >> >> improve performance ?
> >> >>
> >> >> On Tue, May 16, 2017 at 9:39 AM, Susmit Shukla <
> >> shukla.sus...@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Hi Joel,
> >> >> >
> >> >> > queries can be arbitrarily nested with AND/OR/NOT joins e.g.
> >> >> >
> >> >> > (intersect(intersect(search, search), union(search, search))). If I
> >> cut
> >> >> off
> >> >> > the innermost stream with a limit, the complete intersection would
> >> not
> >> >> > happen at upper levels. Also would the limit stream have same
> effect
> >> as
> >> >> > using /select handler with rows parameter?
> >> >> > I am trying to force input stream close through reflection, just to
> >> see
> >> >> if
> >> >> > it gives performance gains.
> >> >> >
> >> >> > 2) would experiment with null streams. Is workers = number of
> >> replicas
> >> >> in
> >> >> > data collection a good thumb rule? is parallelstream performance
> >> upper
> >> >> > bounded by number of repl

RE: Master/Slave out of sync

2017-06-27 Thread Pouliot, Scott
I figured the attachments would get stripped, but it was worth a shot!  It was
just a screenshot showing that the version numbers were off from each other.

Here are the Master/Slave commit settings:

 
<autoCommit>
  <maxTime>18</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>6</maxTime>
</autoSoftCommit>


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, June 27, 2017 11:17 AM
To: solr-user 
Subject: Re: Master/Slave out of sync

First, attachments are almost always stripped by the mail program, so we can't 
see anything.

Hmmm, does look odd. What happens if you issue a commit against the slave via a 
url? I.e.
http://server:port/solr/core/update?commit=true?

And what are the autocommit settings on the slave?

Best,
Erick

On Tue, Jun 27, 2017 at 7:22 AM, Pouliot, Scott < 
scott.poul...@peoplefluent.com> wrote:

> Hey guys…
>
>
>
> Does anyone else have a problem with the master/slave setup getting 
> out of sync and staying that way until I either optimize the core or 
> restart SOLR?  It seems to be happening more and more frequently these 
> days and I’m looking for a solution here.  Running SOLR 6.2 on these 
> instances using jetty.
>
>
>
> I do see some log entries like the following at the moment, but it has 
> happened WITHOUT these errors in the past as well.  This error just 
> looks like the core is being loaded, so it can’t replicate (as far as I can 
> tell):
>
>
>
> 2017-06-23 00:44:08.624 ERROR (indexFetcher-677-thread-1) [   x:Client1]
> o.a.s.h.IndexFetcher Master at: http://master:8080/solr/Client1 is not 
> available. Index fetch failed. Exception: Error from server at
> http://master:8080/solr/Client1: Expected mime type 
> application/octet-stream but got text/html. 
>
> 
>
> 
>
> Error 503 {metadata={error-class=org.apache.solr.common.
> SolrException,root-error-class=org.apache.solr.common.SolrException},m
> sg=SolrCore
> is loading,code=503}
>
> 
>
> HTTP ERROR 503
>
> Problem accessing /solr/Client1/replication. Reason:
>
> {metadata={error-class=org.apache.solr.common.
> SolrException,root-error-class=org.apache.solr.common.SolrException},m
> sg=SolrCore
> is loading,code=503}
>
> 
>
> 
>
>
>
> Our setup looks something like this:
>
>
>
> Master
>
> Client Core 1
>
> Client Core 2
>
> Client Core 3
>
>
>
> Slave
>
> Client Core 1
>
> Client Core 2
>
> Client Core 3
>
>
>
> Master Config
>
>  class="solr.ReplicationHandler"
> >
>
> 
>
>   
>
>   startup
>
>   commit
>
>
>
>   
>
>   00:00:10
>
> 
>
> 
>
> 
>
> 1
>
>   
>
>
>
>
>
> Slave Config
>
>  class="solr.ReplicationHandler"
> >
>
> 
>
>
>
>   
>
>   http://master:8080/solr/${solr.core.name}
> 
>
>
>
>   
>
>   00:00:45
>
> 
>
>   
>
>
>
> Master screenshot
>
>
>
>
>
> Slave Screenshot
>
>


Re: Tlogs not being deleted/truncated

2017-06-27 Thread Webster Homer
Commits were definitely not happening. We ran out of filesystem space. The
admins deleted old tlogs and restarted. The collection in question was
missing a lot of data. We reloaded it, and then we saw some commits. In
Solrcloud they look like this:
2017-06-23 17:28:06.441 INFO  (commitScheduler-56-thread-1)
[c:sial-content-citations s:shard1 r:core_node2
x:sial-content-citations_shard1_replica1]
o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,
waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
2017-06-23 17:28:07.823 INFO  (commitScheduler-56-thread-1)
[c:sial-content-citations s:shard1 r:core_node2
x:sial-content-citations_shard1_replica1]
o.a.s.s.SolrIndexSearcher Opening [Searcher@1c6a3bf1[sial-
content-citations_shard1_replica1] main]
2017-06-23 17:28:07.824 INFO  (commitScheduler-56-thread-1)
[c:sial-content-citations s:shard1 r:core_node2
x:sial-content-citations_shard1_replica1]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2017-06-23 17:28:49.665 INFO  (commitScheduler-66-thread-1)
[c:ehs-catalog-qmdoc s:shard2 r:core_node2 x:ehs-catalog-qmdoc_shard2_replica1]
o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,
openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,
prepareCommit=false}
2017-06-23 17:28:49.742 INFO  (commitScheduler-66-thread-1)
[c:ehs-catalog-qmdoc s:shard2 r:core_node2 x:ehs-catalog-qmdoc_shard2_replica1]
o.a.s.c.SolrDeletionPolicy SolrDeletionPolicy.onCommit: commits: num=2
commit{dir=NRTCachingDirectory(MMapDirectory@/var/solr/data/
ehs-catalog-qmdoc_shard2_replica1/data/index lockFactory=org.apache.lucene.
store.NativeFSLockFactory@597830aa; maxCacheMB=48.0
maxMergeSizeMB=4.0),segFN=segments_2gb,generation=3179}

I have been busy and couldn't get back to this issue until now. The problem
started happening again. I manually sent a commit and that seemed to help
for a time. Unfortunately I don't have access to our Production Solrs. We
use logstash for the logs, but not all logs were being captured; the commit
messages above were not.
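
(For reference, a manual commit is just an explicit update request, along the
lines of http://host:8983/solr/<collection>/update?commit=true, with the real
host and collection substituted.)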


On Tue, Jun 20, 2017 at 5:34 PM, Erick Erickson 
wrote:

> bq: Neither in our source collection nor in our target collections.
>
> Hmmm. You should see messages similar to the following which I just
> generated on Solr 6.2 (stand-alone I admit but that code should be the
> same):
>
> INFO  - 2017-06-20 21:11:55.424; [   x:techproducts]
> org.apache.solr.update.DirectUpdateHandler2; start
> commit{,optimize=false,openSearcher=false,waitSearcher=true,
> expungeDeletes=false,softCommit=false,prepareCommit=false}
>
> INFO  - 2017-06-20 21:11:55.425; [   x:techproducts]
> org.apache.solr.update.SolrIndexWriter; Calling setCommitData with
> IW:org.apache.solr.update.SolrIndexWriter@4862d97c
>
> INFO  - 2017-06-20 21:11:55.663; [   x:techproducts]
> org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit:
> commits: num=2
>
> commit{dir=NRTCachingDirectory(MMapDirectory@/Users/Erick/
> apache/solrVersions/playspace/solr/example/techproducts/
> solr/techproducts/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d8e7c06;
> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_c,generation=12}
>
> commit{dir=NRTCachingDirectory(MMapDirectory@/Users/Erick/
> apache/solrVersions/playspace/solr/example/techproducts/
> solr/techproducts/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d8e7c06;
> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_d,generation=13}
>
> INFO  - 2017-06-20 21:11:55.663; [   x:techproducts]
> org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 13
>
> INFO  - 2017-06-20 21:11:55.668; [   x:techproducts]
> org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>
> whenever commits kick in.
>
> So how sure are you of your autocommit settings? Is there any chance
> the sysvar "solr.autoCommit.maxTime" is somehow being set to -1?
> Unlikely frankly since you have multiple tlogs.
>
> Another possibility is that somehow you've changed the number of docs
> to keep in the transaction logs and the number of transaction logs
> through:
> numRecordsToKeep
> and
> maxNumLogsToKeep
>
> And you'll note that the tlogs stat at 00082, which means many
> have been deleted. Does the minimum number keep going up? If that's
> the case tlogs _are_ being rolled over, just not very often. Why is a
> mystery of course.
>
> So what happens if you issue manual commits on both source and target?
>
> It's unlikely that autocommit is totally broken or we'd have heard
> howls almost immediately. That said the behavior you report is
> disturbing
>
> Best,
> Erick
>
>
> On Tue, Jun 20, 2017 at 1:37 PM, Webster Homer 
> wrote:
> > Yes, soft commits are irrelevant for this. What is relevant about soft
> > commits is that we can search the data.
> >
> > We have autoCommit set to 10 minutes and never see tlogs truncated.
> > Apparently autoCommit doesn't fire, ever.
> > Neither in our source collection nor in our tar

Solr 6.6 SSL Question

2017-06-27 Thread Gruenberger, Hans
How does Solr find the correct certificate to use for handling inbound requests?

The documentation shows the solr.in.sh settings, but I only see the keystore
locations and the passwords; there is no reference to the alias to use if
the keystore contains more than one certificate ...

However, the "Generate a Self-Signed Certificate and a Key" example shows that
an alias solr-ssl is being defined - does this mean that a hard-coded
alias is being used?
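
For context, the solr.in.sh settings in question are roughly these (paths and
passwords are placeholders), and as noted they only point at keystore files,
not at an alias inside them:

SOLR_SSL_KEY_STORE=/opt/solr/etc/solr-ssl.keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=secret
SOLR_SSL_TRUST_STORE=/opt/solr/etc/solr-ssl.keystore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=secret
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false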

Hans Grünberger
AirPlus International
Sales and Application Processing Automation
Dornhofstr. 10
63263 Neu-Isenburg
T +49 (0) 61 02 2 04 - 84 71
F +49 (0) 61 02 2 04 - 95 90
hgruenber...@airplus.com
http://www.airplus.com

Lufthansa AirPlus Servicekarten GmbH * Dornhofstraße 10 * 63263 Neu-Isenburg * 
Deutschland/Germany * Geschäftsführung/Management Board: Patrick W. Diemer 
(Vorsitz/Chairman), Roland Kern * Vorsitzender des Aufsichtsrates/Chairman of 
the Supervisory Board: Axel Tillmann * Registergericht/Court of Registration: 
Amtsgericht Offenbach/Main, HRB 8119



Re: Master/Slave out of sync

2017-06-27 Thread Erick Erickson
First, attachments are almost always stripped by the mail program, so we
can't see anything.

Hmmm, does look odd. What happens if you issue a commit against the slave
via a URL? I.e.
http://server:port/solr/core/update?commit=true

And what are the autocommit settings on the slave?

Best,
Erick

On Tue, Jun 27, 2017 at 7:22 AM, Pouliot, Scott <
scott.poul...@peoplefluent.com> wrote:

> Hey guys…
>
>
>
> Does anyone else have a problem with the master/slave setup getting out of
> sync and staying that way until I either optimize the core or restart
> SOLR?  It seems to be happening more and more frequently these days and I’m
> looking for a solution here.  Running SOLR 6.2 on these instances using
> jetty.
>
>
>
> I do see some log entries like the following at the moment, but it has
> happened WITHOUT these errors in the past as well.  This error just looks
> like the core is being loaded, so it can’t replicate (as far as I can tell):
>
>
>
> 2017-06-23 00:44:08.624 ERROR (indexFetcher-677-thread-1) [   x:Client1]
> o.a.s.h.IndexFetcher Master at: http://master:8080/solr/Client1 is not
> available. Index fetch failed. Exception: Error from server at
> http://master:8080/solr/Client1: Expected mime type
> application/octet-stream but got text/html. 
>
> 
>
> 
>
> Error 503 {metadata={error-class=org.apache.solr.common.
> SolrException,root-error-class=org.apache.solr.common.SolrException},msg=SolrCore
> is loading,code=503}
>
> 
>
> HTTP ERROR 503
>
> Problem accessing /solr/Client1/replication. Reason:
>
> {metadata={error-class=org.apache.solr.common.
> SolrException,root-error-class=org.apache.solr.common.SolrException},msg=SolrCore
> is loading,code=503}
>
> 
>
> 
>
>
>
> Our setup looks something like this:
>
>
>
> Master
>
> Client Core 1
>
> Client Core 2
>
> Client Core 3
>
>
>
> Slave
>
> Client Core 1
>
> Client Core 2
>
> Client Core 3
>
>
>
> Master Config
>
>  class="solr.ReplicationHandler"
> >
>
> 
>
>   
>
>   startup
>
>   commit
>
>
>
>   
>
>   00:00:10
>
> 
>
> 
>
> 
>
> 1
>
>   
>
>
>
>
>
> Slave Config
>
>  class="solr.ReplicationHandler"
> >
>
> 
>
>
>
>   
>
>   http://master:8080/solr/${solr.core.name}
> 
>
>
>
>   
>
>   00:00:45
>
> 
>
>   
>
>
>
> Master screenshot
>
>
>
>
>
> Slave Screenshot
>
>


Master/Slave out of sync

2017-06-27 Thread Pouliot, Scott
Hey guys...

Does anyone else have a problem with the master/slave setup getting out of sync 
and staying that way until I either optimize the core or restart SOLR?  It 
seems to be happening more and more frequently these days and I'm looking for a 
solution here.  Running SOLR 6.2 on these instances using jetty.

I do see some log entries like the following at the moment, but it has happened 
WITHOUT these errors in the past as well.  This error just looks like the core 
is being loaded, so it can't replicate (as far as I can tell):

2017-06-23 00:44:08.624 ERROR (indexFetcher-677-thread-1) [   x:Client1] 
o.a.s.h.IndexFetcher Master at: http://master:8080/solr/Client1 is not 
available. Index fetch failed. Exception: Error from server at 
http://master:8080/solr/Client1: Expected mime type application/octet-stream 
but got text/html. 


Error 503 
{metadata={error-class=org.apache.solr.common.SolrException,root-error-class=org.apache.solr.common.SolrException},msg=SolrCore
 is loading,code=503}

HTTP ERROR 503
Problem accessing /solr/Client1/replication. Reason:

{metadata={error-class=org.apache.solr.common.SolrException,root-error-class=org.apache.solr.common.SolrException},msg=SolrCore
 is loading,code=503}



Our setup looks something like this:

Master
Client Core 1
Client Core 2
Client Core 3

Slave
Client Core 1
Client Core 2
Client Core 3

Master Config


  
  startup
  commit

  
  00:00:10



1
  


Slave Config



  
  http://master:8080/solr/${solr.core.name}

  
  00:00:45
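
The XML element names in both configs above were stripped by the list
archive. Pieced back together from the standard ReplicationHandler
configuration, they would look roughly like this; the element names are
assumptions based on the visible values, not the poster's verbatim config:

  <!-- master -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="commitReserveDuration">00:00:10</str>
    </lst>
    <!-- the bare "1" in the original is assumed to be maxNumberOfBackups -->
    <str name="maxNumberOfBackups">1</str>
  </requestHandler>

  <!-- slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master:8080/solr/${solr.core.name}</str>
      <str name="pollInterval">00:00:45</str>
    </lst>
  </requestHandler>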

  

Master screenshot


Slave Screenshot


Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-27 Thread Susheel Kumar
Hi Joel,

I have submitted a patch to handle this.  Please review.

https://issues.apache.org/jira/secure/attachment/12874681/SOLR-10944.patch

Thanks,
Susheel

On Fri, Jun 23, 2017 at 12:32 PM, Susheel Kumar 
wrote:

> Thanks for confirming.  Here is the JIRA
>
> https://issues.apache.org/jira/browse/SOLR-10944
>
> On Fri, Jun 23, 2017 at 11:20 AM, Joel Bernstein 
> wrote:
>
>> yeah, this looks like a bug in the get expression.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Fri, Jun 23, 2017 at 11:07 AM, Susheel Kumar 
>> wrote:
>>
>> > Hi Joel,
>> >
>> > As i am getting deeper, it doesn't look like a problem due to hashJoin
>> etc.
>> >
>> >
>> > Below is a simple let expr where if search would not find a match and
>> > return 0 result.  In that case, I would expect get(a) to show a EOF
>> tuple
>> > while it is throwing exception. It looks like something wrong/bug in the
>> > code.  Please suggest
>> >
>> > ===
>> > let(a=search(collection1,
>> > q=id:9,
>> > fl="id,business_email",
>> > sort="business_email asc"),
>> > get(a)
>> > )
>> >
>> >
>> > {
>> >   "result-set": {
>> > "docs": [
>> >   {
>> > "EXCEPTION": "Index: 0, Size: 0",
>> > "EOF": true,
>> > "RESPONSE_TIME": 8
>> >   }
>> > ]
>> >   }
>> > }
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Jun 23, 2017 at 7:44 AM, Joel Bernstein 
>> > wrote:
>> >
>> > > Ok, I hadn't anticipated some of the scenarios that you've been trying
>> > out.
>> > > Particularly reading streams into variables and performing joins
>> etc...
>> > >
>> > > The main idea with variables was to use them with the new statistical
>> > > evaluators. So you perform retrievals (search, random, nodes, knn
>> etc...)
>> > > set the results to variables and then perform statistical analysis.
>> > >
>> > > The problem with joining variables is that is doesn't scale very well
>> > > because all the records are read into memory. Also the parallel stream
>> > > won't work over variables.
>> > >
>> > > Joel Bernstein
>> > > http://joelsolr.blogspot.com/
>> > >
>> > > On Thu, Jun 22, 2017 at 3:50 PM, Susheel Kumar > >
>> > > wrote:
>> > >
>> > > > Hi Joel,
>> > > >
>> > > > I am able to reproduce this in a simple way.  Looks like Let Stream
>> is
>> > > > having some issues.  Below complement function works fine if I
>> execute
>> > > > outside let and returns an EOF:true tuple but if a tuple with
>> EOF:true
>> > > > assigned to let variable, it gets changed to EXCEPTION "Index 0,
>> Size
>> > 0"
>> > > > etc.
>> > > >
>> > > > So let stream not able to handle the stream/results which has only
>> EOF
>> > > > tuple and breaks the whole let expression block
>> > > >
>> > > >
>> > > > ===Complement inside let
>> > > > let(
>> > > > a=echo(Hello),
>> > > > b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id
>> > > asc,email
>> > > > asc"),
>> > > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
>> > > > on="id,email"),
>> > > > c=get(b),
>> > > > get(a)
>> > > > )
>> > > >
>> > > > Result
>> > > > ===
>> > > > {
>> > > >   "result-set": {
>> > > > "docs": [
>> > > >   {
>> > > > "EXCEPTION": "Index: 0, Size: 0",
>> > > > "EOF": true,
>> > > > "RESPONSE_TIME": 1
>> > > >   }
>> > > > ]
>> > > >   }
>> > > > }
>> > > >
>> > > > ===Complement outside let
>> > > >
>> > > > complement(sort(select(tuple(id=1,email="A"),id,email),by="id
>> > asc,email
>> > > > asc"),
>> > > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
>> > > > on="id,email")
>> > > >
>> > > > Result
>> > > > ===
>> > > > { "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 0 } ] }
>> }
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Jun 22, 2017 at 11:55 AM, Susheel Kumar <
>> susheel2...@gmail.com
>> > >
>> > > > wrote:
>> > > >
>> > > > > Sorry for typo
>> > > > >
>> > > > > Facing a weird behavior when using hashJoin / innerJoin etc. The
>> > below
>> > > > > expression display tuples from variable a shown below
>> > > > >
>> > > > >
>> > > > > let(a=fetch(SMS,having(rollup(over=email,
>> > > > >  count(email),
>> > > > > select(search(SMS,
>> > > > > q=*:*,
>> > > > > fl="id,dv_sv_business_email",
>> > > > > sort="dv_sv_business_email asc"),
>> > > > >id,
>> > > > >dv_sv_business_email as email)),
>> > > > > eq(count(email),1)),
>> > > > > fl="id,dv_sv_business_email as email",
>> > > > > on="email=dv_sv_business_email"),
>> > > > > b=fetch(SMS,having(rollup(over=email,
>> > > > >  count(email),
>> > > > > select(search(SMS,
>> > > > > q=*:*,
>> > > > > fl="id,dv_sv_personal_email",
>> > > > > sort="dv_sv_personal_email asc"),
>> 

SolrJ 6.6.0 Connection pool shutdown

2017-06-27 Thread Markus Jelsma
Hi,

We have a process that checks the presence of many documents in a collection,
just a simple client.getById(id). It sometimes begins throwing lots of these
exceptions in a row:

org.apache.solr.client.solrj.SolrServerException: 
java.lang.IllegalStateException: Connection pool shut down

Then, as suddenly as it appeared, it's gone again and no longer a problem. I
would expect SolrJ not to throw this but to wait until the connection pool,
or whatever mechanism is there, has recovered.
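
For what it's worth, the check being described is nothing fancier than a loop
of real-time gets; a minimal sketch, with the Solr URL, collection name and
ids made up:

  import java.util.Arrays;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrDocument;

  public class PresenceCheck {
      public static void main(String[] args) throws Exception {
          // hypothetical core URL; the real setup may use CloudSolrClient instead
          SolrClient client =
              new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();
          List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3");
          for (String id : ids) {
              // getById() is where the "Connection pool shut down" exception surfaces
              SolrDocument doc = client.getById(id);
              System.out.println(id + (doc != null ? " is present" : " is missing"));
          }
          client.close();
      }
  }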

Did I miss a magic parameter for SolrJ?

Thanks,
Markus


Re: Dynamic fields vs parent child

2017-06-27 Thread Susheel Kumar
Can you describe your use case in terms of what business functionality you
are looking to achieve?

Thanks,
Susheel

On Mon, Jun 26, 2017 at 4:26 PM, Saurabh Sethi 
wrote:

> Number of dynamic fields will be in thousands (millions of users +
> thousands of events shared between subsets of users).
>
> We also thought about indexing in one field with value being
> fieldname_fieldvalue. Since we support range queries for dates and numbers,
> it won't work out of box.
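
For what it's worth, the dynamic-field variant being weighed here is only a
handful of schema.xml patterns; a sketch, with the type names assumed from a
stock 6.x schema:

  <dynamicField name="str_*"     type="string" indexed="true" stored="true"/>
  <dynamicField name="date_*"    type="date"   indexed="true" stored="true"/>
  <dynamicField name="num_*"     type="long"   indexed="true" stored="true"/>
  <dynamicField name="eventid_*" type="date"   indexed="true" stored="true"/>

Range queries on the date_* and num_* patterns keep working, which is what
the single fieldname_fieldvalue encoding mentioned above gives up.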
>
> On Mon, Jun 26, 2017 at 1:05 PM, Erick Erickson 
> wrote:
>
> > How many distinct fields do you expect across _all_ documents? That
> > is, if doc1 has 10 dynamic fields and doc2 has 10 dynamic fields, will
> > there be exactly 10 fields total or more than 10 when you consider
> > both documents?
> >
> > 100s of fields total across all documents is a tractable problem.
> > thousands of dynamic fields total is going to be a problem.
> >
> > One technique that people do use is to index one field with a prefix
> > rather than N dynamic fields. So you have something like
> > dyn1_val1
> > dyn1_val2
> > dyn4_val67
> >
> > Only really works with string fields of course.
> >
> > Best,
> > Erick
> >
> > On Mon, Jun 26, 2017 at 10:11 AM, Saurabh Sethi
> >  wrote:
> > > We have two requirements:
> > >
> > > 1. Indexing and storing event id and its timestamp.
> > > 2. Indexing and storing custom field name and value. The fields can be
> of
> > > any type, but for now lets say they are of types string, date and
> number.
> > >
> > > The events and custom fields for any solr document can easily be in
> > > hundreds.
> > >
> > > We are looking at two different approaches to handle these scenarios:
> > >
> > > 1. *Dynamic fields* - Have the fields name start with a particular
> > pattern
> > > like for string, the pattern could be like str_* and for event could be
> > > eventid_*
> > > 2. *Parent/child fields* - This seems to be an overkill for our use
> case
> > > since it's more for hierarchical data. Also, the parent and all its
> > > children need to be reindexed on update which defeats the purpose - we
> > are
> > > now reindexing multiple docs instead of one with dynamic fields. But it
> > > allows us to store custom field name along with its value unlike
> dynamic
> > > fields where we will have to map user supplied custom field to some
> other
> > > name based on type.
> > >
> > > Has anyone handled similar scenarios with Solr? If so, which approach
> > would
> > > you recommend based on your experience?
> > >
> > > We are using solr 6.6
> > >
> > > Thanks,
> > > Saurabh
> >
>
>
>
> --
> Saurabh Sethi
> Principal Engineer I | Engineering
>


Re: SOLR Suggester returns either the full field value or single terms only

2017-06-27 Thread Angel Todorov
Hi Alessandro,

Thanks. I've experimented a bit more and here is what I have discovered:
if my query is enclosed in quotes, I get multi-term suggestions; if it is not
enclosed in quotes, I only get single terms.

Example: this will only return single terms:

http://localhost:8080/solr//suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=video

Sample result:

{"responseHeader":{"status":0,"QTime":32},"suggest":{"mySuggester":{"video":{"numFound":10,"suggestions":[{"term":"video","weight":4169589427484439,"payload":""},{"term":"videos","weight":274540867653296,"payload":""},{"term":"videopilot5011pilot2016","weight":137270433826648,"payload":""},{"term":"videoplaylisthttp","weight":137270433826648,"payload":""},{"term":"videotransition","weight":34317608456662,"payload":""}]
 

Example: this returns multi-term results:

http://localhost:8080/solr//suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q="video"

{"responseHeader":{"status":0,"QTime":12},"suggest":{"mySuggester":{"\"video\"":{"numFound":10,"suggestions":[{"term":"video
shows","weight":3491976244405923328,"payload":""},{"term":"video
from","weight":948906588153783552,"payload":""},{"term":"video
leaving","weight":189781317630756704,"payload":""},{"term":"video
entitled","weight":151825054104605376,"payload":""}}]  

What's more, if my query is something like "video g", I get results that
don't include "video", for example:

http://localhost:8080/solr//suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q="video g"

{"responseHeader":{"status":0,"QTime":48},"suggest":{"mySuggester":{"\"video
g\"":{"numFound":10,"suggestions":[{"term":"g
eo","weight":952090016707589760,"payload":""},{"term":"g
em","weight":297528130221121792,"payload":""},{"term":"g
spokesperson","weight":297528130221121792,"payload":""},{"term":"g
prepares","weight":238022504176897440,"payload":""}]

and so on. I am not sure why it works this way, but it doesn't seem
right to me. I am running on Solr 5.1, if that makes any difference. Could it
be the Solr version?

Thanks again,
Angel


On Tue, Jun 27, 2017 at 11:43 AM, alessandro.benedetti  wrote:

> Hi Angel,
> can you give me an example query, a couple of example documents, and
> the suggestions you get (which you don't expect)?
>
> The config seems fine (I remember there were some tricky problems with the
> default separator, but a space should be fine there).
>
> Cheers
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/SOLR-Suggester-returns-either-the-full-field-
> value-or-single-terms-only-tp4342763p4342987.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: async backup

2017-06-27 Thread Damien Kamerman
Yes, REQUESTSTATUS is returning state=completed prematurely.
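
For reference, the pair of calls involved looks roughly like this; the
collection name, location and async id mirror the logs quoted below, while
the host is made up:

  # kick off the backup; the call returns immediately
  http://host:8983/solr/admin/collections?action=BACKUP&name=collection1&collection=collection1&location=/online/backup&async=backup1103459705035055

  # poll until the async request reports completed
  http://host:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=backup1103459705035055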

On Tuesday, 27 June 2017, Amrit Sarkar  wrote:

> Damien,
>
> then I poll with REQUESTSTATUS
>
>
> REQUESTSTATUS is an API which gives you the status of any API call
> (including other heavy-duty APIs like SPLITSHARD or CREATECOLLECTION)
> associated with an async_id at that moment. Does that give
> you "state"="completed"?
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Tue, Jun 27, 2017 at 5:25 AM, Damien Kamerman  > wrote:
>
> > A regular backup creates the files in this order:
> > drwxr-xr-x   2 root root  63 Jun 27 09:46 snapshot.shard7
> > drwxr-xr-x   2 root root 159 Jun 27 09:46 snapshot.shard8
> > drwxr-xr-x   2 root root 135 Jun 27 09:46 snapshot.shard1
> > drwxr-xr-x   2 root root 178 Jun 27 09:46 snapshot.shard3
> > drwxr-xr-x   2 root root 210 Jun 27 09:46 snapshot.shard11
> > drwxr-xr-x   2 root root 218 Jun 27 09:46 snapshot.shard9
> > drwxr-xr-x   2 root root 180 Jun 27 09:46 snapshot.shard2
> > drwxr-xr-x   2 root root 164 Jun 27 09:47 snapshot.shard5
> > drwxr-xr-x   2 root root 252 Jun 27 09:47 snapshot.shard6
> > drwxr-xr-x   2 root root 103 Jun 27 09:47 snapshot.shard12
> > drwxr-xr-x   2 root root 135 Jun 27 09:47 snapshot.shard4
> > drwxr-xr-x   2 root root 119 Jun 27 09:47 snapshot.shard10
> > drwxr-xr-x   3 root root   4 Jun 27 09:47 zk_backup
> > -rw-r--r--   1 root root 185 Jun 27 09:47 backup.properties
> >
> > While an async backup creates files in this order:
> > drwxr-xr-x   2 root root  15 Jun 27 09:49 snapshot.shard3
> > drwxr-xr-x   2 root root  15 Jun 27 09:49 snapshot.shard9
> > drwxr-xr-x   2 root root  62 Jun 27 09:49 snapshot.shard6
> > drwxr-xr-x   2 root root  37 Jun 27 09:49 snapshot.shard2
> > drwxr-xr-x   2 root root  67 Jun 27 09:49 snapshot.shard7
> > drwxr-xr-x   2 root root  75 Jun 27 09:49 snapshot.shard5
> > drwxr-xr-x   2 root root  70 Jun 27 09:49 snapshot.shard8
> > drwxr-xr-x   2 root root  15 Jun 27 09:49 snapshot.shard4
> > drwxr-xr-x   2 root root  15 Jun 27 09:50 snapshot.shard11
> > drwxr-xr-x   2 root root 127 Jun 27 09:50 snapshot.shard1
> > drwxr-xr-x   2 root root 116 Jun 27 09:50 snapshot.shard12
> > drwxr-xr-x   3 root root   4 Jun 27 09:50 zk_backup
> > -rw-r--r--   1 root root 185 Jun 27 09:50 backup.properties
> > drwxr-xr-x   2 root root  25 Jun 27 09:51 snapshot.shard10
> >
> >
> > shard10 is much larger than the other shards.
> >
> > From the logs:
> > INFO  - 2017-06-27 09:50:33.832; [   ] org.apache.solr.cloud.BackupCmd;
> > Completed backing up ZK data for backupName=collection1
> > INFO  - 2017-06-27 09:50:33.800; [   ]
> > org.apache.solr.handler.admin.CoreAdminOperation; Checking request
> status
> > for : backup1103459705035055
> > INFO  - 2017-06-27 09:50:33.800; [   ]
> > org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> > path=/admin/cores
> > params={qt=/admin/cores&requestid=backup1103459705035055&action=
> > REQUESTSTATUS&wt=javabin&version=2}
> > status=0 QTime=0
> > INFO  - 2017-06-27 09:51:33.405; [   ] org.apache.solr.handler.
> > SnapShooter;
> > Done creating backup snapshot: shard10 at file:///online/backup/
> > collection1
> >
> > Has anyone seen this bug, or knows a workaround?
> >
> >
> > On 27 June 2017 at 09:47, Damien Kamerman  > wrote:
> >
> > > Yes, the async command returns, and then I poll with REQUESTSTATUS.
> > >
> > > On 27 June 2017 at 01:24, Varun Thacker  > wrote:
> > >
> > >> Hi Damien,
> > >>
> > >> A backup command with async is supposed to return early. It starts the
> > >> backup process and returns.
> > >>
> > >> Are you using the REQUESTSTATUS (
> > >> http://lucene.apache.org/solr/guide/6_6/collections-api.html
> > >> #collections-api
> > >> ) API to validate if the backup is complete?
> > >>
> > >> On Sun, Jun 25, 2017 at 10:28 PM, Damien Kamerman  >
> > >> wrote:
> > >>
> > >> > I've noticed an issue with the Solr 6.5.1 Collections API BACKUP
> async
> > >> > command returning early. The state is finished well before one shard
> > is
> > >> > finished.
> > >> >
> > >> > The collection I'm backing up has 12 shards across 6 nodes and I
> > suspect
> > >> > the issue is that it is not waiting for all backups on the node to
> > >> finish.
> > >> >
> > >> > Alternatively, I if I change the request to not be async it works OK
> > but
> > >> > sometimes I get the exception "backup the collection time out:180s".
> > >> >
> > >> > Has anyone seen this, or knows a workaround?
> > >> >
> > >> > Cheers,
> > >> > Damien.
> > >> >
> > >>
> > >

Re: async backup

2017-06-27 Thread Amrit Sarkar
Damien,

then I poll with REQUESTSTATUS


REQUESTSTATUS is an API which gives you the status of any API call
(including other heavy-duty APIs like SPLITSHARD or CREATECOLLECTION)
associated with an async_id at that moment. Does that give
you "state"="completed"?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Jun 27, 2017 at 5:25 AM, Damien Kamerman  wrote:

> A regular backup creates the files in this order:
> drwxr-xr-x   2 root root  63 Jun 27 09:46 snapshot.shard7
> drwxr-xr-x   2 root root 159 Jun 27 09:46 snapshot.shard8
> drwxr-xr-x   2 root root 135 Jun 27 09:46 snapshot.shard1
> drwxr-xr-x   2 root root 178 Jun 27 09:46 snapshot.shard3
> drwxr-xr-x   2 root root 210 Jun 27 09:46 snapshot.shard11
> drwxr-xr-x   2 root root 218 Jun 27 09:46 snapshot.shard9
> drwxr-xr-x   2 root root 180 Jun 27 09:46 snapshot.shard2
> drwxr-xr-x   2 root root 164 Jun 27 09:47 snapshot.shard5
> drwxr-xr-x   2 root root 252 Jun 27 09:47 snapshot.shard6
> drwxr-xr-x   2 root root 103 Jun 27 09:47 snapshot.shard12
> drwxr-xr-x   2 root root 135 Jun 27 09:47 snapshot.shard4
> drwxr-xr-x   2 root root 119 Jun 27 09:47 snapshot.shard10
> drwxr-xr-x   3 root root   4 Jun 27 09:47 zk_backup
> -rw-r--r--   1 root root 185 Jun 27 09:47 backup.properties
>
> While an async backup creates files in this order:
> drwxr-xr-x   2 root root  15 Jun 27 09:49 snapshot.shard3
> drwxr-xr-x   2 root root  15 Jun 27 09:49 snapshot.shard9
> drwxr-xr-x   2 root root  62 Jun 27 09:49 snapshot.shard6
> drwxr-xr-x   2 root root  37 Jun 27 09:49 snapshot.shard2
> drwxr-xr-x   2 root root  67 Jun 27 09:49 snapshot.shard7
> drwxr-xr-x   2 root root  75 Jun 27 09:49 snapshot.shard5
> drwxr-xr-x   2 root root  70 Jun 27 09:49 snapshot.shard8
> drwxr-xr-x   2 root root  15 Jun 27 09:49 snapshot.shard4
> drwxr-xr-x   2 root root  15 Jun 27 09:50 snapshot.shard11
> drwxr-xr-x   2 root root 127 Jun 27 09:50 snapshot.shard1
> drwxr-xr-x   2 root root 116 Jun 27 09:50 snapshot.shard12
> drwxr-xr-x   3 root root   4 Jun 27 09:50 zk_backup
> -rw-r--r--   1 root root 185 Jun 27 09:50 backup.properties
> drwxr-xr-x   2 root root  25 Jun 27 09:51 snapshot.shard10
>
>
> shard10 is much larger than the other shards.
>
> From the logs:
> INFO  - 2017-06-27 09:50:33.832; [   ] org.apache.solr.cloud.BackupCmd;
> Completed backing up ZK data for backupName=collection1
> INFO  - 2017-06-27 09:50:33.800; [   ]
> org.apache.solr.handler.admin.CoreAdminOperation; Checking request status
> for : backup1103459705035055
> INFO  - 2017-06-27 09:50:33.800; [   ]
> org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/cores
> params={qt=/admin/cores&requestid=backup1103459705035055&action=
> REQUESTSTATUS&wt=javabin&version=2}
> status=0 QTime=0
> INFO  - 2017-06-27 09:51:33.405; [   ] org.apache.solr.handler.
> SnapShooter;
> Done creating backup snapshot: shard10 at file:///online/backup/
> collection1
>
> Has anyone seen this bug, or knows a workaround?
>
>
> On 27 June 2017 at 09:47, Damien Kamerman  wrote:
>
> > Yes, the async command returns, and then I poll with REQUESTSTATUS.
> >
> > On 27 June 2017 at 01:24, Varun Thacker  wrote:
> >
> >> Hi Damien,
> >>
> >> A backup command with async is supposed to return early. It starts the
> >> backup process and returns.
> >>
> >> Are you using the REQUESTSTATUS (
> >> http://lucene.apache.org/solr/guide/6_6/collections-api.html
> >> #collections-api
> >> ) API to validate if the backup is complete?
> >>
> >> On Sun, Jun 25, 2017 at 10:28 PM, Damien Kamerman 
> >> wrote:
> >>
> >> > I've noticed an issue with the Solr 6.5.1 Collections API BACKUP async
> >> > command returning early. The state is finished well before one shard
> is
> >> > finished.
> >> >
> >> > The collection I'm backing up has 12 shards across 6 nodes and I
> suspect
> >> > the issue is that it is not waiting for all backups on the node to
> >> finish.
> >> >
> >> > Alternatively, I if I change the request to not be async it works OK
> but
> >> > sometimes I get the exception "backup the collection time out:180s".
> >> >
> >> > Has anyone seen this, or knows a workaround?
> >> >
> >> > Cheers,
> >> > Damien.
> >> >
> >>
> >
> >
>


Re: Boosting Documents using the field Value

2017-06-27 Thread govind nitk
Hi Erick,

Finally made it work.

bf=if(exists(query($qqone)),one_score,0)&qqone=one_query:\"google cloud\"

Thanks a lot for the guidance, and for the reminder that it's not URL escaping.

No analyzers used.
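
Spelled out against the example collection quoted further down in the thread,
the whole request looks roughly like this (parameters shown unencoded for
readability; the collection and qf fields are the ones from those earlier
messages):

  http://localhost:8983/solr/my_index/select?defType=dismax
      &q=google cloud
      &qf=category^0.9 name^0.7
      &bf=if(exists(query($qqone)),one_score,0)
      &qqone=one_query:"google cloud"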


Regards,
Govind



On Tue, Jun 27, 2017 at 11:01 AM, govind nitk  wrote:

> Hi Erick,
> I agree, I should have mentioned what I was doing first.
>
> field types:
> one_query is "string",
> one_score is float.
>
> So No explicit analyzers.
>
> I mentioned sow=false and escaped the space as you suggested, but the error
> persists: undefined field "cloud".
>
> Will get back.
>
> Regards,
> Govind
>
> On Tue, Jun 27, 2017 at 8:44 AM, Erick Erickson 
> wrote:
>
>> bq: So, ultimate goal is when the exact query matches in field
>> one_query, apply boost of one_score
>>
>> It would have been helpful to have made that statement in the first
>> place, would have saved some false paths.
>>
>> What is your analysis chain here? If it's anything like "text_general"
>> or the like then you're going to have some trouble. I'd think about an
>> analysis chain like KeywordTokenizerFactory and
>> LowercaseFilterFactory. That'll index the entire field as a single
>> token. The admin/analysis page is your friend.
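
For reference, that analysis chain as a schema fieldType; the type name is
made up, and the filter's actual class name is LowerCaseFilterFactory:

  <fieldType name="string_exact_ci" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>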
>>
>> To search against it, you need to _escape_ the space (not "url
>> escape"). As in google\ cloud so that makes it through the query
>> parser as a single token.
>>
>> As of Solr 6.5 you can also specify sow=false (Split On Whitespace),
>> which may be a better option, see:
>> https://issues.apache.org/jira/browse/SOLR-9185
>>
>> Best,
>> Erick
>>
>> On Mon, Jun 26, 2017 at 7:32 PM, govind nitk 
>> wrote:
>> > Hi Developers, Erick
>> >
>> > I am able to add boost through function as below:
>> > bf=if(termfreq(one_query,"google"),one_score,0)
>> >
>> > Problem is when I say "google cloud" as query, it gives error:
>> > undefined field: \"cloud\""
>> >
>> > I tried encoding the query(%20, + for space), but not able to get it
>> > working.
>> >
>> > So, ultimate goal is when the exact query matches in field one_query,
>> apply
>> > boost of one_score.
>> >
>> > Is there any way to do this? Or a PR is needed.
>> >
>> >
>> > Regards,
>> > Govind
>> >
>> >
>> > On Mon, Jun 26, 2017 at 11:14 AM, govind nitk 
>> wrote:
>> >
>> >>
>> >> Hi Erick,
>> >>
>> >> Exactly this is what I was looking for.
>> >> Thanks a lot.
>> >>
>> >>
>> >> Regards,
>> >> Govind
>> >>
>> >> On Mon, Jun 26, 2017 at 12:03 AM, Erick Erickson <
>> erickerick...@gmail.com>
>> >> wrote:
>> >>
>> >>> Take a look at function queries. You're probably looking for "field",
>> >>> "termfreq" and "if" functions or some other combination like that.
>> >>>
>> >>> On Sun, Jun 25, 2017 at 9:01 AM, govind nitk 
>> >>> wrote:
>> >>> > Hi Erik, Thanks for the reply.
>> >>> >
>> >>> > My intention of using the domain_ct in the qf was, giving the weight
>> >>> > present in the that document.
>> >>> >
>> >>> > e.g
>> >>> > qf=category^domain_ct
>> >>> >
>> >>> > if the current query matched in the category, the boost given will
>> be
>> >>> > domain_ct, which is present in the current matched document.
>> >>> >
>> >>> >
>> >>> > So if I have category_1ct, category_2ct, category_3ct, category_4ct
>> as 4
>> >>> > indexed categories(text_general fields) and the same document has
>> >>> > domain_1ct, domain_2ct, domain_3ct, domain_4ct as 4 different count
>> >>> > fields(int), is there any way to achieve:
>> >>> >
>> >>> > qf=category_1ct^domain_1ct&qf=category_2ct^domain_2ct&qf=cat
>> >>> egory_3ct^domain_3ct&qf=category_4ct^domain_4ct
>> >>> >   ?
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > Regards
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Sat, Jun 24, 2017 at 3:42 PM, Erik Hatcher <
>> erik.hatc...@gmail.com>
>> >>> > wrote:
>> >>> >
>> >>> >> With dismax use bf=domain_ct. you can also use boost=domain_ct with
>> >>> >> edismax.
>> >>> >>
>> >>> >> > On Jun 23, 2017, at 23:01, govind nitk 
>> >>> wrote:
>> >>> >> >
>> >>> >> > Hi Solr,
>> >>> >> >
>> >>> >> > My Index Data:
>> >>> >> >
>> >>> >> > id name category domain domain_ct
>> >>> >> > 1 Banana Fruits Home > Fruits > Banana 2
>> >>> >> > 2 Orange Fruits Home > Fruits > Orange 4
>> >>> >> > 3 Samsung Mobile Electronics > Mobile > Samsung 3
>> >>> >> >
>> >>> >> >
>> >>> >> > I am able to retrieve the documents with dismax parser with the
>> >>> weights
>> >>> >> > mentioned as below.
>> >>> >> >
>> >>> >> > http://localhost:8983/solr/my_index/select?defType=dismax&;
>> >>> >> indent=on&q=fruits&qf=category
>> >>> >> > ^0.9&qf=name^0.7&wt=json
>> >>> >> >
>> >>> >> >
>> >>> >> > Is it possible to retrieve the documents with weight taken from
>> the
>> >>> >> indexed
>> >>> >> > field like:
>> >>> >> >
>> >>> >> > http://localhost:8983/solr/my_index/select?defType=dismax&;
>> >>> >> indent=on&q=fruits&qf=category
>> >>> >> > ^domain_ct&qf=name^domain_ct&wt=json
>> >>> >> >
>> >>> >> > Is this possible to give weight from an indexed field ? Am I
>> doing
>> >>> >> > something wrong?
>> >>> >> > Is there any other way of

Re: Using of Streaming to join between shards

2017-06-27 Thread mganeshs
Hi Susheel,

Thanks for your reply; as you suggested, we will start with innerJoin.
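
For reference, a minimal innerJoin over two collections looks roughly like the
following; the collection and field names are made up, and both sides must be
sorted on the join key:

  innerJoin(
    search(users,     q="*:*", fl="userId,userName",    sort="userId asc", qt="/export"),
    search(documents, q="*:*", fl="userId,docId,title", sort="userId asc", qt="/export"),
    on="userId"
  )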

But what I want to know is: can streaming be used instead of the normal
default join?

For example, currently we fire a request with the default join every time a
user clicks a menu in the page to show the list of his documents, and it works
well without any issues at 100 concurrent users or even higher concurrency.

Can we do the same with a streaming join? I just want to know whether
concurrent streaming requests will create a heavy load on the Solr server or
whether it's about the same as the default join. What would be the penalty of
using streaming concurrently instead of the default join?

Kindly throw some light on this topic.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-of-Streaming-to-join-between-shards-tp4342563p4343005.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR Suggester returns either the full field value or single terms only

2017-06-27 Thread alessandro.benedetti
Hi Angel, 
can you give me an example query, a couple of example documents, and
the suggestions you get (which you don't expect)?

The config seems fine (I remember there were some tricky problems with the
default separator, but a space should be fine there).

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Suggester-returns-either-the-full-field-value-or-single-terms-only-tp4342763p4342987.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 5.5 - spatial intersects query returns results outside of search box

2017-06-27 Thread Leila Gonzales
Hi all,


I’m running on Solr 5.5 and have run into an issue where the Solr spatial
search is returning results outside of the search rectangle parameters, and
I’m not quite sure what is causing this to happen. Thank you in advance for
any troubleshooting tips you can pass along.



The spatial field, location_geo, is defined in the schema.xml as follows:
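
(The definitions themselves were stripped by the list archive. A typical Solr
5.x RPT setup consistent with the ENVELOPE values below would look roughly
like this; the fieldType name and tuning parameters are assumptions, not the
actual configuration:)

  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers"/>

  <field name="location_geo" type="location_rpt" indexed="true" stored="true" multiValued="true"/>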







The spatial query parameters are as follows:

params
  q           "*:*"
  indent      "true"
  fl          "id, location_geo"
  fq          "location_geo:\"Intersects(ENVELOPE(-60, -55, 40, 37))\""
  wt          "json"
  debugQuery  "true"



The results include these records:

  {
    "id": "5230",
    "location_geo": ["ENVELOPE(-75.0,-75.939723,39.3597224,38.289722)"]
  },
  {
    "id": "9414",
    "location_geo": ["ENVELOPE(173.0,-10.0,84.0,8.0)"]
  },
  {
    "id": "5498",
    "location_geo": ["ENVELOPE(-75.0,-75.799721,39.899722,38.399722)"]
  },
  {
    "id": "6023",
    "location_geo": ["ENVELOPE(-102.0,-35.0,37.0,35.5)"]
  }



The debug output is as follows:



  "debug": {

"rawquerystring": "*:*",

"querystring": "*:*",

"parsedquery": "MatchAllDocsQuery(*:*)",

"parsedquery_toString": "*:*",

"explain": {

  "5230": "\n1.0 = *:*, product of:\n  1.0 = boost\n  1.0 =
queryNorm\n",

  "5498": "\n1.0 = *:*, product of:\n  1.0 = boost\n  1.0 =
queryNorm\n",

  "9414": "\n1.0 = *:*, product of:\n  1.0 = boost\n  1.0 =
queryNorm\n",

  "6023": "\n1.0 = *:*, product of:\n  1.0 = boost\n  1.0 = queryNorm\n"

},

"QParser": "LuceneQParser",

"filter_queries": [

  "location_geo:\"Intersects(ENVELOPE(-60, -55, 40, 37))\""

],

"parsed_filter_queries": [


"IntersectsPrefixTreeQuery(IntersectsPrefixTreeQuery(fieldName=location_geo,queryShape=Rect(minX=-60.0,maxX=-55.0,minY=37.0,maxY=40.0),detailLevel=5,prefixGridScanLevel=7))"

],



Thanks for any help and/or tips you can pass along.



Kind regards,
Leila