Re: setup solrcloud from scratch via web-ui

2017-05-16 Thread Thomas Porschberg
Hi,

I did not manipulate the data dir. What I did was:

1. Downloaded solr-6.5.1.zip
2. ensured no solr process is running
3. unzipped solr-6.5.1.zip to ~/solr_new2/solr-6.5.1
4. started an external zookeeper
5. copied a conf directory from a working non-cloud Solr (6.5.1) to
   ~/solr_new2/solr-6.5.1 so that I have ~/solr_new2/solr-6.5.1/conf
   (see http://randspringer.de/solrcloud_test/my.zip for content)
6. posted the conf to zookeeper with:
   bin/solr zk upconfig -n heise -d ./conf -z localhost:2181
7. started solr in cloud mode with
   bin/solr -c -z localhost:2181
8. tried to create a collection with
   bin/solr create -c heise -shards 2
   --> failed with:
  
Connecting to ZooKeeper at localhost:2181 ...
INFO  - 2017-05-17 07:06:38.249; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
localhost:2181 ready
Re-using existing configuration directory heise

Creating new collection 'heise' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=heise&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=heise


ERROR: Failed to create collection 'heise' due to: 
{127.0.1.1:8983_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
 from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
'heise_shard2_replica1': Unable to create core [heise_shard2_replica1] Caused 
by: Lock held by this virtual machine: 
/home/pberg/solr_new2/solr-6.5.1/server/data/index/write.lock}

9. Tried with 1 shard, which worked -->
pberg@porschberg:~/solr_new2/solr-6.5.1$ bin/solr create -c heise -shards 1

Connecting to ZooKeeper at localhost:2181 ...
INFO  - 2017-05-17 07:21:01.632; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
localhost:2181 ready
Re-using existing configuration directory heise

Creating new collection 'heise' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=heise&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=heise

{
  "responseHeader":{
    "status":0,
    "QTime":2577},
  "success":{"127.0.1.1:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":1441},
      "core":"heise_shard1_replica1"}}}


What did I do wrong? I want to use multiple shards on ONE node.

Best regards 
Thomas



> Shawn Heisey  wrote on 16 May 2017 at 16:30:
> 
> 
> On 5/12/2017 8:49 AM, Thomas Porschberg wrote:
> > ERROR: Failed to create collection 'cat' due to: 
> > {127.0.1.1:8983_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
> >  from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
> > 'cat_shard1_replica1': Unable to create core [cat_shard1_replica1] Caused 
> > by: Lock held by this virtual machine: 
> > /home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index/write.lock}
> 
> The same Solr instance is already holding the lock on the index at
> /home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index.  This means
> that Solr already has a core using that index directory.
> 
> If the write.lock were present but wasn't being held by the same
> instance, then the message would have said it was held by another program.
> 
> This sounds like you are manually manipulating settings like dataDir. 
> When you start the server from an extracted download (not as a service)
> and haven't messed with any configurations, the index directory for a
> single-shard single-replica "cat" collection should be something like
> the following, and should not be overridden unless you understand
> *EXACTLY* how SolrCloud functions and have a REALLY good reason for
> changing it:
> 
> /home/pberg/solr_new2/solr-6.5.1/server/solr/cat_shard1_replica1/data/index
> 
> On the "Sorry, no dataimport-handler defined!" problem, this is
> happening because the solrconfig.xml file being used by the collection
> does not have any configuration for the dataimport handler.  It's not
> enough to add a DIH config file, solrconfig.xml must have a dataimport
> handler defined that references the DIH config file.
> 
> Thanks,
> Shawn
>
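For reference, a minimal sketch of the dataimport handler definition Shawn refers to, as it would appear in solrconfig.xml (the config file name is a placeholder; the DIH jar also has to be on the classpath, e.g. via a <lib> directive):

  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>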


Re: SolrJ - How to add a blocked document without child documents

2017-05-16 Thread Jeffery Yuan
Yes, the id is the unique key.

I think maybe this is because the first one (a parent doc (Parent1) without
any children) is not a block (I don't really know the exact term), so later
when we add the same parent (Parent2) with some children, the first one is
somehow left alone.

- If we update the parent document again with some new child documents, it
will update Parent2 correctly, but still leave/keep Parent1.

This issue is discussed in some JIRAs, such as
https://issues.apache.org/jira/browse/SOLR-6096.
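For anyone reading along, here is a minimal SolrJ sketch of indexing a parent together with its children as a single block (the URL, ids and field names are placeholders; a parent indexed earlier without children does not become part of this block, which is the duplicate-parent effect described above):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class BlockJoinIndexing {
    public static void main(String[] args) throws Exception {
      try (SolrClient client =
          new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {

        SolrInputDocument parent = new SolrInputDocument();
        parent.addField("id", "Parent1");
        parent.addField("type_s", "parent");

        SolrInputDocument child = new SolrInputDocument();
        child.addField("id", "Parent1-Child1");
        child.addField("type_s", "child");

        // addChildDocument() is what makes parent + children one indexed block.
        parent.addChildDocument(child);

        client.add(parent);
        client.commit();
      }
    }
  }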



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-How-to-add-a-blocked-document-without-child-documents-tp4335006p4335441.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ - How to add a blocked document without child documents

2017-05-16 Thread Zheng Lin Edwin Yeo
Is the id your unique key in the collections? By rights, if your id is the
unique key, it will be overwritten automatically if the id is the same
when you add the same parent documents with child documents.

Regards,
Edwin


On 16 May 2017 at 08:25, Jeffery Yuan  wrote:

> Hi, Damien Kamerman
>
>   Thanks for your reply. The problem is when we add a parent documents
> which
> doesn't contain child info yet.
>   Later we will add same parent documents with child documents.
>
>   But this would cause 2 parent documents with same id in the solr index.
>
>   I workaround this issue by always deleting first, but I am wondering
> whether there is better approach.
>
> Thanks
> Jeffery Yuan
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/SolrJ-How-to-add-a-blocked-document-without-child-documents-
> tp4335006p4335195.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Possible regression in Parallel SQL in 6.5.1?

2017-05-16 Thread Kevin Risden
Well didn't take as long as I thought:
https://issues.apache.org/jira/browse/CALCITE-1306

Once Calcite 1.13 is released we should upgrade and get support for this
again.

Kevin Risden

On Tue, May 16, 2017 at 7:23 PM, Kevin Risden 
wrote:

> Yea this came up on the calcite mailing list. Not sure if aliases in the
> having clause were going to be added. I'll have to see if I can find that
> discussion or JIRA.
>
> Kevin Risden
>
> On May 16, 2017 18:54, "Joel Bernstein"  wrote:
>
>> Yeah, Calcite doesn't support field aliases in the having clause. The
>> query
>> should work if you use count(*). We could consider this a regression, but
>> I
>> think this will be a won't fix.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, May 16, 2017 at 12:51 PM, Timothy Potter 
>> wrote:
>>
>> > This SQL used to work pre-calcite:
>> >
>> > SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM
>> > ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC
>> > LIMIT 10
>> >
>> > Now I get:
>> > Caused by: java.io.IOException: -->
>> > http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to
>> > execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings,
>> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
>> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC
>> > connection 'jdbc:calcitesolr:'.
>> > Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings,
>> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
>> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column
>> > 103 to line 1, column 113: Column 'num_ratings' not found in any table
>> > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
>> > SolrStream.java:235)
>> > at com.lucidworks.spark.query.TupleStreamIterator.fetchNextTupl
>> e(
>> > TupleStreamIterator.java:82)
>> > at com.lucidworks.spark.query.TupleStreamIterator.hasNext(
>> > TupleStreamIterator.java:47)
>> > ... 31 more
>> >
>>
>


Re: Possible regression in Parallel SQL in 6.5.1?

2017-05-16 Thread Kevin Risden
Yea this came up on the calcite mailing list. Not sure if aliases in the
having clause were going to be added. I'll have to see if I can find that
discussion or JIRA.

Kevin Risden

On May 16, 2017 18:54, "Joel Bernstein"  wrote:

> Yeah, Calcite doesn't support field aliases in the having clause. The query
> should work if you use count(*). We could consider this a regression, but I
> think this will be a won't fix.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, May 16, 2017 at 12:51 PM, Timothy Potter 
> wrote:
>
> > This SQL used to work pre-calcite:
> >
> > SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM
> > ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC
> > LIMIT 10
> >
> > Now I get:
> > Caused by: java.io.IOException: -->
> > http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to
> > execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings,
> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC
> > connection 'jdbc:calcitesolr:'.
> > Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings,
> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column
> > 103 to line 1, column 113: Column 'num_ratings' not found in any table
> > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> > SolrStream.java:235)
> > at com.lucidworks.spark.query.TupleStreamIterator.
> fetchNextTuple(
> > TupleStreamIterator.java:82)
> > at com.lucidworks.spark.query.TupleStreamIterator.hasNext(
> > TupleStreamIterator.java:47)
> > ... 31 more
> >
>


Re: solr /export handler - behavior during close()

2017-05-16 Thread Joel Bernstein
Yep, saw it. I'll comment on the ticket for what I believe needs to be done.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, May 16, 2017 at 8:00 PM, Varun Thacker  wrote:

> Hi Joel,Susmit
>
> I created https://issues.apache.org/jira/browse/SOLR-10698 to track the
> issue
>
> @Susmit looking at the stack trace I see the expression is using
> JSONTupleStream
> . I wonder if you tried using JavabinTupleStreamParser could it help
> improve performance ?
>
> On Tue, May 16, 2017 at 9:39 AM, Susmit Shukla 
> wrote:
>
> > Hi Joel,
> >
> > queries can be arbitrarily nested with AND/OR/NOT joins e.g.
> >
> > (intersect(intersect(search, search), union(search, search))). If I cut
> off
> > the innermost stream with a limit, the complete intersection would not
> > happen at upper levels. Also would the limit stream have same effect as
> > using /select handler with rows parameter?
> > I am trying to force input stream close through reflection, just to see
> if
> > it gives performance gains.
> >
> > 2) would experiment with null streams. Is workers = number of replicas in
> > data collection a good thumb rule? is parallelstream performance upper
> > bounded by number of replicas?
> >
> > Thanks,
> > Susmit
> >
> > On Tue, May 16, 2017 at 5:59 AM, Joel Bernstein 
> > wrote:
> >
> > > Your approach looks OK. The single sharded worker collection is only
> > needed
> > > if you were using CloudSolrStream to send the initial Streaming
> > Expression
> > > to the /stream handler. You are not doing this, so you're approach is
> > fine.
> > >
> > > Here are some thoughts on what you described:
> > >
> > > 1) If you are closing the parallel stream after the top 1000 results,
> > then
> > > try wrapping the intersect in a LimitStream. This stream doesn't exist
> > yet
> > > so it will be a custom stream. The LimitStream can return the EOF tuple
> > > after it reads N tuples. This will cause the worker nodes to close the
> > > underlying stream and cause the Broken Pipe exception to occur at the
> > > /export handler, which will stop the /export.
> > >
> > > Here is the basic approach:
> > >
> > > parallel(limit(intersect(search, search)))
> > >
> > >
> > > 2) It can be tricky to understand where the bottleneck lies when using
> > the
> > > ParallelStream for parallel relational algebra. You can use the
> > NullStream
> > > to get an understanding of why performance is not increasing when you
> > > increase the workers. Here is the basic approach:
> > >
> > > parallel(null(intersect(search, search)))
> > >
> > > The NullStream will eat all the tuples on the workers and return a
> single
> > > tuple with the tuple count and the time taken to run the expression. So
> > > you'll get one tuple from each worker. This will eliminate any
> bottleneck
> > > on tuples returning through the ParallelStream and you can focus on the
> > > performance of the intersect and the /export handler.
> > >
> > > Then experiment with:
> > >
> > > 1) Increasing the number of parallel workers.
> > > 2) Increasing the number of replicas in the data collections.
> > >
> > > And watch the timing information coming back from the NullStream
> tuples.
> > If
> > > increasing the workers is not improving performance then the bottleneck
> > may
> > > be in the /export handler. So try increasing replicas and see if that
> > > improves performance. Different partitions of the streams will be
> served
> > by
> > > different replicas.
> > >
> > > If performance doesn't improve with the NullStream after increasing
> both
> > > workers and replicas then we know the bottleneck is the network.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Mon, May 15, 2017 at 10:37 PM, Susmit Shukla <
> shukla.sus...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > Regarding the implementation, I am wrapping the topmost TupleStream
> in
> > a
> > > > ParallelStream and execute it on the worker cluster (one of the
> joined
> > > > cluster doubles up as worker cluster). ParallelStream does submit the
> > > query
> > > > to /stream handler.
> > > > for #2, for e.g. I am creating 2 CloudSolrStreams , wrapping them in
> > > > IntersectStream and wrapping that in ParallelStream and reading out
> the
> > > > tuples from parallel stream. close() is called on parallelStream. I
> do
> > > have
> > > > custom streams but that is similar to intersectStream.
> > > > I am on solr 6.3.1
> > > > The 2 solr clusters serving the join queries are having many shards.
> > > Worker
> > > > collection is also multi sharded and is one from the main clusters,
> so
> > do
> > > > you imply I should be using a single sharded "worker" collection?
> Would
> > > the
> > > > joins execute faster?
> > > > On a side note, increasing the workers beyond 1 was not improving the
> > > > execution times but was degrading if number was 3 and above. That is
> > > > counter intuitive since the joins are huge and putting more workers
> > > > should have improved the performance.

Re: solr /export handler - behavior during close()

2017-05-16 Thread Varun Thacker
Hi Joel,Susmit

I created https://issues.apache.org/jira/browse/SOLR-10698 to track the
issue

@Susmit looking at the stack trace I see the expression is using
JSONTupleStream
. I wonder if you tried using JavabinTupleStreamParser could it help
improve performance ?

On Tue, May 16, 2017 at 9:39 AM, Susmit Shukla 
wrote:

> Hi Joel,
>
> queries can be arbitrarily nested with AND/OR/NOT joins e.g.
>
> (intersect(intersect(search, search), union(search, search))). If I cut off
> the innermost stream with a limit, the complete intersection would not
> happen at upper levels. Also would the limit stream have same effect as
> using /select handler with rows parameter?
> I am trying to force input stream close through reflection, just to see if
> it gives performance gains.
>
> 2) would experiment with null streams. Is workers = number of replicas in
> data collection a good thumb rule? is parallelstream performance upper
> bounded by number of replicas?
>
> Thanks,
> Susmit
>
> On Tue, May 16, 2017 at 5:59 AM, Joel Bernstein 
> wrote:
>
> > Your approach looks OK. The single sharded worker collection is only
> needed
> > if you were using CloudSolrStream to send the initial Streaming
> Expression
> > to the /stream handler. You are not doing this, so you're approach is
> fine.
> >
> > Here are some thoughts on what you described:
> >
> > 1) If you are closing the parallel stream after the top 1000 results,
> then
> > try wrapping the intersect in a LimitStream. This stream doesn't exist
> yet
> > so it will be a custom stream. The LimitStream can return the EOF tuple
> > after it reads N tuples. This will cause the worker nodes to close the
> > underlying stream and cause the Broken Pipe exception to occur at the
> > /export handler, which will stop the /export.
> >
> > Here is the basic approach:
> >
> > parallel(limit(intersect(search, search)))
> >
> >
> > 2) It can be tricky to understand where the bottleneck lies when using
> the
> > ParallelStream for parallel relational algebra. You can use the
> NullStream
> > to get an understanding of why performance is not increasing when you
> > increase the workers. Here is the basic approach:
> >
> > parallel(null(intersect(search, search)))
> >
> > The NullStream will eat all the tuples on the workers and return a single
> > tuple with the tuple count and the time taken to run the expression. So
> > you'll get one tuple from each worker. This will eliminate any bottleneck
> > on tuples returning through the ParallelStream and you can focus on the
> > performance of the intersect and the /export handler.
> >
> > Then experiment with:
> >
> > 1) Increasing the number of parallel workers.
> > 2) Increasing the number of replicas in the data collections.
> >
> > And watch the timing information coming back from the NullStream tuples.
> If
> > increasing the workers is not improving performance then the bottleneck
> may
> > be in the /export handler. So try increasing replicas and see if that
> > improves performance. Different partitions of the streams will be served
> by
> > different replicas.
> >
> > If performance doesn't improve with the NullStream after increasing both
> > workers and replicas then we know the bottleneck is the network.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, May 15, 2017 at 10:37 PM, Susmit Shukla  >
> > wrote:
> >
> > > Hi Joel,
> > >
> > > Regarding the implementation, I am wrapping the topmost TupleStream in
> a
> > > ParallelStream and execute it on the worker cluster (one of the joined
> > > cluster doubles up as worker cluster). ParallelStream does submit the
> > query
> > > to /stream handler.
> > > for #2, for e.g. I am creating 2 CloudSolrStreams , wrapping them in
> > > IntersectStream and wrapping that in ParallelStream and reading out the
> > > tuples from parallel stream. close() is called on parallelStream. I do
> > have
> > > custom streams but that is similar to intersectStream.
> > > I am on solr 6.3.1
> > > The 2 solr clusters serving the join queries are having many shards.
> > Worker
> > > collection is also multi sharded and is one from the main clusters, so
> do
> > > you imply I should be using a single sharded "worker" collection? Would
> > the
> > > joins execute faster?
> > > On a side note, increasing the workers beyond 1 was not improving the
> > > execution times but was degrading if number was 3 and above. That is
> > > counter intuitive since the joins are huge and putting more workers
> > should
> > > have improved the performance.
> > >
> > > Thanks,
> > > Susmit
> > >
> > >
> > > On Mon, May 15, 2017 at 6:47 AM, Joel Bernstein 
> > > wrote:
> > >
> > > > Ok please do report any issues you run into. This is quite a good bug
> > > > report.
> > > >
> > > > I reviewed the code and I believe I see the problem. The problem
> seems
> > to
> > > > be that output code from the /stream handler is not properly accounting
> > > > for client disconnects and closing the underlying stream.

Re: Possible regression in Parallel SQL in 6.5.1?

2017-05-16 Thread Joel Bernstein
Yeah, Calcite doesn't support field aliases in the having clause. The query
should work if you use count(*). We could consider this a regression, but I
think this will be a won't fix.
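In other words, the query from the report rewritten so the HAVING clause repeats the aggregate instead of referencing the alias (untested, but this is what the suggestion above amounts to):

  SELECT movie_id, COUNT(*) AS num_ratings, avg(rating) AS aggAvg
  FROM ratings
  GROUP BY movie_id
  HAVING COUNT(*) > 100
  ORDER BY aggAvg ASC
  LIMIT 10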

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, May 16, 2017 at 12:51 PM, Timothy Potter 
wrote:

> This SQL used to work pre-calcite:
>
> SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM
> ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC
> LIMIT 10
>
> Now I get:
> Caused by: java.io.IOException: -->
> http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to
> execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings,
> avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
> num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC
> connection 'jdbc:calcitesolr:'.
> Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings,
> avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
> num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column
> 103 to line 1, column 113: Column 'num_ratings' not found in any table
> at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> SolrStream.java:235)
> at com.lucidworks.spark.query.TupleStreamIterator.fetchNextTuple(
> TupleStreamIterator.java:82)
> at com.lucidworks.spark.query.TupleStreamIterator.hasNext(
> TupleStreamIterator.java:47)
> ... 31 more
>


Re: Best practices for backup & restore

2017-05-16 Thread Dave
I think it depends what you are backing up and restoring from. Hardware
failure? Accidental delete? For my use case my master indexer stores the index
on a SAN with daily snapshots for reliability, then my live searching master is
on a SAN as well; my live slave searchers are all on SSD drives for speed. In
my situation that means the test index is backed up daily, a copy of the live
index is backed up daily, and the SSDs can die and it doesn't matter to me. I
don't think there is a best practice; just find how risk averse you are and how
much performance you require.

> On May 16, 2017, at 6:38 PM, Jay Potharaju  wrote:
> 
> Hi,
> I was wondering if there are any best practices for doing solr backup &
> restore. In the past when running backup, I stopped indexing during the
> backup process.
> 
> I am looking at this documentation and it says that indexing can continue
> when backup is in progress.
> https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups
> 
> Any recommendations ?
> 
> -- 
> Thanks
> Jay


Best practices for backup & restore

2017-05-16 Thread Jay Potharaju
Hi,
I was wondering if there are any best practices for doing solr backup &
restore. In the past when running backup, I stopped indexing during the
backup process.

I am looking at this documentation and it says that indexing can continue
when backup is in progress.
https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups

Any recommendations ?

-- 
Thanks
Jay
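For SolrCloud, the page above describes an online backup through the Collections API; a minimal sketch (collection name and location are placeholders, and the location has to be on a filesystem that every node can reach):

  http://localhost:8983/solr/admin/collections?action=BACKUP&name=myBackup&collection=mycollection&location=/mnt/shared/backups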


Re: Solr Admin Documents tab

2017-05-16 Thread Chris Ulicny
If you pass the array of documents without the opening and closing square
brackets it should work with the page defaults (at least in v6.3.0)

{ "id":"1",...},{"id":"2",...},...
instead of
[{ "id":"1",...},{"id":"2",...},...]

Best,
Chris

On Tue, May 16, 2017 at 2:42 PM Rick Leir  wrote:

> Hi all,
> In the Solr Admin Documents tab, with the document type set to JSON, I
> cannot get it to accept more than one document. The legend says
> "Document(s)". What syntax is expected? It rejects an array of documents.
> Thanks -- Rick
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: Shard marked as down while it's operational & SOLR-9120

2017-05-16 Thread Susheel Kumar
I do not see any similar issue fixed as part of later releases.

Has anyone experienced a similar issue, or any suggestions to look into?

Thanks,
Susheel

On Tue, May 16, 2017 at 2:43 PM, Walter Underwood 
wrote:

> Look at all the bugs fixed or reported after 6.0.0. This might have been
> reported and their might be a workaround.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On May 16, 2017, at 11:41 AM, Susheel Kumar 
> wrote:
> >
> > Hi Walter - will upgrade to 6.5.1 but this will take time to go from all
> > the environments to Prod.
> >
> > Looking for something short term while we upgrade.
> >
> > Though we can connect to Solr Admin panel on the shard which is down but
> if
> > I query shards.info,  then that shard is not being shown/queried.  So
> its
> > partially down
> >
> > "*shards.info *":{
> >
> >"*http://server1:8080/solr/COL_shard3_replica2/
> > *":{
> >
> >  "*numFound*":38013303,
> >
> >  "*maxScore*":1.0,
> >
> >  "*shardAddress*":"http://server1:8080/solr/COL_shard3_replica2/;,
> >
> >  "*time*":2},
> >
> >"*http://server2:8080/solr/COL_shard4_replica2/|http://
> server3:8080/solr/COL_shard4_replica1/
> >  server3:8080/solr/COL_shard4_replica1/>*
> > ":{
> >
> >  "*numFound*":43816942,
> >
> >  "*maxScore*":1.0,
> >
> >  "*shardAddress*":"http://server3:8080/solr/COL_shard4_replica1;,
> >
> >  "*time*":2},
> >
> > …….
> >
> > ………..
> >
> >
> >
> > On Tue, May 16, 2017 at 1:55 PM, Walter Underwood  >
> > wrote:
> >
> >> I would upgrade to 6.5.1 before doing anything else. 6.0.0 is more than
> a
> >> year old.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >>> On May 16, 2017, at 10:27 AM, Susheel Kumar 
> >> wrote:
> >>>
> >>> Also this is for Solr-6.0.0.
> >>>
> >>> On Tue, May 16, 2017 at 12:41 PM, Susheel Kumar  >
> >>> wrote:
> >>>
>  Hello,
> 
>  One of the shard (out of 6 six shards and 6 replica's) we have, is
> being
>  shown as down in Solr Admin Cloud Panel (orange color) while the Solr
>  process is running and even can connect to it Solr Admin Cloud panel.
> >> Each
>  shard size is around 18GB and have around 40+ million docs.
> 
>  Also I do notice lots of below warnings as described by SOLR-9120.
> 
>  Any idea what is going on and if it is due to SOLR-9120.
> 
>  Thanks,
>  Susheel
> 
>  LukeRequestHandler Error getting file length for [segments_1p6l]
> 
>  java.nio.file.NoSuchFileException: ..._shard3_replica1/data/
> >> index.20170128034221957/segments_1p6l
>  at sun.nio.fs.UnixException.translateToIOException(
> >> UnixException.java:86)
>  at sun.nio.fs.UnixException.rethrowAsIOException(
> >> UnixException.java:102)
>  at sun.nio.fs.UnixException.rethrowAsIOException(
> >> UnixException.java:107)
>  at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(
> >> UnixFileAttributeViews.java:55)
> 
> 
> >>
> >>
>
>


Re: Shard marked as down while it's operational & SOLR-9120

2017-05-16 Thread Walter Underwood
Look at all the bugs fixed or reported after 6.0.0. This might have been
reported, and there might be a workaround.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 16, 2017, at 11:41 AM, Susheel Kumar  wrote:
> 
> Hi Walter - will upgrade to 6.5.1 but this will take time to go from all
> the environments to Prod.
> 
> Looking for something short term while we upgrade.
> 
> Though we can connect to Solr Admin panel on the shard which is down but if
> I query shards.info,  then that shard is not being shown/queried.  So its
> partially down
> 
> "*shards.info *":{
> 
>"*http://server1:8080/solr/COL_shard3_replica2/
> *":{
> 
>  "*numFound*":38013303,
> 
>  "*maxScore*":1.0,
> 
>  "*shardAddress*":"http://server1:8080/solr/COL_shard3_replica2/;,
> 
>  "*time*":2},
> 
>
> "*http://server2:8080/solr/COL_shard4_replica2/|http://server3:8080/solr/COL_shard4_replica1/
> *
> ":{
> 
>  "*numFound*":43816942,
> 
>  "*maxScore*":1.0,
> 
>  "*shardAddress*":"http://server3:8080/solr/COL_shard4_replica1;,
> 
>  "*time*":2},
> 
> …….
> 
> ………..
> 
> 
> 
> On Tue, May 16, 2017 at 1:55 PM, Walter Underwood 
> wrote:
> 
>> I would upgrade to 6.5.1 before doing anything else. 6.0.0 is more than a
>> year old.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On May 16, 2017, at 10:27 AM, Susheel Kumar 
>> wrote:
>>> 
>>> Also this is for Solr-6.0.0.
>>> 
>>> On Tue, May 16, 2017 at 12:41 PM, Susheel Kumar 
>>> wrote:
>>> 
 Hello,
 
 One of the shard (out of 6 six shards and 6 replica's) we have, is being
 shown as down in Solr Admin Cloud Panel (orange color) while the Solr
 process is running and even can connect to it Solr Admin Cloud panel.
>> Each
 shard size is around 18GB and have around 40+ million docs.
 
 Also I do notice lots of below warnings as described by SOLR-9120.
 
 Any idea what is going on and if it is due to SOLR-9120.
 
 Thanks,
 Susheel
 
 LukeRequestHandler Error getting file length for [segments_1p6l]
 
 java.nio.file.NoSuchFileException: ..._shard3_replica1/data/
>> index.20170128034221957/segments_1p6l
 at sun.nio.fs.UnixException.translateToIOException(
>> UnixException.java:86)
 at sun.nio.fs.UnixException.rethrowAsIOException(
>> UnixException.java:102)
 at sun.nio.fs.UnixException.rethrowAsIOException(
>> UnixException.java:107)
 at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(
>> UnixFileAttributeViews.java:55)
 
 
>> 
>> 



Solr Admin Documents tab

2017-05-16 Thread Rick Leir
Hi all,
In the Solr Admin Documents tab, with the document type set to JSON, I cannot 
get it to accept more than one document. The legend says "Document(s)". What 
syntax is expected? It rejects an array of documents. Thanks -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Shard marked as down while it's operational & SOLR-9120

2017-05-16 Thread Susheel Kumar
Hi Walter - will upgrade to 6.5.1 but this will take time to go from all
the environments to Prod.

Looking for something short term while we upgrade.

Though we can connect to the Solr Admin panel on the shard which is down, if
I query with shards.info then that shard is not being shown/queried. So it's
partially down:

"*shards.info *":{

"*http://server1:8080/solr/COL_shard3_replica2/
*":{

  "*numFound*":38013303,

  "*maxScore*":1.0,

  "*shardAddress*":"http://server1:8080/solr/COL_shard3_replica2/;,

  "*time*":2},


"*http://server2:8080/solr/COL_shard4_replica2/|http://server3:8080/solr/COL_shard4_replica1/
*
":{

  "*numFound*":43816942,

  "*maxScore*":1.0,

  "*shardAddress*":"http://server3:8080/solr/COL_shard4_replica1;,

  "*time*":2},

…….

………..



On Tue, May 16, 2017 at 1:55 PM, Walter Underwood 
wrote:

> I would upgrade to 6.5.1 before doing anything else. 6.0.0 is more than a
> year old.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On May 16, 2017, at 10:27 AM, Susheel Kumar 
> wrote:
> >
> > Also this is for Solr-6.0.0.
> >
> > On Tue, May 16, 2017 at 12:41 PM, Susheel Kumar 
> > wrote:
> >
> >> Hello,
> >>
> >> One of the shard (out of 6 six shards and 6 replica's) we have, is being
> >> shown as down in Solr Admin Cloud Panel (orange color) while the Solr
> >> process is running and even can connect to it Solr Admin Cloud panel.
> Each
> >> shard size is around 18GB and have around 40+ million docs.
> >>
> >> Also I do notice lots of below warnings as described by SOLR-9120.
> >>
> >> Any idea what is going on and if it is due to SOLR-9120.
> >>
> >> Thanks,
> >> Susheel
> >>
> >> LukeRequestHandler Error getting file length for [segments_1p6l]
> >>
> >> java.nio.file.NoSuchFileException: ..._shard3_replica1/data/
> index.20170128034221957/segments_1p6l
> >>  at sun.nio.fs.UnixException.translateToIOException(
> UnixException.java:86)
> >>  at sun.nio.fs.UnixException.rethrowAsIOException(
> UnixException.java:102)
> >>  at sun.nio.fs.UnixException.rethrowAsIOException(
> UnixException.java:107)
> >>  at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(
> UnixFileAttributeViews.java:55)
> >>
> >>
>
>


Re: knowing which fields were successfully hit

2017-05-16 Thread Erik Hatcher
Is this the equivalent of facet.query’s?   or maybe rather, group.query?

Erik



> On May 16, 2017, at 1:16 PM, Dorian Hoxha  wrote:
> 
> Something like elasticsearch named-queries, right
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
> ?
> 
> 
> On Tue, May 16, 2017 at 7:10 PM, John Blythe  wrote:
> 
>> sorry for the confusion. as in i received results due to matches on field x
>> vs. field y.
>> 
>> i've gone w a highlighting solution for now. the fact that it requires
>> field storage isn't yet prohibitive for me, so can serve well for now. open
>> to any alternative approaches all the same
>> 
>> thanks-
>> 
>> --
>> *John Blythe*
>> Product Manager & Lead Developer
>> 
>> 251.605.3071 | j...@curvolabs.com
>> www.curvolabs.com
>> 
>> 58 Adams Ave
>> Evansville, IN 47713
>> 
>> On Tue, May 16, 2017 at 11:37 AM, David Hastings <
>> hastings.recurs...@gmail.com> wrote:
>> 
>>> what do you mean "hit?" As in the user clicked it?
>>> 
>>> On Tue, May 16, 2017 at 11:35 AM, John Blythe 
>> wrote:
>>> 
 hey all. i'm sending data out that could represent a purchased item or
>> a
 competitive alternative. when the results are returned i'm needing to
>>> know
 which of the two were hit so i can serve up the *other*.
 
 i can make a blunt instrument in the application layer to simply look
>>> for a
 match between the queried terms and the resulting fields, but the
>> problem
 of fuzzy matching and some of the special analysis being done to get
>> the
 hits will be for naught.
 
 cursory googling landed me at a similar discussion that suggested using
>>> hit
 highlighting or retrieving the debuggers explain data to sort through.
 
 is there another, more efficient means or are these the two tools in
>> the
 toolbox?
 
 thanks!
 
>>> 
>> 



Re: Shard marked as down while it's operational & SOLR-9120

2017-05-16 Thread Walter Underwood
I would upgrade to 6.5.1 before doing anything else. 6.0.0 is more than a year 
old.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 16, 2017, at 10:27 AM, Susheel Kumar  wrote:
> 
> Also this is for Solr-6.0.0.
> 
> On Tue, May 16, 2017 at 12:41 PM, Susheel Kumar 
> wrote:
> 
>> Hello,
>> 
>> One of the shard (out of 6 six shards and 6 replica's) we have, is being
>> shown as down in Solr Admin Cloud Panel (orange color) while the Solr
>> process is running and even can connect to it Solr Admin Cloud panel.  Each
>> shard size is around 18GB and have around 40+ million docs.
>> 
>> Also I do notice lots of below warnings as described by SOLR-9120.
>> 
>> Any idea what is going on and if it is due to SOLR-9120.
>> 
>> Thanks,
>> Susheel
>> 
>> LukeRequestHandler Error getting file length for [segments_1p6l]
>> 
>> java.nio.file.NoSuchFileException: 
>> ..._shard3_replica1/data/index.20170128034221957/segments_1p6l
>>  at 
>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>>  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>>  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>>  at 
>> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>> 
>> 



Re: knowing which fields were successfully hit

2017-05-16 Thread John Blythe
dorian - yup!
mikhail - interesting, will definitely check it out.

thanks-

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, May 16, 2017 at 1:16 PM, Dorian Hoxha 
wrote:

> Something like elasticsearch named-queries, right
> https://www.elastic.co/guide/en/elasticsearch/reference/
> current/search-request-named-queries-and-filters.html
> ?
>
>
> On Tue, May 16, 2017 at 7:10 PM, John Blythe  wrote:
>
> > sorry for the confusion. as in i received results due to matches on
> field x
> > vs. field y.
> >
> > i've gone w a highlighting solution for now. the fact that it requires
> > field storage isn't yet prohibitive for me, so can serve well for now.
> open
> > to any alternative approaches all the same
> >
> > thanks-
> >
> > --
> > *John Blythe*
> > Product Manager & Lead Developer
> >
> > 251.605.3071 | j...@curvolabs.com
> > www.curvolabs.com
> >
> > 58 Adams Ave
> > Evansville, IN 47713
> >
> > On Tue, May 16, 2017 at 11:37 AM, David Hastings <
> > hastings.recurs...@gmail.com> wrote:
> >
> > > what do you mean "hit?" As in the user clicked it?
> > >
> > > On Tue, May 16, 2017 at 11:35 AM, John Blythe 
> > wrote:
> > >
> > > > hey all. i'm sending data out that could represent a purchased item
> or
> > a
> > > > competitive alternative. when the results are returned i'm needing to
> > > know
> > > > which of the two were hit so i can serve up the *other*.
> > > >
> > > > i can make a blunt instrument in the application layer to simply look
> > > for a
> > > > match between the queried terms and the resulting fields, but the
> > problem
> > > > of fuzzy matching and some of the special analysis being done to get
> > the
> > > > hits will be for naught.
> > > >
> > > > cursory googling landed me at a similar discussion that suggested
> using
> > > hit
> > > > highlighting or retrieving the debuggers explain data to sort
> through.
> > > >
> > > > is there another, more efficient means or are these the two tools in
> > the
> > > > toolbox?
> > > >
> > > > thanks!
> > > >
> > >
> >
>


Re: knowing which fields were successfully hit

2017-05-16 Thread Mikhail Khludnev
John,
You can probably go with something like
https://issues.apache.org/jira/browse/LUCENE-7628. I even gave a talk about
this approach. But turns out it's really hard to support.

On Tue, May 16, 2017 at 8:10 PM, John Blythe  wrote:

> sorry for the confusion. as in i received results due to matches on field x
> vs. field y.
>
> i've gone w a highlighting solution for now. the fact that it requires
> field storage isn't yet prohibitive for me, so can serve well for now. open
> to any alternative approaches all the same
>
> thanks-
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Tue, May 16, 2017 at 11:37 AM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
> > what do you mean "hit?" As in the user clicked it?
> >
> > On Tue, May 16, 2017 at 11:35 AM, John Blythe 
> wrote:
> >
> > > hey all. i'm sending data out that could represent a purchased item or
> a
> > > competitive alternative. when the results are returned i'm needing to
> > know
> > > which of the two were hit so i can serve up the *other*.
> > >
> > > i can make a blunt instrument in the application layer to simply look
> > for a
> > > match between the queried terms and the resulting fields, but the
> problem
> > > of fuzzy matching and some of the special analysis being done to get
> the
> > > hits will be for naught.
> > >
> > > cursory googling landed me at a similar discussion that suggested using
> > hit
> > > highlighting or retrieving the debuggers explain data to sort through.
> > >
> > > is there another, more efficient means or are these the two tools in
> the
> > > toolbox?
> > >
> > > thanks!
> > >
> >
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Are there roadblocks to creating custom DocRouter implementations?

2017-05-16 Thread Dorian Hoxha
Also interested in custom/pluggable routing.

On Tue, May 16, 2017 at 4:47 PM, Erick Erickson 
wrote:

> Hmmm, would the functionality be served by just using implicit routing
> putting the logic in creating the doc and populating the route field?
> Not, perhaps, as elegant as having some kind of pluggable routing I
> grant.
>
> Best,
> Erick
>
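A rough sketch of the implicit-routing idea Erick describes above (collection, shard and field names are placeholders):

  # create a collection whose shards are addressed via a route field
  http://localhost:8983/solr/admin/collections?action=CREATE&name=geo&router.name=implicit&shards=europe,asia,americas&router.field=region_s&collection.configName=myconf

  # each document then names its target shard in that field
  {"id":"doc1", "region_s":"europe"}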
> On Tue, May 16, 2017 at 7:31 AM, Shawn Heisey  wrote:
> > There was a question in the #solr IRC channel about creating a custom
> > document router to assign documents to shards based on geolocation data.
> >
> > Looking into this, I think I see a roadblock or two standing in the way
> > of users creating custom router implementations.
> >
> > The "routerMap" field in the DocRouter class is private, and its
> > contents are not dynamically created.  It appears that only specific
> > names (null, plain, implicit, compositeId) are added to the map.
> >
> > I'm thinking that if we make routerMap protected (or create protected
> > access methods), and put "static { }" code blocks in each implementation
> > that add themselves to the parent routerMap, it will be much easier for
> > a user to create their own implementation and have it automatically
> > available to use in a CREATE action.
> >
> > Is this worth an issue in Jira?
> >
> > Thanks,
> > Shawn
> >
>


Re: Shard marked as down while it's operational & SOLR-9120

2017-05-16 Thread Susheel Kumar
Also this is for Solr-6.0.0.

On Tue, May 16, 2017 at 12:41 PM, Susheel Kumar 
wrote:

> Hello,
>
> One of the shard (out of 6 six shards and 6 replica's) we have, is being
> shown as down in Solr Admin Cloud Panel (orange color) while the Solr
> process is running and even can connect to it Solr Admin Cloud panel.  Each
> shard size is around 18GB and have around 40+ million docs.
>
> Also I do notice lots of below warnings as described by SOLR-9120.
>
> Any idea what is going on and if it is due to SOLR-9120.
>
> Thanks,
> Susheel
>
> LukeRequestHandler Error getting file length for [segments_1p6l]
>
> java.nio.file.NoSuchFileException: 
> ..._shard3_replica1/data/index.20170128034221957/segments_1p6l
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>
>


Re: knowing which fields were successfully hit

2017-05-16 Thread Dorian Hoxha
Something like elasticsearch named-queries, right
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
?


On Tue, May 16, 2017 at 7:10 PM, John Blythe  wrote:

> sorry for the confusion. as in i received results due to matches on field x
> vs. field y.
>
> i've gone w a highlighting solution for now. the fact that it requires
> field storage isn't yet prohibitive for me, so can serve well for now. open
> to any alternative approaches all the same
>
> thanks-
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Tue, May 16, 2017 at 11:37 AM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
> > what do you mean "hit?" As in the user clicked it?
> >
> > On Tue, May 16, 2017 at 11:35 AM, John Blythe 
> wrote:
> >
> > > hey all. i'm sending data out that could represent a purchased item or
> a
> > > competitive alternative. when the results are returned i'm needing to
> > know
> > > which of the two were hit so i can serve up the *other*.
> > >
> > > i can make a blunt instrument in the application layer to simply look
> > for a
> > > match between the queried terms and the resulting fields, but the
> problem
> > > of fuzzy matching and some of the special analysis being done to get
> the
> > > hits will be for naught.
> > >
> > > cursory googling landed me at a similar discussion that suggested using
> > hit
> > > highlighting or retrieving the debuggers explain data to sort through.
> > >
> > > is there another, more efficient means or are these the two tools in
> the
> > > toolbox?
> > >
> > > thanks!
> > >
> >
>


Re: knowing which fields were successfully hit

2017-05-16 Thread John Blythe
sorry for the confusion. as in i received results due to matches on field x
vs. field y.

i've gone w a highlighting solution for now. the fact that it requires
field storage isn't yet prohibitive for me, so can serve well for now. open
to any alternative approaches all the same

thanks-

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713
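A minimal sketch of that highlighting approach (collection and field names are placeholders); the highlighting section of the response then shows which field actually produced the match:

  http://localhost:8983/solr/mycollection/select?q=some+term&defType=edismax&qf=purchased_item_t+competitive_item_t&hl=true&hl.fl=purchased_item_t,competitive_item_t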

On Tue, May 16, 2017 at 11:37 AM, David Hastings <
hastings.recurs...@gmail.com> wrote:

> what do you mean "hit?" As in the user clicked it?
>
> On Tue, May 16, 2017 at 11:35 AM, John Blythe  wrote:
>
> > hey all. i'm sending data out that could represent a purchased item or a
> > competitive alternative. when the results are returned i'm needing to
> know
> > which of the two were hit so i can serve up the *other*.
> >
> > i can make a blunt instrument in the application layer to simply look
> for a
> > match between the queried terms and the resulting fields, but the problem
> > of fuzzy matching and some of the special analysis being done to get the
> > hits will be for naught.
> >
> > cursory googling landed me at a similar discussion that suggested using
> hit
> > highlighting or retrieving the debuggers explain data to sort through.
> >
> > is there another, more efficient means or are these the two tools in the
> > toolbox?
> >
> > thanks!
> >
>


Solr Carrot Clustering query with specific label in it

2017-05-16 Thread Pratik Patel
Hi,

When we do a Carrot clustering query on a set of Solr documents we get back
the following type of response:

<arr name="clusters">
  <lst>
    <arr name="labels">
      <str>DDR</str>
    </arr>
    <double name="score">3.9599865057283354</double>
    <arr name="docs">
      <str>TWINX2048-3200PRO</str>
      <str>VS1GB400C3</str>
      <str>VDBDB1A16</str>
    </arr>
  </lst>
  <lst>
    <arr name="labels">
      <str>iPod</str>
    </arr>
    <double name="score">11.959228467119022</double>
    <arr name="docs">
      <str>F8V7067-APL-KIT</str>
      <str>IW-02</str>
      <str>MA147LL/A</str>
    </arr>
  </lst>
</arr>

Each label (cluster) has a corresponding set of documents. The question is:
is it possible to make another Carrot clustering query with a specific label
in it, so as to only get back the documents corresponding to that label?

In my use case, I am trying to write a streaming expression where one of the
streams is the documents corresponding to a label (Carrot cluster) selected by
the user. Hence, I cannot use the data present in the original response object.

I have been exploring the Carrot2 documentation but I can't seem to find any
option which lets you specify a label in the query. I am using Solr 6.4.1
in cloud mode and the clustering algorithm is
"org.carrot2.clustering.lingo.LingoClusteringAlgorithm".

Thanks,

Pratik


Possible regression in Parallel SQL in 6.5.1?

2017-05-16 Thread Timothy Potter
This SQL used to work pre-calcite:

SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM
ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC
LIMIT 10

Now I get:
Caused by: java.io.IOException: -->
http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to
execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings,
avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC
connection 'jdbc:calcitesolr:'.
Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings,
avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column
103 to line 1, column 113: Column 'num_ratings' not found in any table
at 
org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:235)
at 
com.lucidworks.spark.query.TupleStreamIterator.fetchNextTuple(TupleStreamIterator.java:82)
at 
com.lucidworks.spark.query.TupleStreamIterator.hasNext(TupleStreamIterator.java:47)
... 31 more


Re: TrieIntField vs IntPointField performance only for equality comparison (no range filtering)

2017-05-16 Thread Dorian Hoxha
Hi Shawn,

I forgot that legacy-int-fields were deprecated. Point fields it is then.

Thanks,
Dorian

On Tue, May 16, 2017 at 3:01 PM, Shawn Heisey  wrote:

> On 5/16/2017 3:33 AM, Dorian Hoxha wrote:
> > Has anyone measured which is more efficient/performant between the 2
> > intfields if we don't need to do range-checking ? (precisionStep=0)
>
> Point field support in Solr is *BRAND NEW*.  Very little information is
> available yet on the Solr implementation.  Benchmarks were done at the
> Lucene level, but I do not know what the numbers were.  If any Solr
> benchmarks were done, which I can't be sure about, I do not know where
> the results might be.
>
> Lucene had Points support long before Solr did.  The Lucene developers
> felt so strongly about the superiority of the Point implementations that
> they completely deprecated the legacy numeric field classes (which is
> what Trie classes use) early in the 6.x development cycle, slating them
> for removal in 7.0.
>
> If you wonder about backward compatibility in Solr 7.0 because the
> Lucene legacy numerics are disappearing, then you've discovered a
> dilemma that we're facing before the 7.0 release.
>
> Thanks,
> Shawn
>
>


Shard marked as down while it's operational & SOLR-9120

2017-05-16 Thread Susheel Kumar
Hello,

One of the shards (out of 6 shards and 6 replicas) we have is being shown as
down in the Solr Admin Cloud panel (orange color), while the Solr process is
running and one can even connect to its Solr Admin Cloud panel. Each shard is
around 18GB in size and has around 40+ million docs.

I also notice lots of the below warnings, as described by SOLR-9120.

Any idea what is going on and whether it is due to SOLR-9120?

Thanks,
Susheel

LukeRequestHandler Error getting file length for [segments_1p6l]

java.nio.file.NoSuchFileException:
..._shard3_replica1/data/index.20170128034221957/segments_1p6l
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)


Re: solr /export handler - behavior during close()

2017-05-16 Thread Susmit Shukla
Hi Joel,

queries can be arbitrarily nested with AND/OR/NOT joins e.g.

(intersect(intersect(search, search), union(search, search))). If I cut off
the innermost stream with a limit, the complete intersection would not
happen at upper levels. Also would the limit stream have same effect as
using /select handler with rows parameter?
I am trying to force input stream close through reflection, just to see if
it gives performance gains.

2) Would experiment with null streams. Is workers = number of replicas in the
data collection a good rule of thumb? Is ParallelStream performance upper
bounded by the number of replicas?

Thanks,
Susmit

On Tue, May 16, 2017 at 5:59 AM, Joel Bernstein  wrote:

> Your approach looks OK. The single sharded worker collection is only needed
> if you were using CloudSolrStream to send the initial Streaming Expression
> to the /stream handler. You are not doing this, so your approach is fine.
>
> Here are some thoughts on what you described:
>
> 1) If you are closing the parallel stream after the top 1000 results, then
> try wrapping the intersect in a LimitStream. This stream doesn't exist yet
> so it will be a custom stream. The LimitStream can return the EOF tuple
> after it reads N tuples. This will cause the worker nodes to close the
> underlying stream and cause the Broken Pipe exception to occur at the
> /export handler, which will stop the /export.
>
> Here is the basic approach:
>
> parallel(limit(intersect(search, search)))
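A rough, untested sketch of the custom LimitStream described above; the method signatures follow the 6.x TupleStream API as best as can be recalled, and a real implementation would also need Expressible support to be usable inside a streaming-expression string:

  import java.io.IOException;
  import java.util.Collections;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  import org.apache.solr.client.solrj.io.Tuple;
  import org.apache.solr.client.solrj.io.comp.StreamComparator;
  import org.apache.solr.client.solrj.io.stream.StreamContext;
  import org.apache.solr.client.solrj.io.stream.TupleStream;
  import org.apache.solr.client.solrj.io.stream.expr.Explanation;
  import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;

  // Emits at most 'limit' tuples from the wrapped stream, then an EOF tuple,
  // so the wrapping stream closes early and the /export sources get cut off.
  public class LimitStream extends TupleStream {

    private final TupleStream innerStream;
    private final int limit;
    private int count;

    public LimitStream(TupleStream innerStream, int limit) {
      this.innerStream = innerStream;
      this.limit = limit;
    }

    public void setStreamContext(StreamContext context) { innerStream.setStreamContext(context); }
    public List<TupleStream> children() { return Collections.singletonList(innerStream); }
    public void open() throws IOException { count = 0; innerStream.open(); }
    public void close() throws IOException { innerStream.close(); }
    public StreamComparator getStreamSort() { return innerStream.getStreamSort(); }
    public Explanation toExplanation(StreamFactory factory) throws IOException {
      return innerStream.toExplanation(factory);
    }

    public Tuple read() throws IOException {
      if (count >= limit) {
        // Synthesize the EOF tuple once the limit has been reached.
        Map<String, Object> eof = new HashMap<>();
        eof.put("EOF", true);
        return new Tuple(eof);
      }
      count++;
      return innerStream.read();
    }
  }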
>
>
> 2) It can be tricky to understand where the bottleneck lies when using the
> ParallelStream for parallel relational algebra. You can use the NullStream
> to get an understanding of why performance is not increasing when you
> increase the workers. Here is the basic approach:
>
> parallel(null(intersect(search, search)))
>
> The NullStream will eat all the tuples on the workers and return a single
> tuple with the tuple count and the time taken to run the expression. So
> you'll get one tuple from each worker. This will eliminate any bottleneck
> on tuples returning through the ParallelStream and you can focus on the
> performance of the intersect and the /export handler.
>
> Then experiment with:
>
> 1) Increasing the number of parallel workers.
> 2) Increasing the number of replicas in the data collections.
>
> And watch the timing information coming back from the NullStream tuples. If
> increasing the workers is not improving performance then the bottleneck may
> be in the /export handler. So try increasing replicas and see if that
> improves performance. Different partitions of the streams will be served by
> different replicas.
>
> If performance doesn't improve with the NullStream after increasing both
> workers and replicas then we know the bottleneck is the network.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 15, 2017 at 10:37 PM, Susmit Shukla 
> wrote:
>
> > Hi Joel,
> >
> > Regarding the implementation, I am wrapping the topmost TupleStream in a
> > ParallelStream and execute it on the worker cluster (one of the joined
> > cluster doubles up as worker cluster). ParallelStream does submit the
> query
> > to /stream handler.
> > for #2, for e.g. I am creating 2 CloudSolrStreams , wrapping them in
> > IntersectStream and wrapping that in ParallelStream and reading out the
> > tuples from parallel stream. close() is called on parallelStream. I do
> have
> > custom streams but that is similar to intersectStream.
> > I am on solr 6.3.1
> > The 2 solr clusters serving the join queries are having many shards.
> Worker
> > collection is also multi sharded and is one from the main clusters, so do
> > you imply I should be using a single sharded "worker" collection? Would
> the
> > joins execute faster?
> > On a side note, increasing the workers beyond 1 was not improving the
> > execution times but was degrading if number was 3 and above. That is
> > counter intuitive since the joins are huge and putting more workers
> should
> > have improved the performance.
> >
> > Thanks,
> > Susmit
> >
> >
> > On Mon, May 15, 2017 at 6:47 AM, Joel Bernstein 
> > wrote:
> >
> > > Ok please do report any issues you run into. This is quite a good bug
> > > report.
> > >
> > > I reviewed the code and I believe I see the problem. The problem seems
> to
> > > be that output code from the /stream handler is not properly accounting
> > for
> > > client disconnects and closing the underlying stream. What I see in the
> > > code is that exceptions coming from read() in the stream do
> automatically
> > > close the underlying stream. But exceptions from the writing of the
> > stream
> > > do not close the stream. This needs to be fixed.
> > >
> > > A few questions about your streaming implementation:
> > >
> > > 1) Are you sending requests to the /stream handler? Or are you
> embedding
> > > CloudSolrStream in your application and bypassing the /stream handler?
> > >
> > > 2) If you're 

RE: Solr Index issue on string type while querying

2017-05-16 Thread Matt Kuiper
Your problem statement is not quite clear; however, I will make a guess.

Assuming your problem is that when you remove the '>' sign from your query term
you receive zero results, then this is actually expected behavior for field
types that are of type string. When searching against string fields you need to
match the whole field value exactly, so the '>' is needed to get a match. I
recommend redefining or adding corresponding fields as type text_general. This
type is tokenized and will allow for the match you are looking for.

Matt
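A minimal schema.xml sketch of that suggestion, using one of the fields from the record below (names are illustrative; the copyField feeds a tokenized twin of the string field):

  <field name="heightSquareTube_string_mv" type="string"       indexed="true" stored="true" multiValued="true"/>
  <field name="heightSquareTube_text_mv"   type="text_general" indexed="true" stored="true" multiValued="true"/>
  <copyField source="heightSquareTube_string_mv" dest="heightSquareTube_text_mv"/>

A query such as heightSquareTube_text_mv:"90 - 100 mm" should then match even without the leading '>'.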

-Original Message-
From: Padmanabhan V [mailto:padmanabhan.venkitachalapa...@gmail.com] 
Sent: Tuesday, May 16, 2017 9:33 AM
To: solr-user@lucene.apache.org
Subject: Solr Index issue on string type while querying

Hello Solr Geeks,

I am looking for some helping hands to proceed on an issue I am facing now.
Given below is one record from the prepared index. I could query the fields
without the greater-than symbol, but when I did query for widthSquareTube_string_mv
& heightSquareTube_string_mv it is not returning any result, though there are
records which have some values tagged similar to the one below. These two fields
are dynamicFields and are of field type *string*.


Given below are the queries executed through the Solr console in the Query area:

1. heightSquareTube_string_mv:> 90 - 100 mm

&

2. heightSquareTube_string_mv:"> 90 - 100 mm"


{
"indexOperationId_long": 379908,
"id": "Online/10004003x1500",
"pk": 2558081,
"wallThickessTubeSquare_string_mv": [
"3 - 5.99 mm"
],
"widthSquareTube_string_mv": [
"> 30 - 40 mm"
],
"heightSquareTube_string_mv": [
"> 90 - 100 mm"
],
"length_string_mv": [
"1000 - 1999 mm"
],
"allCategories_string_mv": [
"AL_ST",
"100",
"F000",
"F060",
"AL",
"F061"
],
"category_string_mv": [
"AL_ST",
"100",
"F000",
"F060",
"AL",
"F061"
],
"inStockFlag_boolean": true,
"baseProduct_string": "ST606010004003",
"name_text_de_de": "100 x 40 x 3 x 1500 mm",
"name_sortable_de_de_sortabletext": "100 x 40 x 3 x 1500 mm",
"autosuggest": [
"ST606010004003x1500"
],
"_version_": 1567229255468712000
}


Best Regards,
Padmanabhan.V


Re: resource governance

2017-05-16 Thread Joel Bernstein
Streaming Expressions has this capability through its parallel executor
and priority streams. But this would mean switching to a queue-based
mechanism for both indexing and querying. Here is a blog describing how this
works:

http://joelsolr.blogspot.com/2017/01/deploying-solrs-new-parallel-executor.html

This is designed to support a "function as service model" of execution.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, May 16, 2017 at 11:09 AM, Avi Steiner  wrote:

>
> Hello
>
> I have a question about resource governance / quality of service.
> Is there any way to configure priorities of queries? Can I set some select
> queries, for example, higher priority than index (add/update) ?
>
> Thanks
>
> Avi
>
>
>
>


Re: Configure query parser to handle field name case-insensitive

2017-05-16 Thread Erick Erickson
Rick:

Easiest to _code_. There isn't any. And if you just toss the problem
over the fence to support then it's not a problem ;)

Best,
Erick

On Tue, May 16, 2017 at 9:04 AM, Rick Leir  wrote:
> Björn
> You are not serious about (1) are you? Yikes!! Easiest for you if you do not 
> need to sit at the helpdesk. Easiest if the users stop using the system.
>
> My guess is that (2) is easiest if you have text entry boxes for each field, 
> and the user need not type in the field name. Cheers -- Rick
>
> On May 16, 2017 10:56:37 AM EDT, Erick Erickson  
> wrote:
>>Yeah, your options (5) and (6) are well... definitely at the
>>bottom of _my_ list, I understand you included them for
>>completeness...
>>
>>as for (4) Oh, my aching head. Parsers give me a headache ;)
>>
>>Yes, (1) is the easiest.(2) and (3) mostly depend on where you're most
>>comfortable coding. If you intercept the query on the backend in Java
>>_very_ early in the process you are working with essentially the same
>>string as you would in JS on the front end so it's a tossup. You might
>>just be more comfortable writing JS on the client rather than Java and
>>getting it hooked in to Solr, really your choice.
>>
>>Best,
>>Erick
>>
>>2017-05-16 0:59 GMT-07:00 Peemöller, Björn
>>:
>>> Hi all,
>>>
>>> thank you for your replies!
>>>
>>> We do not directly expose the Solr API, but provide an endpoint in
>>our backend which acts as a proxy for a specific search handler. One
>>requirement in our application is to search for people using various
>>properties, e.g., first name, last name, description, date of birth.
>>For simplicity reasons, we want to provide only a single search input
>>and allow the user to narrow down its results using the query syntax,
>>e.g. "firstname:John".
>>>
>>> Based on your suggestions, I can see the following solutions for our
>>problem:
>>>
>>> 1) Train the users to denote fieldnames in lowercase - they need to
>>know the exact field names anyway.
>>> 2) Modify (i.e., lowercase) the search term in the backend (Java)
>>> 3) Modify (i.e., lowercase) the search term in the frontend (JS)
>>> 4) Modify the Solr query parser (provide a customized implementation)
>>> 5) Define *a lot* of field aliases
>>> 6) Define *a lot* of copy fields
>>>
>>> I assess these solutions to be ordered in decreasing quality, so I
>>think that we will start to improve with more user guidance.
>>>
>>> Thanks to all,
>>> Björn
>>>
>>> -Ursprüngliche Nachricht-
>>> Von: Rick Leir [mailto:rl...@leirtech.com]
>>> Gesendet: Montag, 15. Mai 2017 18:33
>>> An: solr-user@lucene.apache.org
>>> Betreff: Re: Configure query parser to handle field name
>>case-insensitive
>>>
>>> Björn
>>> Yes, at query time you could downcase the names. Not in Solr, but in
>>the front-end web app you have in front of Solr. It needs to be a bit
>>smart, so it can downcase the field names but not the query terms.
>>>
>>> I assume you do not expose Solr directly to the web.
>>>
>>> This downcasing might be easier to do in Javascript in the browser.
>>Particularly if the user never has to enter a field name.
>>>
>>> Another solution, this time inside Solr, is to provide copyfields for
>>ID, Id, and maybe iD. And for other fields that you mention in queries.
>>This will consume some memory, particularly for saved fields, so I
>>hesitate to even suggest it. Cheers - Rick
>>>
>>>
>>> On May 15, 2017 9:16:59 AM EDT, "Peemöller, Björn"
>> wrote:
Hi Rick,

thank you for your reply! I really meant field *names*, since our
values are already processed by a lower case filter (both index and
query). However, our users are confused because they can search for
"id:1" but not for "ID:1". Furthermore, we employ the EDisMax query
parser, so then even get no error message.

Therefore, I thought it may be sufficient to map all field names to
lower case at the query level so that I do not have to introduce
additional fields.

Regards,
Björn

-Ursprüngliche Nachricht-
Von: Rick Leir [mailto:rl...@leirtech.com]
Gesendet: Montag, 15. Mai 2017 13:48
An: solr-user@lucene.apache.org
Betreff: Re: Configure query parser to handle field name
case-insensitive

Björn
Field names or values? I assume values. Your analysis chain in
schema.xml probably downcases chars, if not then that could be your
problem.

Field _name_? Then you might have to copyfield the field to a new
>>field
with the desired case. Avoid doing that if you can. Cheers -- Rick

On May 15, 2017 5:48:09 AM EDT, "Peemöller, Björn"
 wrote:
>Hi all,
>
>I'm fairly new at using Solr and I need to configure our instance to
>accept field names in both uppercase and lowercase (they are defined
as
>lowercase in our configuration). Is there a 

Solr Index issue on string type while querying

2017-05-16 Thread Padmanabhan V
Hello Solr Geeks,

I am looking for some helping hands to proceed on an issue I am facing now.
Given below is one record from the prepared index. I can query the fields whose
values do not contain a greater-than symbol, but when I query
widthSquareTube_string_mv or heightSquareTube_string_mv, no results are returned,
even though there are records with values tagged like the one below. These two
fields are dynamicFields and are of field type string.

Given below are the queries executed through the Solr console query area:

1. heightSquareTube_string_mv:> 90 - 100 mm

2. heightSquareTube_string_mv:"> 90 - 100 mm"


{
"indexOperationId_long": 379908,
"id": "Online/10004003x1500",
"pk": 2558081,
"wallThickessTubeSquare_string_mv": [
"3 - 5.99 mm"
],
"widthSquareTube_string_mv": [
"> 30 - 40 mm"
],
"heightSquareTube_string_mv": [
"> 90 - 100 mm"
],
"length_string_mv": [
"1000 - 1999 mm"
],
"allCategories_string_mv": [
"AL_ST",
"100",
"F000",
"F060",
"AL",
"F061"
],
"category_string_mv": [
"AL_ST",
"100",
"F000",
"F060",
"AL",
"F061"
],
"inStockFlag_boolean": true,
"baseProduct_string": "ST606010004003",
"name_text_de_de": "100 x 40 x 3 x 1500 mm",
"name_sortable_de_de_sortabletext": "100 x 40 x 3 x 1500 mm",
"autosuggest": [
"ST606010004003x1500"
],
"_version_": 1567229255468712000
}


Best Regards,
Padmanabhan.V


Re: Configure query parser to handle field name case-insensitive

2017-05-16 Thread Rick Leir
Björn
You are not serious about (1) are you? Yikes!! Easiest for you if you do not 
need to sit at the helpdesk. Easiest if the users stop using the system. 

My guess is that (2) is easiest if you have text entry boxes for each field, 
and the user need not type in the field name. Cheers -- Rick

On May 16, 2017 10:56:37 AM EDT, Erick Erickson  wrote:
>Yeah, your options (5) and (6) are well... definitely at the
>bottom of _my_ list, I understand you included them for
>completeness...
>
>as for (4) Oh, my aching head. Parsers give me a headache ;)
>
>Yes, (1) is the easiest.(2) and (3) mostly depend on where you're most
>comfortable coding. If you intercept the query on the backend in Java
>_very_ early in the process you are working with essentially the same
>string as you would in JS on the front end so it's a tossup. You might
>just be more comfortable writing JS on the client rather than Java and
>getting it hooked in to Solr, really your choice.
>
>Best,
>Erick
>
>2017-05-16 0:59 GMT-07:00 Peemöller, Björn
>:
>> Hi all,
>>
>> thank you for your replies!
>>
>> We do not directly expose the Solr API, but provide an endpoint in
>our backend which acts as a proxy for a specific search handler. One
>requirement in our application is to search for people using various
>properties, e.g., first name, last name, description, date of birth.
>For simplicity reasons, we want to provide only a single search input
>and allow the user to narrow down its results using the query syntax,
>e.g. "firstname:John".
>>
>> Based on your suggestions, I can see the following solutions for our
>problem:
>>
>> 1) Train the users to denote fieldnames in lowercase - they need to
>know the exact field names anyway.
>> 2) Modify (i.e., lowercase) the search term in the backend (Java)
>> 3) Modify (i.e., lowercase) the search term in the frontend (JS)
>> 4) Modify the Solr query parser (provide a customized implementation)
>> 5) Define *a lot* of field aliases
>> 6) Define *a lot* of copy fields
>>
>> I assess these solutions to be ordered in decreasing quality, so I
>think that we will start to improve with more user guidance.
>>
>> Thanks to all,
>> Björn
>>
>> -Ursprüngliche Nachricht-
>> Von: Rick Leir [mailto:rl...@leirtech.com]
>> Gesendet: Montag, 15. Mai 2017 18:33
>> An: solr-user@lucene.apache.org
>> Betreff: Re: Configure query parser to handle field name
>case-insensitive
>>
>> Björn
>> Yes, at query time you could downcase the names. Not in Solr, but in
>the front-end web app you have in front of Solr. It needs to be a bit
>smart, so it can downcase the field names but not the query terms.
>>
>> I assume you do not expose Solr directly to the web.
>>
>> This downcasing might be easier to do in Javascript in the browser.
>Particularly if the user never has to enter a field name.
>>
>> Another solution, this time inside Solr, is to provide copyfields for
>ID, Id, and maybe iD. And for other fields that you mention in queries.
>This will consume some memory, particularly for saved fields, so I
>hesitate to even suggest it. Cheers - Rick
>>
>>
>> On May 15, 2017 9:16:59 AM EDT, "Peemöller, Björn"
> wrote:
>>>Hi Rick,
>>>
>>>thank you for your reply! I really meant field *names*, since our
>>>values are already processed by a lower case filter (both index and
>>>query). However, our users are confused because they can search for
>>>"id:1" but not for "ID:1". Furthermore, we employ the EDisMax query
>>>parser, so then even get no error message.
>>>
>>>Therefore, I thought it may be sufficient to map all field names to
>>>lower case at the query level so that I do not have to introduce
>>>additional fields.
>>>
>>>Regards,
>>>Björn
>>>
>>>-Ursprüngliche Nachricht-
>>>Von: Rick Leir [mailto:rl...@leirtech.com]
>>>Gesendet: Montag, 15. Mai 2017 13:48
>>>An: solr-user@lucene.apache.org
>>>Betreff: Re: Configure query parser to handle field name
>>>case-insensitive
>>>
>>>Björn
>>>Field names or values? I assume values. Your analysis chain in
>>>schema.xml probably downcases chars, if not then that could be your
>>>problem.
>>>
>>>Field _name_? Then you might have to copyfield the field to a new
>field
>>>with the desired case. Avoid doing that if you can. Cheers -- Rick
>>>
>>>On May 15, 2017 5:48:09 AM EDT, "Peemöller, Björn"
>>> wrote:
Hi all,

I'm fairly new at using Solr and I need to configure our instance to
accept field names in both uppercase and lowercase (they are defined
>>>as
lowercase in our configuration). Is there a simple way to achieve
>>>this?

Thanks in advance,
Björn

Björn Peemöller
IT & IT Operations

BERENBERG
Joh. Berenberg, Gossler & Co. KG
Neuer Jungfernstieg 20
20354 Hamburg

Telefon +49 40 350 60-8548
Telefax +49 40 350 60-900
E-Mail

Re: does suggester's contextField support TrieDate data type?

2017-05-16 Thread arik
Yes your assumptions are correct.  I have built the suggester and it works
fine without the cfq.

These queries work:
   /autocomplete?suggest.q=mexican&wt=json
   /select?indent=on&q=+isoDateTime:[2016-05-16T0:0:0.0Z%20TO%20*]&wt=json

This one does not:

   /autocomplete?suggest.q=mexican&suggest.cfq=[2016-05-16T0:0:0.0Z%20TO%20*]&wt=json

Is it maybe because my primary suggestion field type doesn't match the
contextField type?  Or do I maybe need a secondary definition like the
suggestAnalyzerFieldType to cover the field type of the contextField (if so
what's the syntax to do that)?

*Relevant configs*

*solrconfig.xml:*



  mysuggester

  AnalyzingInfixLookupFactory
  textSuggest
  false
  isoDateTime

  DocumentDictionaryFactory
  suggestedcompletions  

  false
  false


  


  
true
mysuggester
10
  
  
suggest
  



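The XML markup in the solrconfig.xml snippet above appears to have been stripped by 
the mail archive, leaving only the element values. A plausible reconstruction, 
assuming standard SuggestComponent parameter names (the exact value-to-parameter 
mapping below is a guess), would be:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mysuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="suggestAnalyzerFieldType">textSuggest</str>
      <str name="highlight">false</str>
      <str name="contextField">isoDateTime</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">suggestedcompletions</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>

  <requestHandler name="/autocomplete" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.dictionary">mysuggester</str>
      <str name="suggest.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>
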
*schema.xml*

  [schema.xml field and fieldType definitions were stripped by the mail archive]



Re: knowing which fields were successfully hit

2017-05-16 Thread David Hastings
what do you mean "hit?" As in the user clicked it?

On Tue, May 16, 2017 at 11:35 AM, John Blythe  wrote:

> hey all. i'm sending data out that could represent a purchased item or a
> competitive alternative. when the results are returned i'm needing to know
> which of the two were hit so i can serve up the *other*.
>
> i can make a blunt instrument in the application layer to simply look for a
> match between the queried terms and the resulting fields, but the problem
> of fuzzy matching and some of the special analysis being done to get the
> hits will be for naught.
>
> cursory googling landed me at a similar discussion that suggested using hit
> highlighting or retrieving the debuggers explain data to sort through.
>
> is there another, more efficient means or are these the two tools in the
> toolbox?
>
> thanks!
>


knowing which fields were successfully hit

2017-05-16 Thread John Blythe
hey all. i'm sending data out that could represent a purchased item or a
competitive alternative. when the results are returned i'm needing to know
which of the two were hit so i can serve up the *other*.

i can make a blunt instrument in the application layer to simply look for a
match between the queried terms and the resulting fields, but the problem
of fuzzy matching and some of the special analysis being done to get the
hits will be for naught.

cursory googling landed me at a similar discussion that suggested using hit
highlighting or retrieving the debuggers explain data to sort through.

is there another, more efficient means or are these the two tools in the
toolbox?

thanks!
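
One way to see which of the two fields actually matched is the hit-highlighting 
route mentioned above; a sketch of the request parameters (the field names here 
are placeholders, not from the original message) could be:

  q=purchased_item:(acme widget) OR alternative_item:(acme widget)
  hl=true
  hl.fl=purchased_item,alternative_item
  hl.requireFieldMatch=true

In the response, the keys under each document in the "highlighting" section then 
indicate which of the two fields produced the match.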


Help reviewing json facet api query

2017-05-16 Thread Mandar Deshpande
Hi,

Could anyone please help review the JSON facet API query below?

After updating to the JSON facet API we are getting strange results: the count
(uniqueCount) for the first facet bucket is correct, but the remaining counts
are either 0 or incorrect.

Below are both of the Solr query URLs:


Old query (with correct results)

http://wwwdev.ebi.ac.uk/pdbe/search/pdb/select?group=true=pdb_id
=true=true=true=0=100
.mincount=1=count=map=detector=ex
perimental_method=deposition_year_year.facet.range.
start=1970_year.facet.range.end=2050_year.facet.ra
nge.gap=5_year.facet.range.other=between_year.face
t.range.include=upper=text%3Ahemoglobin=json=true

New (json api implementation)

http://wwwdev.ebi.ac.uk/pdbe/search/pdb/select?json={query:%20%22text:hemogl
obin%22,%20facet:%20{%20detector:%20{type:%20%27terms%27,%20field:%20%27dete
ctor%27,%20limit:%20100,%20mincount:%201,%20sort:count,%20facet:%20{%20uniqu
eCount:%20%22unique(pdb_id)%22%20}%20},%20experimental_method:%20{type:%20%2
7terms%27,%20field:%20%27experimental_method%27,%20limit:%20100,%20mincount:
%201,%20sort:count,%20facet:%20{%20uniqueCount:%20%22unique(pdb_id)%22%20}%2
0},%20deposition_year:%20{%20type:%20%27range%27,%20field:%20%27deposition_y
ear%27,%20start:%201970,%20end:%202050,%20gap:%205,%20other:%20%27between%27
,%20include%20:%20%27upper%27,%20limit:%20100,%20mincount:%201,%20sort:count
,%20facet:%20{%20uniqueCount:%20%22unique(pdb_id)%22%20}%20}%20}%20}=0&
wt=json=true
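
For readability, joining the wrapped lines and URL-decoding the json parameter of 
the new request gives roughly the following (the other request parameters appear to 
have been mangled by the list archive):

  {
    query: "text:hemoglobin",
    facet: {
      detector: {
        type: 'terms', field: 'detector', limit: 100, mincount: 1, sort: count,
        facet: { uniqueCount: "unique(pdb_id)" }
      },
      experimental_method: {
        type: 'terms', field: 'experimental_method', limit: 100, mincount: 1, sort: count,
        facet: { uniqueCount: "unique(pdb_id)" }
      },
      deposition_year: {
        type: 'range', field: 'deposition_year', start: 1970, end: 2050, gap: 5,
        other: 'between', include: 'upper', limit: 100, mincount: 1, sort: count,
        facet: { uniqueCount: "unique(pdb_id)" }
      }
    }
  }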


Am I missing something in the new facet json api query or it is a solr issue
?

Thanks and regards,
Mandar

 




resource governance

2017-05-16 Thread Avi Steiner

Hello

I have a question about resource governance / quality of service.
Is there any way to configure priorities of queries? Can I set some select 
queries, for example, higher priority than index (add/update) ?

Thanks

Avi




This email and any attachments thereto may contain private, confidential, and 
privileged material for the sole use of the intended recipient. Any review, 
copying, or distribution of this email (or any attachments thereto) by others 
is strictly prohibited. If you are not the intended recipient, please contact 
the sender immediately and permanently delete the original and any copies of 
this email and any attachments thereto.


Re: Configure query parser to handle field name case-insensitive

2017-05-16 Thread Erick Erickson
Yeah, your options (5) and (6) are well... definitely at the
bottom of _my_ list, I understand you included them for
completeness...

as for (4) Oh, my aching head. Parsers give me a headache ;)

Yes, (1) is the easiest. (2) and (3) mostly depend on where you're most
comfortable coding. If you intercept the query on the backend in Java
_very_ early in the process you are working with essentially the same
string as you would in JS on the front end so it's a tossup. You might
just be more comfortable writing JS on the client rather than Java and
getting it hooked in to Solr, really your choice.
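
A rough sketch of option (2) is below, as a plain Java helper that lowercases only 
the field-name prefixes and leaves the terms alone. The regex and class name are 
illustrative assumptions, not anything Solr provides, and it deliberately ignores 
edge cases such as colons inside quoted phrases:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class FieldNameLowercaser {

      // Matches a field-name prefix such as "ID:" or "FirstName:" ahead of a term.
      private static final Pattern FIELD_PREFIX =
              Pattern.compile("([A-Za-z_][A-Za-z0-9_.]*):");

      public static String lowercaseFieldNames(String userQuery) {
          Matcher m = FIELD_PREFIX.matcher(userQuery);
          StringBuffer sb = new StringBuffer();
          while (m.find()) {
              // Lowercase only the captured field name; keep the colon and the term untouched.
              m.appendReplacement(sb,
                      Matcher.quoteReplacement(m.group(1).toLowerCase() + ":"));
          }
          m.appendTail(sb);
          return sb.toString();
      }

      public static void main(String[] args) {
          // Prints: id:1 AND firstname:John
          System.out.println(lowercaseFieldNames("ID:1 AND FirstName:John"));
      }
  }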

Best,
Erick

2017-05-16 0:59 GMT-07:00 Peemöller, Björn :
> Hi all,
>
> thank you for your replies!
>
> We do not directly expose the Solr API, but provide an endpoint in our 
> backend which acts as a proxy for a specific search handler. One requirement 
> in our application is to search for people using various properties, e.g., 
> first name, last name, description, date of birth. For simplicity reasons, we 
> want to provide only a single search input and allow the user to narrow down 
> its results using the query syntax, e.g. "firstname:John".
>
> Based on your suggestions, I can see the following solutions for our problem:
>
> 1) Train the users to denote fieldnames in lowercase - they need to know the 
> exact field names anyway.
> 2) Modify (i.e., lowercase) the search term in the backend (Java)
> 3) Modify (i.e., lowercase) the search term in the frontend (JS)
> 4) Modify the Solr query parser (provide a customized implementation)
> 5) Define *a lot* of field aliases
> 6) Define *a lot* of copy fields
>
> I assess these solutions to be ordered in decreasing quality, so I think that 
> we will start to improve with more user guidance.
>
> Thanks to all,
> Björn
>
> -Ursprüngliche Nachricht-
> Von: Rick Leir [mailto:rl...@leirtech.com]
> Gesendet: Montag, 15. Mai 2017 18:33
> An: solr-user@lucene.apache.org
> Betreff: Re: Configure query parser to handle field name case-insensitive
>
> Björn
> Yes, at query time you could downcase the names. Not in Solr, but in the 
> front-end web app you have in front of Solr. It needs to be a bit smart, so 
> it can downcase the field names but not the query terms.
>
> I assume you do not expose Solr directly to the web.
>
> This downcasing might be easier to do in Javascript in the browser. 
> Particularly if the user never has to enter a field name.
>
> Another solution, this time inside Solr, is to provide copyfields for ID, Id, 
> and maybe iD. And for other fields that you mention in queries. This will 
> consume some memory, particularly for saved fields, so I hesitate to even 
> suggest it. Cheers - Rick
>
>
> On May 15, 2017 9:16:59 AM EDT, "Peemöller, Björn" 
>  wrote:
>>Hi Rick,
>>
>>thank you for your reply! I really meant field *names*, since our
>>values are already processed by a lower case filter (both index and
>>query). However, our users are confused because they can search for
>>"id:1" but not for "ID:1". Furthermore, we employ the EDisMax query
>>parser, so then even get no error message.
>>
>>Therefore, I thought it may be sufficient to map all field names to
>>lower case at the query level so that I do not have to introduce
>>additional fields.
>>
>>Regards,
>>Björn
>>
>>-Ursprüngliche Nachricht-
>>Von: Rick Leir [mailto:rl...@leirtech.com]
>>Gesendet: Montag, 15. Mai 2017 13:48
>>An: solr-user@lucene.apache.org
>>Betreff: Re: Configure query parser to handle field name
>>case-insensitive
>>
>>Björn
>>Field names or values? I assume values. Your analysis chain in
>>schema.xml probably downcases chars, if not then that could be your
>>problem.
>>
>>Field _name_? Then you might have to copyfield the field to a new field
>>with the desired case. Avoid doing that if you can. Cheers -- Rick
>>
>>On May 15, 2017 5:48:09 AM EDT, "Peemöller, Björn"
>> wrote:
>>>Hi all,
>>>
>>>I'm fairly new at using Solr and I need to configure our instance to
>>>accept field names in both uppercase and lowercase (they are defined
>>as
>>>lowercase in our configuration). Is there a simple way to achieve
>>this?
>>>
>>>Thanks in advance,
>>>Björn
>>>
>>>Björn Peemöller
>>>IT & IT Operations
>>>
>>>BERENBERG
>>>Joh. Berenberg, Gossler & Co. KG
>>>Neuer Jungfernstieg 20
>>>20354 Hamburg
>>>
>>>Telefon +49 40 350 60-8548
>>>Telefax +49 40 350 60-900
>>>E-Mail
>>>bjoern.peemoel...@berenberg.de
>>>www.berenberg.de
>>>
>>>Sitz: Hamburg - Amtsgericht Hamburg HRA 42659
>>>
>>>

Re: Are there roadblocks to creating custom DocRouter implementations?

2017-05-16 Thread Erick Erickson
Hmmm, would the functionality be served by just using implicit routing and
putting the logic that populates the route field in the code that creates the
doc? Not, perhaps, as elegant as having some kind of pluggable routing, I
grant.
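
For reference, a collection using that approach would be created along these lines 
(the collection, shard, and field names here are invented for illustration):

  http://localhost:8983/solr/admin/collections?action=CREATE&name=geodocs&router.name=implicit&shards=shard_east,shard_west&router.field=region&replicationFactor=1

Each document then carries a region field (or a _route_ value) naming the shard it 
should land on.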

Best,
Erick

On Tue, May 16, 2017 at 7:31 AM, Shawn Heisey  wrote:
> There was a question in the #solr IRC channel about creating a custom
> document router to assign documents to shards based on geolocation data.
>
> Looking into this, I think I see a roadblock or two standing in the way
> of users creating custom router implementations.
>
> The "routerMap" field in the DocRouter class is private, and its
> contents are not dynamically created.  It appears that only specific
> names (null, plain, implicit, compositeId) are added to the map.
>
> I'm thinking that if we make routerMap protected (or create protected
> access methods), and put "static { }" code blocks in each implementation
> that add themselves to the parent routerMap, it will be much easier for
> a user to create their own implementation and have it automatically
> available to use in a CREATE action.
>
> Is this worth an issue in Jira?
>
> Thanks,
> Shawn
>


Re: Terms not being indexed; not sure why

2017-05-16 Thread Erick Erickson
What David said. There are a very few cases where changing your schema
does _not_ require that you blow away your index and re-index from
scratch. I always blow the index away when I make any changes if at
all possible.

Also note that when you quote, you are asking for _phrases_, so
searching for title:"University of Wisconsin" requires that these
words appear next to each other. If you just want all the words to be
in a field, try title:(University of Wisconsin)

But as David says, something's very weird. Note that when you search
against the _text_ field you get a parsed query like:

"parsedquery_toString": "_text_:university _text_:of _text_:wisconsin",

whereas against title "University" is not lowercased.

By the way, the admin UI>>select core or collection>>analysis page is
invaluable here. It shows you exactly what transformations happen at
index and query time.

And the "schema browser" link allows you to see the actual terms in your index.

Then there's the TermsComponent here:
https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
which allows you to see what's actually in your index (i.e. the
analyzed values).
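
For example, a request along these lines (the core name is a placeholder, and the 
/terms handler must be enabled) lists the indexed terms of the title field that 
start with "univ":

  http://localhost:8983/solr/yourcore/terms?terms.fl=title&terms.prefix=univ&terms.limit=20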

Together these tools make Solr a little less of a "black box". And if
you really want to examine the index, get a copy of Luke, although I'm
not sure how up to date it is.

Good Luck!
Erick

On Tue, May 16, 2017 at 7:30 AM, David Hastings
 wrote:
> something in text_general isn't actually doing what it's supposed to be
> doing; is it possible you indexed it as a string and then changed the type for
> the field after the fact?
>
> on my title i have something like such:
>
> "rawquerystring": "title:University",
>"querystring": "title:University",
>"parsedquery": "title:university",
>"parsedquery_toString": "title:university",
>
>
> maybe try re-indexing the document and see if that does it.  Even your
> LowerCaseFilterFactory isn't converting it to university.
>
>
> On Tue, May 16, 2017 at 10:10 AM, Chip Calhoun  wrote:
>
>> Now that I know what to look for in the debugQuery it's becoming more
>> clear. Yes, it's just searching "text" unless I specify otherwise. More
>> importantly, title searches don't work at all unless I search on the
>> complete title; words used in the title have no effect. Clearly I'm doing
>> something wrong.
>>
>> I've included a link to my schema, and links to some representative
>> queries.
>>
>> Solr schema:
>> https://drive.google.com/open?id=0Bz0ceORxyQb9bkFtTERMZTdaWEE
>>
>> q: University of Wisconsin (no quotes)
>> https://drive.google.com/open?id=0Bz0ceORxyQb9UlhOMmZtbUxQdTA
>>
>> q: "University of Wisconsin" (in quotes)
>> https://drive.google.com/open?id=0Bz0ceORxyQb9NWR0NjZFdmM5WEU
>>
>> q: title:University
>> https://drive.google.com/open?id=0Bz0ceORxyQb9X2pfM25CbFNNTmc
>>
>> q: title:"University of Wisconsin--Madison. Department of Physics |
>> Scientific Biographies"
>> https://drive.google.com/open?id=0Bz0ceORxyQb9dVkwNFRDZlRBWWs
>>
>> Chip
>>
>> 
>> From: Erick Erickson [erickerick...@gmail.com]
>> Sent: Monday, May 15, 2017 5:37 PM
>> To: solr-user
>> Subject: Re: Terms not being indexed; not sure why
>>
>> Most likely you're searching against your default field, often "text".
>> A frequent problem is that you enter a search like
>>
>> q=content:University of Wisconsin
>>
>> and the search is actually
>>
>> q=content:university text:of text:wisconsin
>>
>> Try your debug=query with the original maybe?
>>
>> In fact, somehow you're getting lucky, I'm not sure you you're even
>> getting a hit when you search by title since the parsed query is:
>>
>> "parsedquery": "_text_:21610003",
>> "parsedquery_toString": "_text_:21610003",
>>
>> i.e you're searching against your _text_ field not your filename field.
>>
>> So my guess is that you're throwing everything in a _text_ field and
>> always searching against that. Since it's such a "bag of words", it's
>> just happening to score your query below the top 10.
>>
>> You'll also want to be boosting the title field, perhaps use edismax.
>>
>> Best,
>> Erick
>>
>> On Mon, May 15, 2017 at 1:17 PM, Susheel Kumar 
>> wrote:
>> > Can you upload your schema to some site like dropbox etc. to look and
>> send
>> > the query which you are using and returning no results?
>> >
>> > Thanks,
>> > Susheel
>> >
>> > On Mon, May 15, 2017 at 1:46 PM, Chip Calhoun  wrote:
>> >
>> >> I'm creating a new Solr core to crawl a local site. We have a page on
>> >> "University of Wisconsin--Madison", but a search for that name in any
>> form
>> >> won't appear within the first 10 results. the page is indexed, and I can
>> >> search for it by filename. Termfreq(title) shows 0s for search terms
>> which
>> >> are very clearly in the title. What would cause this?
>> >>
>> >> In case it's useful, I'm pasting my results for a search on the
>> filename,
>> >> with termfreq arguments 

Re: Stop solr instance

2017-05-16 Thread Shawn Heisey
On 5/15/2017 7:26 AM, Mithu Tokder wrote:
> Now my question is: as the Solr instances are running on three
> machines, is it required to configure the same value for STOP.PORT and
> STOP.KEY in the start and stop scripts on all three machines, or can I use
> separate values for them?

The port and key can only be used via localhost, because that's the only
interface where the stop port listens.  Make them different if you want
to, or make them the same ... they can't be used outside each individual
server.

Thanks,
Shawn



Re: setup solrcloud from scratch vie web-ui

2017-05-16 Thread Shawn Heisey
On 5/12/2017 8:49 AM, Thomas Porschberg wrote:
> ERROR: Failed to create collection 'cat' due to: 
> {127.0.1.1:8983_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
>  from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
> 'cat_shard1_replica1': Unable to create core [cat_shard1_replica1] Caused by: 
> Lock held by this virtual machine: 
> /home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index/write.lock}

The same Solr instance is already holding the lock on the index at
/home/pberg/solr_new2/solr-6.5.1/server/data/bestand/index.  This means
that Solr already has a core using that index directory.

If the write.lock were present but wasn't being held by the same
instance, then the message would have said it was held by another program.

This sounds like you are manually manipulating settings like dataDir. 
When you start the server from an extracted download (not as a service)
and haven't messed with any configurations, the index directory for a
single-shard single-replica "cat" collection should be something like
the following, and should not be overridden unless you understand
*EXACTLY* how SolrCloud functions and have a REALLY good reason for
changing it:

/home/pberg/solr_new2/solr-6.5.1/server/solr/cat_shard1_replica1/data/index

On the "Sorry, no dataimport-handler defined!" problem, this is
happening because the solrconfig.xml file being used by the collection
does not have any configuration for the dataimport handler.  It's not
enough to add a DIH config file, solrconfig.xml must have a dataimport
handler defined that references the DIH config file.
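
A minimal definition in solrconfig.xml looks something like the snippet below; the 
config filename is a placeholder for whatever you named your DIH file, and the 
dataimporthandler contrib jars must also be on the classpath (for example via <lib> 
directives):

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">dih-config.xml</str>
    </lst>
  </requestHandler>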

Thanks,
Shawn



Are there roadblocks to creating custom DocRouter implementations?

2017-05-16 Thread Shawn Heisey
There was a question in the #solr IRC channel about creating a custom
document router to assign documents to shards based on geolocation data.

Looking into this, I think I see a roadblock or two standing in the way
of users creating custom router implementations.

The "routerMap" field in the DocRouter class is private, and its
contents are not dynamically created.  It appears that only specific
names (null, plain, implicit, compositeId) are added to the map.

I'm thinking that if we make routerMap protected (or create protected
access methods), and put "static { }" code blocks in each implementation
that add themselves to the parent routerMap, it will be much easier for
a user to create their own implementation and have it automatically
available to use in a CREATE action.

Is this worth an issue in Jira?

Thanks,
Shawn



Re: Terms not being indexed; not sure why

2017-05-16 Thread David Hastings
something in text_general isn't actually doing what it's supposed to be
doing; is it possible you indexed it as a string and then changed the type for
the field after the fact?

on my title i have something like such:

"rawquerystring": "title:University",
   "querystring": "title:University",
   "parsedquery": "title:university",
   "parsedquery_toString": "title:university",


maybe try re-indexing the document and see if that does it.  Even your
LowerCaseFilterFactory isn't converting it to university.


On Tue, May 16, 2017 at 10:10 AM, Chip Calhoun  wrote:

> Now that I know what to look for in the debugQuery it's becoming more
> clear. Yes, it's just searching "text" unless I specify otherwise. More
> importantly, title searches don't work at all unless I search on the
> complete title; words used in the title have no effect. Clearly I'm doing
> something wrong.
>
> I've included a link to my schema, and links to some representative
> queries.
>
> Solr schema:
> https://drive.google.com/open?id=0Bz0ceORxyQb9bkFtTERMZTdaWEE
>
> q: University of Wisconsin (no quotes)
> https://drive.google.com/open?id=0Bz0ceORxyQb9UlhOMmZtbUxQdTA
>
> q: "University of Wisconsin" (in quotes)
> https://drive.google.com/open?id=0Bz0ceORxyQb9NWR0NjZFdmM5WEU
>
> q: title:University
> https://drive.google.com/open?id=0Bz0ceORxyQb9X2pfM25CbFNNTmc
>
> q: title:"University of Wisconsin--Madison. Department of Physics |
> Scientific Biographies"
> https://drive.google.com/open?id=0Bz0ceORxyQb9dVkwNFRDZlRBWWs
>
> Chip
>
> 
> From: Erick Erickson [erickerick...@gmail.com]
> Sent: Monday, May 15, 2017 5:37 PM
> To: solr-user
> Subject: Re: Terms not being indexed; not sure why
>
> Most likely you're searching against your default field, often "text".
> A frequent problem is that you enter a search like
>
> q=content:University of Wisconsin
>
> and the search is actually
>
> q=content:university text:of text:wisconsin
>
> Try your debug=query with the original maybe?
>
> In fact, somehow you're getting lucky, I'm not sure you you're even
> getting a hit when you search by title since the parsed query is:
>
> "parsedquery": "_text_:21610003",
> "parsedquery_toString": "_text_:21610003",
>
> i.e you're searching against your _text_ field not your filename field.
>
> So my guess is that you're throwing everything in a _text_ field and
> always searching against that. Since it's such a "bag of words", it's
> just happening to score your query below the top 10.
>
> You'll also want to be boosting the title field, perhaps use edismax.
>
> Best,
> Erick
>
> On Mon, May 15, 2017 at 1:17 PM, Susheel Kumar 
> wrote:
> > Can you upload your schema to some site like dropbox etc. to look and
> send
> > the query which you are using and returning no results?
> >
> > Thanks,
> > Susheel
> >
> > On Mon, May 15, 2017 at 1:46 PM, Chip Calhoun  wrote:
> >
> >> I'm creating a new Solr core to crawl a local site. We have a page on
> >> "University of Wisconsin--Madison", but a search for that name in any
> form
> >> won't appear within the first 10 results. the page is indexed, and I can
> >> search for it by filename. Termfreq(title) shows 0s for search terms
> which
> >> are very clearly in the title. What would cause this?
> >>
> >> In case it's useful, I'm pasting my results for a search on the
> filename,
> >> with termfreq arguments for the terms I'd actually like to search on.
> >>
> >> {
> >>   "responseHeader": {
> >> "status": 0,
> >> "QTime": 33,
> >> "params": {
> >>   "debugQuery": "true",
> >>   "fl": "content, id, title, url, termfreq(content,\"university\"),
> >> termfreq(content,\"wisconsin\"), termfreq(content,\"university of
> >> wisconsin\"), termfreq(content,\"university of wisconsin--madison\"),
> >> termfreq(title,\"university\"), termfreq(title,\"wisconsin\"),
> >> termfreq(title,\"university of wisconsin\"),
> termfreq(title,\"university of
> >> wisconsin--madison\"), score",
> >>   "indent": "true",
> >>   "q": "21610003",
> >>   "_": "1494864119360",
> >>   "wt": "json"
> >> }
> >>   },
> >>   "response": {
> >> "numFound": 1,
> >> "start": 0,
> >> "maxScore": 0.26968884,
> >> "docs": [
> >>   {
> >> "content": [
> >>   "University of Wisconsin--Madison. Department of Physics |
> >> Scientific Biographies Menu ☰ Back to Top Home History Programs Niels
> Bohr
> >> Library & Archive Physics History Network Institutions Physics History
> >> Network Over 850 biographies of physicists and histories of institutions
> >> with information pertaining to their lives, careers, and research. Tip:
> >> Search within this page by using Ctrl + F or ⌘ + F Search Our Catalogs
> >> Archives Books Collections Emilio Segré Visual Archives Digital
> Collections
> >> Oral Histories Archival Finding Aids Physics History Network
> Preservation
> >> and Support Donating 

Re: Query Regarding SOLR

2017-05-16 Thread Shawn Heisey
On 5/11/2017 12:26 PM, Deepak Mali wrote:
> if there is any way to set threshold memory to the solr indexing process.
> My computer is hung and the indexing process is killed by the OS.
>
> So , I was wondering if there is any way to set threshold memory usage to
> solr indexing process in linux environments.

Not really sure exactly what you are asking.

The members of this mailing list have no way of knowing what software
you are using for your indexing process, so we cannot speculate about
the memory usage of that process or how you can limit it.

The Solr server (and the servlet container that it runs in) is a Java
program.  Total memory usage is therefore controlled by Java.  If you
tell Java that it is allowed to use a 4GB heap, then the total memory
usage of that instance will never exceed 4GB plus a little bit (maybe a
few hundred megabytes) for Java itself.
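
With the scripts that ship with recent Solr versions, that heap cap is set at
startup; for a 4GB cap, for example, either of the following should work:

  bin/solr start -m 4g

or, in solr.in.sh:

  SOLR_HEAP="4g"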

Thanks,
Shawn



Re: How to Speed Up Solr ResposeWriter

2017-05-16 Thread Shawn Heisey
On 5/10/2017 6:34 AM, Prashobh Chandran wrote:
>Currently we are using the Solr 5.3.1 engine, and I'm getting JSON format results
> from the engine. But it's taking time to get results, so I need to speed up the
> Solr response writer. Is there any way?

It seems very unlikely that the response writer is the reason it's
slow.  The response writer just converts the data gathered for the
response to the specific format requested.  This is typically a VERY
fast process compared to executing the query and retrieving results.

Asking Erick's questions in a different way:  What is your evidence that
the response writer is the source of whatever problem you're having?

Thanks,
Shawn



RE: Terms not being indexed; not sure why

2017-05-16 Thread Chip Calhoun
Now that I know what to look for in the debugQuery it's becoming more clear. 
Yes, it's just searching "text" unless I specify otherwise. More importantly, 
title searches don't work at all unless I search on the complete title; words 
used in the title have no effect. Clearly I'm doing something wrong.

I've included a link to my schema, and links to some representative queries. 

Solr schema:
https://drive.google.com/open?id=0Bz0ceORxyQb9bkFtTERMZTdaWEE

q: University of Wisconsin (no quotes)
https://drive.google.com/open?id=0Bz0ceORxyQb9UlhOMmZtbUxQdTA

q: "University of Wisconsin" (in quotes)
https://drive.google.com/open?id=0Bz0ceORxyQb9NWR0NjZFdmM5WEU

q: title:University
https://drive.google.com/open?id=0Bz0ceORxyQb9X2pfM25CbFNNTmc

q: title:"University of Wisconsin--Madison. Department of Physics | Scientific 
Biographies"
https://drive.google.com/open?id=0Bz0ceORxyQb9dVkwNFRDZlRBWWs

Chip


From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, May 15, 2017 5:37 PM
To: solr-user
Subject: Re: Terms not being indexed; not sure why

Most likely you're searching against your default field, often "text".
A frequent problem is that you enter a search like

q=content:University of Wisconsin

and the search is actually

q=content:university text:of text:wisconsin

Try your debug=query with the original maybe?

In fact, somehow you're getting lucky, I'm not sure you you're even
getting a hit when you search by title since the parsed query is:

"parsedquery": "_text_:21610003",
"parsedquery_toString": "_text_:21610003",

i.e you're searching against your _text_ field not your filename field.

So my guess is that you're throwing everything in a _text_ field and
always searching against that. Since it's such a "bag of words", it's
just happening to score your query below the top 10.

You'll also want to be boosting the title field, perhaps use edismax.

Best,
Erick

On Mon, May 15, 2017 at 1:17 PM, Susheel Kumar  wrote:
> Can you upload your schema to some site like dropbox etc. to look and send
> the query which you are using and returning no results?
>
> Thanks,
> Susheel
>
> On Mon, May 15, 2017 at 1:46 PM, Chip Calhoun  wrote:
>
>> I'm creating a new Solr core to crawl a local site. We have a page on
>> "University of Wisconsin--Madison", but a search for that name in any form
>> won't appear within the first 10 results. the page is indexed, and I can
>> search for it by filename. Termfreq(title) shows 0s for search terms which
>> are very clearly in the title. What would cause this?
>>
>> In case it's useful, I'm pasting my results for a search on the filename,
>> with termfreq arguments for the terms I'd actually like to search on.
>>
>> {
>>   "responseHeader": {
>> "status": 0,
>> "QTime": 33,
>> "params": {
>>   "debugQuery": "true",
>>   "fl": "content, id, title, url, termfreq(content,\"university\"),
>> termfreq(content,\"wisconsin\"), termfreq(content,\"university of
>> wisconsin\"), termfreq(content,\"university of wisconsin--madison\"),
>> termfreq(title,\"university\"), termfreq(title,\"wisconsin\"),
>> termfreq(title,\"university of wisconsin\"), termfreq(title,\"university of
>> wisconsin--madison\"), score",
>>   "indent": "true",
>>   "q": "21610003",
>>   "_": "1494864119360",
>>   "wt": "json"
>> }
>>   },
>>   "response": {
>> "numFound": 1,
>> "start": 0,
>> "maxScore": 0.26968884,
>> "docs": [
>>   {
>> "content": [
>>   "University of Wisconsin--Madison. Department of Physics |
>> Scientific Biographies Menu ☰ Back to Top Home History Programs Niels Bohr
>> Library & Archive Physics History Network Institutions Physics History
>> Network Over 850 biographies of physicists and histories of institutions
>> with information pertaining to their lives, careers, and research. Tip:
>> Search within this page by using Ctrl + F or ⌘ + F Search Our Catalogs
>> Archives Books Collections Emilio Segré Visual Archives Digital Collections
>> Oral Histories Archival Finding Aids Physics History Network Preservation
>> and Support Donating Materials Saving Archival Collections Grants to
>> Archives Documentation Projects History Newsletters Center for History of
>> Physics Scholarship & Outreach Main Navigation Home About Topic Guides
>> Feedback Table of Contents Institutional History Abstract Department chairs
>> Important Dates Places Subjects Citations Relationships People Employees &
>> Officers PhD Students Associates & Members Institutions Institutional
>> Hierarchy Associates Resources Archival as Author Archival as Subject
>> Published as Author University of Wisconsin--Madison. Department of Physics
>> Dates 1868 – present Authorized Form of Name University of
>> Wisconsin--Madison. Department of Physics Additional Forms of Names
>> University of Wisconsin--Madison. Dept. of Physics Institutional History
>> Abstract The 

Re: Too many logs recorded in zookeeper.out

2017-05-16 Thread Shawn Heisey
On 5/16/2017 3:07 AM, Noriyuki TAKEI wrote:
> I use Solr Cloud with 3 Zoo Keepers and 2 Solr Servers,
> having 3 shards and 2 replicas.
>
> These servers are running as virtual machine on VMWare and
> virtual machines are stored in the iSCSI storage attached to VMWare.
>
> One day, an iSCSI storage failure suddenly occurred, and then 1 Solr Server and
> 2 Zoo Keepers were inaccessible via SSH. But indexing and searching
> seemed to work properly.

Indexing should not have worked with the loss of two ZK servers.  With
three total ZK servers, you must have at least two of them operational
to maintain quorum.  When quorum is lost, Solr will go read-only.

> In order to recover, I powered down and started up virtual machines
> inaccessible via SSH. For a few minutes after Zoo Keeper starting up,too many 
> logs as below were recorded in zookeeper.out

SolrCloud uses zookeeper, but it's a completely separate software
project, and we aren't experts on it.  The problem you're experiencing
looks like it might require expert help.  You're going to need to go to
the zookeeper mailing list, or one of their other support avenues like
their IRC channel.  We can try to help you with the Solr side, and I
have a question about that.

You'll need to have ZK experts confirm this, but usually EOFException
with a TCP-based protocol (like the one that zookeeper uses) means that
the other side (client in this case) disconnected the TCP connection
before the side logging the exception (server) had sent its response. 
Solr is the client here.  What's in the Solr logfile?

Thanks,
Shawn



Re: TrieIntField vs IntPointField performance only for equality comparison (no range filtering)

2017-05-16 Thread Shawn Heisey
On 5/16/2017 3:33 AM, Dorian Hoxha wrote:
> Has anyone measured which is more efficient/performant between the 2
> intfields if we don't need to do range-checking ? (precisionStep=0) 

Point field support in Solr is *BRAND NEW*.  Very little information is
available yet on the Solr implementation.  Benchmarks were done at the
Lucene level, but I do not know what the numbers were.  If any Solr
benchmarks were done, which I can't be sure about, I do not know where
the results might be.

Lucene had Points support long before Solr did.  The Lucene developers
felt so strongly about the superiority of the Point implementations that
they completely deprecated the legacy numeric field classes (which is
what Trie classes use) early in the 6.x development cycle, slating them
for removal in 7.0.

If you wonder about backward compatibility in Solr 7.0 because the
Lucene legacy numerics are disappearing, then you've discovered a
dilemma that we're facing before the 7.0 release.

Thanks,
Shawn



Re: solr /export handler - behavior during close()

2017-05-16 Thread Joel Bernstein
Your approach looks OK. The single-sharded worker collection would only be
needed if you were using CloudSolrStream to send the initial Streaming
Expression to the /stream handler. You are not doing this, so your approach is fine.

Here are some thoughts on what you described:

1) If you are closing the parallel stream after the top 1000 results, then
try wrapping the intersect in a LimitStream. This stream doesn't exist yet
so it will be a custom stream. The LimitStream can return the EOF tuple
after it reads N tuples. This will cause the worker nodes to close the
underlying stream and cause the Broken Pipe exception to occur at the
/export handler, which will stop the /export.

Here is the basic approach:

parallel(limit(intersect(search, search)))
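
Spelled out a bit more concretely, the shape of that expression might look like the
sketch below. This is only illustrative: the limit decorator does not exist yet (as
noted above), and the collection, field, and worker names are made-up placeholders.

  parallel(workerCollection,
           limit(
             intersect(
               search(collA, q="*:*", fl="id,joinKey", sort="joinKey asc", qt="/export", partitionKeys="joinKey"),
               search(collB, q="*:*", fl="id,joinKey", sort="joinKey asc", qt="/export", partitionKeys="joinKey"),
               on="joinKey"),
             n=1000),
           workers=2, sort="joinKey asc")

The limit stream would emit the EOF tuple after 1000 tuples, which closes the
underlying streams and breaks the connections to the /export handlers.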


2) It can be tricky to understand where the bottleneck lies when using the
ParallelStream for parallel relational algebra. You can use the NullStream
to get an understanding of why performance is not increasing when you
increase the workers. Here is the basic approach:

parallel(null(intersect(search, search)))

The NullStream will eat all the tuples on the workers and return a single
tuple with the tuple count and the time taken to run the expression. So
you'll get one tuple from each worker. This will eliminate any bottleneck
on tuples returning through the ParallelStream and you can focus on the
performance of the intersect and the /export handler.

Then experiment with:

1) Increasing the number of parallel workers.
2) Increasing the number of replicas in the data collections.

And watch the timing information coming back from the NullStream tuples. If
increasing the workers is not improving performance then the bottleneck may
be in the /export handler. So try increasing replicas and see if that
improves performance. Different partitions of the streams will be served by
different replicas.

If performance doesn't improve with the NullStream after increasing both
workers and replicas then we know the bottleneck is the network.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 15, 2017 at 10:37 PM, Susmit Shukla 
wrote:

> Hi Joel,
>
> Regarding the implementation, I am wrapping the topmost TupleStream in a
> ParallelStream and executing it on the worker cluster (one of the joined
> clusters doubles up as the worker cluster). ParallelStream does submit the
> query to the /stream handler.
> For #2, e.g., I am creating 2 CloudSolrStreams, wrapping them in an
> IntersectStream, and wrapping that in a ParallelStream, then reading out the
> tuples from the parallel stream. close() is called on the ParallelStream. I
> do have custom streams, but they are similar to IntersectStream.
> I am on Solr 6.3.1.
> The 2 Solr clusters serving the join queries have many shards. The worker
> collection is also multi-sharded and belongs to one of the main clusters, so
> do you mean I should be using a single-sharded "worker" collection? Would the
> joins execute faster?
> On a side note, increasing the workers beyond 1 did not improve the
> execution times, and performance degraded with 3 or more workers. That is
> counter-intuitive, since the joins are huge and putting more workers should
> have improved the performance.
>
> Thanks,
> Susmit
>
>
> On Mon, May 15, 2017 at 6:47 AM, Joel Bernstein 
> wrote:
>
> > Ok please do report any issues you run into. This is quite a good bug
> > report.
> >
> > I reviewed the code and I believe I see the problem. The problem seems to
> > be that output code from the /stream handler is not properly accounting
> for
> > client disconnects and closing the underlying stream. What I see in the
> > code is that exceptions coming from read() in the stream do automatically
> > close the underlying stream. But exceptions from the writing of the
> stream
> > do not close the stream. This needs to be fixed.
> >
> > A few questions about your streaming implementation:
> >
> > 1) Are you sending requests to the /stream handler? Or are you embedding
> > CloudSolrStream in your application and bypassing the /stream handler?
> >
> > 2) If you're sending Streaming Expressions to the stream handler are you
> > using SolrStream or CloudSolrStream to send the expression?
> >
> > 3) What version of Solr are you using.
> >
> > 4) Have you implemented any custom streams?
> >
> >
> > #2 is an important question. If you're sending expressions to the /stream
> > handler using CloudSolrStream the collection running the expression would
> > have to be setup a specific way. The collection running the expression
> will
> > have to be a* single shard collection*. You can have as many replicas as
> > you want but only one shard. That's because CloudSolrStream picks one
> > replica in each shard to forward the request to then merges the results
> > from the shards. So if you send in an expression using CloudSolrStream
> that
> > expression will be sent to each shard to be run and each shard will be
> > duplicating the work and return duplicate 

Re: Date field by Atomic Update

2017-05-16 Thread Shawn Heisey
On 5/15/2017 11:31 PM, Noriyuki TAKEI wrote:
> I update some fields by SolrJ Atomic Update, but in a
> particular case an error occurred.
>
> When I try to set the value "2017-01-01" on a date field
> by SolrJ Atomic Update, the following error message appears.

If the field is using the TrieDateField class, or one of the deprecated
date classes, then that is not a valid date string.  You need the full
ISO timestamp.  The milliseconds in the following string are the only
part that's optional:

2017-01-01T00:00:00.000Z

If you use DateRangeField instead of TrieDateField, then you can send your 
partial timestamp string, and Solr will do the right thing.  But you'll have to 
wipe the index and rebuild it if you change your schema.

https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
https://wiki.apache.org/solr/HowToReindex
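
For reference, a minimal SolrJ sketch of an atomic "set" on a Trie date field using
the full timestamp form might look like this; the collection, field name, and
ZooKeeper address are placeholders, and the client construction may differ slightly
depending on your SolrJ version:

  import java.util.Collections;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class AtomicDateUpdate {
      public static void main(String[] args) throws Exception {
          try (CloudSolrClient client =
                  new CloudSolrClient.Builder().withZkHost("localhost:2181").build()) {
              client.setDefaultCollection("mycollection");

              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "doc1");
              // Atomic update: "set" the date field to a full ISO-8601 timestamp.
              doc.addField("published_date",
                      Collections.singletonMap("set", "2017-01-01T00:00:00Z"));

              client.add(doc);
              client.commit();
          }
      }
  }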

Thanks,
Shawn



Re: Seeing odd behavior with implicit routing

2017-05-16 Thread Chris Troullis
Shalin,

Thanks for the response and explanation! I logged a JIRA per your request
here: https://issues.apache.org/jira/browse/SOLR-10695

Chris


On Mon, May 15, 2017 at 3:40 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Sun, May 14, 2017 at 7:40 PM, Chris Troullis 
> wrote:
> > Hi,
> >
> > I've been experimenting with various sharding strategies with Solr cloud
> > (6.5.1), and am seeing some odd behavior when using the implicit router.
> I
> > am probably either doing something wrong or misinterpreting what I am
> > seeing in the logs, but if someone could help clarify that would be
> awesome.
> >
> > I created a collection using the implicit router, created 10 shards,
> named
> > shard1, shard2, etc. I indexed 3000 documents to each shard, routed by
> > setting the _route_ field on the documents in my schema. All works fine,
> I
> > verified there are 3000 documents in each shard.
> >
> > The odd behavior I am seeing is when I try to route a query to a specific
> > shard. I submitted a simple query to shard1 using the request parameter
> > _route_=shard1. The query comes back fine, but when I looked in the logs,
> > it looked like it was issuing 3 separate requests:
> >
> > 1. The original query to shard1
> > 2. A 2nd query to shard1 with the parameter ids=a bunch of document ids
> > 3. The original query to a random shard (changes every time I run the
> query)
> >
> > It looks like the first query is getting back a list of ids, and the 2nd
> > query is retrieving the documents for those ids? I assume this is some
> solr
> > cloud implementation detail.
> >
> > What I don't understand is the 3rd query. Why is it issuing the original
> > query to a random shard every time, when I am specifying the _route_? The
> > _route_ parameter is definitely doing something, because if I remove it,
> it
> > is querying all shards (which I would expect).
> >
> > Any ideas? I can provide the actual queries from the logs if required.
>
> How many nodes is this collection distributed across? I suspect that
> you are using a single node for experimentation?
>
> What happens with _route_=shard1 parameter and implicit routing is
> that the _route_ parameter is resolved to a list of replicas of
> shard1. But, SolrJ uses only the node name of the replica along with
> the collection name to make the request (this is important, we'll come
> back to this later). So, ordinarily, that node hosts a single shard
> (shard1) and when it receives the request, it will optimize the search
> to go the non-distributed code path (since the replica has all the
> data needed to satisfy the search).
>
> But interesting things happen when the node hosts more than one shard
> (say shard1 and shard3 both). When we query such a node using just the
> collection name, the collection name can be resolved to either shard1
> or shard3 -- this is picked randomly without looking at _route_
> parameter at all. If shard3 is picked, it looks at the request, sees
> that it doesn't have all the necessary data and decides to follow the
> two-phase distributed search path where phase 1 is to get the ids and
> score of the documents matching the query from all participating
> shards (the list of such shards is limited by _route_ parameter, which
> in our case will be only shard1) and a second phase where we get the
> actual stored fields to be returned to the user. So you get three
> queries in the log, 1) phase 1 of distributed search hitting shard1,
> 2) phase two of distributed search hitting shard1 and 3) the
> distributed scatter-gather search run by shard3.
>
> So to recap, this is happening because you have more than one shard
> hosted on a node. An easy workaround is to have each shard hosted on a
> unique node. But we can improve things on the solr side as well by 1)
> having SolrJ resolve requests down to node name and core name, 2)
> having the collection name to core name resolution take _route_ param
> into account. Both 1 and 2 can solve the problem. Can you please open
> a Jira issue?
>
> >
> > Thanks,
> >
> > Chris
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


TrieIntField vs IntPointField performance only for equality comparison (no range filtering)

2017-05-16 Thread Dorian Hoxha
Hi,

Has anyone measured which is more efficient/performant between the 2
intfields if we don't need to do range-checking ? (precisionStep=0)

Regards,
Dorian


Too many logs recorded in zookeeper.out

2017-05-16 Thread Noriyuki TAKEI
Hi.All.

I use Solr Cloud with 3 Zoo Keepers and 2 Solr Servers,
having 3 shards and 2 replicas.

These servers are running as virtual machines on VMware, and the
virtual machines are stored in the iSCSI storage attached to VMware.

One day, an iSCSI storage failure suddenly occurred, and then 1 Solr Server and
2 Zoo Keepers were inaccessible via SSH. But indexing and searching
seemed to work properly.

In order to recover, I powered down and started up the virtual machines that
were inaccessible via SSH.
For a few minutes after Zoo Keeper started up, many log entries like the
following were recorded in zookeeper.out:

  ERROR [LearnerHandler-/XXX.XXX.XXX.XXX:36524:LearnerHandler@631]
  Unexpected exception causing shutdown while sock still open

  java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)


The log entries shown above were recorded in zookeeper.out once every 3 milliseconds.

Why were so many log entries recorded?






AW: Configure query parser to handle field name case-insensitive

2017-05-16 Thread Peemöller, Björn
Hi all,

thank you for your replies!

We do not directly expose the Solr API, but provide an endpoint in our backend 
which acts as a proxy for a specific search handler. One requirement in our 
application is to search for people using various properties, e.g., first name, 
last name, description, date of birth. For simplicity, we want to 
provide only a single search input and allow the user to narrow down the 
results using the query syntax, e.g. "firstname:John".

Based on your suggestions, I can see the following solutions for our problem:

1) Train the users to write field names in lowercase - they need to know the 
exact field names anyway.
2) Modify (i.e., lowercase) the search term in the backend (Java)
3) Modify (i.e., lowercase) the search term in the frontend (JS)
4) Modify the Solr query parser (provide a customized implementation)
5) Define *a lot* of field aliases 
6) Define *a lot* of copy fields

I consider these solutions to be ordered by decreasing quality, so I think 
we will start by improving the user guidance.
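
A rough sketch of option 2 in Java, assuming field names are plain identifiers immediately followed by a colon; colons inside quoted phrases are not handled here, and the class name is only illustrative:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class FieldNameLowercaser {

      // Matches an identifier that is immediately followed by ':' (lookahead),
      // i.e. what the query syntax treats as a field name.
      private static final Pattern FIELD_NAME =
              Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)(?=:)");

      public static String lowercaseFieldNames(String q) {
          Matcher m = FIELD_NAME.matcher(q);
          StringBuffer sb = new StringBuffer();
          while (m.find()) {
              // Lowercase only the field name; the term after ':' is untouched.
              m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).toLowerCase()));
          }
          m.appendTail(sb);
          return sb.toString();
      }

      public static void main(String[] args) {
          // prints: id:1 AND firstname:John
          System.out.println(lowercaseFieldNames("ID:1 AND Firstname:John"));
      }
  }

Queries containing colons inside quoted phrases would need extra handling, which is part of why option 1 ranks higher above.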

Thanks to all,
Björn

-Original Message-
From: Rick Leir [mailto:rl...@leirtech.com] 
Sent: Monday, May 15, 2017 18:33
To: solr-user@lucene.apache.org
Subject: Re: Configure query parser to handle field name case-insensitive

Björn
Yes, at query time you could downcase the names. Not in Solr, but in the 
front-end web app you have in front of Solr. It needs to be a bit smart, so it 
can downcase the field names but not the query terms.

I assume you do not expose Solr directly to the web.

This downcasing might be easier to do in JavaScript in the browser, 
particularly if the user never has to enter a field name.

Another solution, this time inside Solr, is to provide copyFields for ID, Id, 
and maybe iD, and for other fields that you mention in queries. This will 
consume some memory, particularly for stored fields, so I hesitate to even 
suggest it. Cheers - Rick
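
A rough sketch of what that could look like in the schema; the field names and type are only illustrative, and one extra field plus copyField would be needed per casing variant:

  <field name="id" type="string" indexed="true" stored="true"/>
  <!-- indexed-only duplicate so that queries written as ID:1 also match -->
  <field name="ID" type="string" indexed="true" stored="false"/>
  <copyField source="id" dest="ID"/>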


On May 15, 2017 9:16:59 AM EDT, "Peemöller, Björn" 
 wrote:
>Hi Rick,
>
>thank you for your reply! I really meant field *names*, since our 
>values are already processed by a lower case filter (both index and 
>query). However, our users are confused because they can search for 
>"id:1" but not for "ID:1". Furthermore, we employ the EDisMax query 
>parser, so then even get no error message.
>
>Therefore, I thought it may be sufficient to map all field names to 
>lower case at the query level so that I do not have to introduce 
>additional fields.
>
>Regards,
>Björn
>
>-Original Message-
>From: Rick Leir [mailto:rl...@leirtech.com]
>Sent: Monday, May 15, 2017 13:48
>To: solr-user@lucene.apache.org
>Subject: Re: Configure query parser to handle field name 
>case-insensitive
>
>Björn
>Field names or values? I assume values. Your analysis chain in 
>schema.xml probably downcases characters; if not, that could be your 
>problem.
>
>Field _name_? Then you might have to copyfield the field to a new field 
>with the desired case. Avoid doing that if you can. Cheers -- Rick
>
>On May 15, 2017 5:48:09 AM EDT, "Peemöller, Björn"
> wrote:
>>Hi all,
>>
>>I'm fairly new to using Solr and I need to configure our instance to 
>>accept field names in both uppercase and lowercase (they are defined as 
>>lowercase in our configuration). Is there a simple way to achieve this?
>>
>>Thanks in advance,
>>Björn
>>
>>Björn Peemöller
>>IT & IT Operations
>>
>>BERENBERG
>>Joh. Berenberg, Gossler & Co. KG
>>Neuer Jungfernstieg 20
>>20354 Hamburg
>>
>>Telefon +49 40 350 60-8548
>>Telefax +49 40 350 60-900
>>E-Mail
>>bjoern.peemoel...@berenberg.de
>>www.berenberg.de
>>