Re: deletebyQuery vs deletebyId

2018-05-23 Thread Erick Erickson
Hmmm, this looks like https://issues.apache.org/jira/browse/SOLR-8889?
And are you the "Jay" who commented there?

On Wed, May 23, 2018 at 11:28 PM, Erick Erickson
 wrote:
> Tell us some more about your setup, particularly:
> - you mention routing key. Is the collection used with implicit
> routing or compositeID?
> - What does adding =query show?
> - I'm not entirely sure, frankly, how delete by id and having a
> different routing field play together. The supposition behind
> deleteById is that the deletions can be routed to the correct leader
> by hashing on the id field.
>
> Best,
> Erick
>
> On Wed, May 23, 2018 at 6:02 PM, Jay Potharaju  wrote:
>> Thanks Emir & Shawn for chiming in!
>> I am testing deleteById in Solr 6.6.3 and it does not seem to work. I have
>> 6 shards in my collection, and when sending a query to Solr a routing key is
>> also passed. I also tested this in Solr 5.3, with the same results.
>> Any suggestions why that would be happening?
>>
>> Thanks
>> Jay
>>
>>
>>
>> On Wed, May 23, 2018 at 1:26 AM Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>>
>>> Hi Jay,
>>> Solr does not handle it differently from any other DBQ. It will show fewer
>>> issues than some other DBQs because it affects fewer documents, but the mechanics
>>> of DBQ are the same and do not play well with concurrent changes to the index
>>> (merges/updates) especially in SolrCloud mode. Here are some thoughts on
>>> DBQ: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html <
>>> http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html>
>>>
>>> Thanks,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>
>>>
>>>
>>> > On 23 May 2018, at 02:35, Jay Potharaju  wrote:
>>> >
>>> > Hi,
>>> > I have a quick question about deletebyQuery vs deleteById. When using
>>> > deleteByQuery, if the query is id:123, is that the same as deleteById in
>>> > terms of performance?
>>> >
>>> >
>>> > Thanks
>>> > Jay
>>>
>>>


Re: deletebyQuery vs deletebyId

2018-05-23 Thread Erick Erickson
Tell us some more about your setup, particularly:
- you mention routing key. Is the collection used with implicit
routing or compositeID?
- What does adding =query show?
- I'm not entirely sure, frankly, how delete by id and having a
different routing field play together. The supposition behind
deleteById is that the deletions can be routed to the correct leader
by hashing on the id field.
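
For what it's worth, SolrJ's UpdateRequest has a deleteById variant that takes
an explicit route, which may be worth trying when the collection uses a custom
routing key. A minimal, untested sketch (URL, collection, id and route value
are made up for illustration):

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.UpdateRequest;

    public class DeleteWithRoute {
        public static void main(String[] args) throws SolrServerException, IOException {
            HttpSolrClient client =
                    new HttpSolrClient.Builder("http://localhost:8983/solr").build();
            UpdateRequest req = new UpdateRequest();
            // deleteById(id, route): the second argument is the routing key, so
            // the delete can be forwarded to the shard that owns that key
            req.deleteById("123", "tenantA");
            req.setCommitWithin(1000);
            req.process(client, "mycollection");
            client.close();
        }
    }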

Best,
Erick

On Wed, May 23, 2018 at 6:02 PM, Jay Potharaju  wrote:
> Thanks Emir & Shawn for chiming in!
> I am testing deleteById in Solr 6.6.3 and it does not seem to work. I have
> 6 shards in my collection, and when sending a query to Solr a routing key is
> also passed. I also tested this in Solr 5.3, with the same results.
> Any suggestions why that would be happening?
>
> Thanks
> Jay
>
>
>
> On Wed, May 23, 2018 at 1:26 AM Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Jay,
>> Solr does not handle it differently from any other DBQ. It will show fewer
>> issues than some other DBQs because it affects fewer documents, but the mechanics
>> of DBQ are the same and do not play well with concurrent changes to the index
>> (merges/updates) especially in SolrCloud mode. Here are some thoughts on
>> DBQ: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html <
>> http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html>
>>
>> Thanks,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>> > On 23 May 2018, at 02:35, Jay Potharaju  wrote:
>> >
>> > Hi,
>> > I have a quick question about deletebyQuery vs deleteById. When using
>> > deleteByQuery, if the query is id:123, is that the same as deleteById in
>> > terms of performance?
>> >
>> >
>> > Thanks
>> > Jay
>>
>>


Re: How to do parallel indexing on files (not on HDFS)

2018-05-23 Thread Raymond Xie
Thank you, Rahul, though that's very high level.

No offense, but do you have a successful implementation, or is it just an
unproven idea? I have never used RabbitMQ or Kafka before, but I would be very
interested in more detail on the Kafka idea, as Kafka is available
in my environment.

Thank you again and look forward to hearing more from you or anyone in this
Solr community.


**
*Sincerely yours,*


*Raymond*

On Wed, May 23, 2018 at 8:15 AM, Rahul Singh 
wrote:

> Enumerate the file locations (map), put them in a queue like RabbitMQ or
> Kafka (persist the map), and have a bunch of threads, workers, containers,
> whatever, pop items off the queue and process them (reduce).
>
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On May 20, 2018, 7:24 AM -0400, Raymond Xie , wrote:
>
> I know how to index from the file system, a single file or a folder, but
> how do I do that in parallel? The data I need to index is of huge
> volume and can't be put on HDFS.
>
> Thank you
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
>


Re: Unable to make IN queries on a particular field in solr

2018-05-23 Thread Shawn Heisey

On 5/23/2018 5:40 PM, RAUNAK AGRAWAL wrote:

I am facing an issue where I have a collection named employee collection.

Suppose I want to search for employees by id, so my query is *id:(1 2 3)* and
it is working fine in Solr. Now let's say I want to search by their name. So
my query is name:(Alice Bob).

Now the problem is when I am querying by *name:(Alice Bob)*, I am not
getting any result but if I query by *name:(Alice OR Bob)*, I am able to
fetch the result.

Can someone please explain:

- Why is the IN query for name not working with a space but working with *OR*?
- *Why is the IN query for id working with a space but not working for name,
though both fields are in the same collection?*


Please add =true=0=all to the URL for both 
versions of the name query (removing any existing rows parameter) and 
send us the entirety of both responses.


These parameters are discussed here:

https://lucene.apache.org/solr/guide/7_3/common-query-parameters.html

What version of Solr are you running?

Thanks,
Shawn



Re: Trying to update Solrj in our app...

2018-05-23 Thread Shawn Heisey
On 5/23/2018 11:46 AM, BlackIce wrote:
> Is there a list of things that have been deprecated in solr since 5.0.0? Or
> do I have to read EVERY release readme till I get to 7.3.1?

The javadoc for each release has pages that list all deprecations in
that release on a per-module basis, so there are different pages for
solr-solrj, solr-core, and other modules within the source code.

Typically for user code, you're only going to care about SolrJ.  Here's
the page for SolrJ 7.3.1:

https://lucene.apache.org/solr/7_3_1/solr-solrj/deprecated-list.html

You can change the version number in the URL to get different versions. 
Here's the one for 6.6.3:

https://lucene.apache.org/solr/6_6_3/solr-solrj/deprecated-list.html

If you want to get an idea of what's completely eliminated from a
certain major release, look at the javadoc for the class you're
interested in, find the page specific to the latest release in the previous
major version, and click on "deprecated" at the top of the page.  So to
find out what's missing from 7.x, you would load the page for 6.6.3,
which is currently the latest 6.x release.

Thanks,
Shawn



Unable to make IN queries on a particular field in solr

2018-05-23 Thread RAUNAK AGRAWAL
Hi,

I am facing an issue where I have a collection named employee collection.

Suppose I want to search for employees by id, so my query is *id:(1 2 3)* and
it is working fine in Solr. Now let's say I want to search by their name. So
my query is name:(Alice Bob).

Now the problem is when I am querying by *name:(Alice Bob)*, I am not
getting any result but if I query by *name:(Alice OR Bob)*, I am able to
fetch the result.

Can someone please explain:


   - Why is the IN query for name not working with a space but working with *OR*?
   - *Why is the IN query for id working with a space but not working for name,
   though both fields are in the same collection?*


Thanks


Re: deletebyQuery vs deletebyId

2018-05-23 Thread Jay Potharaju
Thanks Emir & Shawn for chiming in!
I am testing deleteById in Solr 6.6.3 and it does not seem to work. I have
6 shards in my collection, and when sending a query to Solr a routing key is
also passed. I also tested this in Solr 5.3, with the same results.
Any suggestions why that would be happening?

Thanks
Jay



On Wed, May 23, 2018 at 1:26 AM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> Solr does not handle it differently from any other DBQ. It will show fewer
> issues than some other DBQs because it affects fewer documents, but the mechanics
> of DBQ are the same and do not play well with concurrent changes to the index
> (merges/updates) especially in SolrCloud mode. Here are some thoughts on
> DBQ: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html <
> http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html>
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 23 May 2018, at 02:35, Jay Potharaju  wrote:
> >
> > Hi,
> > I have a quick question about deletebyQuery vs deleteById. When using
> > deleteByQuery, if the query is id:123, is that the same as deleteById in
> > terms of performance?
> >
> >
> > Thanks
> > Jay
>
>


Restore of Solr Collection with schema changes ?

2018-05-23 Thread Satyanarayana Kalahasthi
Hi Team,

Is it possible to restore a back-up of a Solr collection if there are any schema changes
(addition of a new field or deletion of an existing field in managed-schema.xml)
at the time of restore? I am not a registered user, but please do the needful.

Thanks
Kalahasthi Satyanarayana
Mobile : 08884581161




Re: Search Analytics Help

2018-05-23 Thread Sameer Maggon
Ennio,

Have you taken a look at SearchStax Analytics?

https://www.searchstax.com/docs/search-analytics-start/

Thanks,




On Wed, May 23, 2018 at 11:34 AM, ennio  wrote:

> Thanks all for the comments. I'm looking at the ELK option here.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Sameer Maggon
https://www.searchstax.com


Re: Search Analytics Help

2018-05-23 Thread ennio
Thanks all for the comments. I'm looking at the ELK option here. 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Trying to update Solrj in our app...

2018-05-23 Thread BlackIce
Hi again,

Is there a list of things that have been deprecated in solr since 5.0.0? Or
do I have to read EVERY release readme till I get to 7.3.1?

On Wed, May 23, 2018 at 4:31 PM, BlackIce  wrote:

> Thnx, but that doesn't compile either... lemme read up this...
>
> On Wed, May 23, 2018 at 3:52 PM, Shawn Heisey  wrote:
>
>> On 5/23/2018 7:25 AM, BlackIce wrote:
>>
>>> I've got an app here that posts data to Solr using Solrj...
>>> I'm trying to update all our apps dependencies, and now I've reached
>>> Solrj
>>>   Last known working version is 5.5.0; anything after that dies at compile
>>> time with:
>>>
>> 
>>
>>> if (val instanceof Date) {
>>>val2 =  DateUtil.getThreadLocalDateFormat().format(val);
>>> }
>>>
>>
>> Use this instead:
>>
>> val2 = DateTimeFormatter.ISO_INSTANT.format(val.toInstant());
>>
>> ISO_INSTANT is probably what you want, but there are other choices if
>> that's not the correct format.
>>
>> This will require a new import -- java.time.format.DateTimeFormatter.
>> And you will need JDK 8, which you should already have because SolrJ 6.0
>> and later requires it.
>>
>> Thanks,
>> Shawn
>>
>>
>


Re: Solr Dates TimeZone

2018-05-23 Thread Erick Erickson
And you do _not_ want to store anything except UTC for dates.
IMO, all the expectations about timezones are a remnant of when
computers were on-prem. You'll spend _endless_ hours trying
to deal with "the time shown is an hour off" if you try to store
anything except UTC on any server anywhere and deliver it
anywhere else.
 ;)

Best,
Erick

On Wed, May 23, 2018 at 11:12 AM, Shawn Heisey  wrote:
> On 5/22/2018 9:26 AM, LOPEZ-CORTES Mariano-ext wrote:
>> It's possible to configure Solr with a timezone other than GMT?
>
> No, at least not in the way that you're thinking.
>
>> It's possible to configure Solr Admin to view dates with a timezone other 
>> than GMT?
>
> As far as I know, this is not possible.  The information in search
> results is not interpreted at all, it is shown exactly as it is received
> from the server.  The server is going to send UTC, and it is going to
> expect UTC at index time if the input is a string.  It is up to client
> software to translate to the user's timezone.
>
> It is possible to tell Solr what timezone it should use to determine day
> boundaries for date math -- NOW/DAY, NOW/WEEK, etc.  But the actual data
> will still be in UTC.
>
> Thanks,
> Shawn
>


Re: Debugging/scoring question

2018-05-23 Thread Erick Erickson
Well, first you have to be using that similarity ;)

Since Solr 6.0, BM25 has been the default similarity algorithm.

If you insist, you can modify the score with function queries; see the
docfreq method.

Best,
Erick

On Wed, May 23, 2018 at 12:17 PM, LOPEZ-CORTES Mariano-ext
 wrote:
> Yes. This makes sense.
>
> I guess you talk about this doc:
>
> https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
> How can I decrease the effect of the IDF component in my query?
>
> Thanks!!
>
> -----Original Message-----
> From: Alessandro Benedetti [mailto:a.benede...@sease.io]
> Sent: Wednesday, 23 May 2018 18:05
> To: solr-user@lucene.apache.org
> Subject: Re: Debugging/scoring question
>
> Hi Mariano,
> From the documentation :
>
> docCount = total number of documents containing this field, in the range [1 
> .. {@link #maxDoc()}]
>
> In your debug the fields involved in the score computation are indeed 
> different ( nomUsageE, prenomE) .
>
> Does this make sense ?
>
> Cheers
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Debugging/scoring question

2018-05-23 Thread LOPEZ-CORTES Mariano-ext
Yes. This makes sense.

I guess you talk about this doc:

https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

How can I decrease the effect of the IDF component in my query?

Thanks!!

-----Original Message-----
From: Alessandro Benedetti [mailto:a.benede...@sease.io]
Sent: Wednesday, 23 May 2018 18:05
To: solr-user@lucene.apache.org
Subject: Re: Debugging/scoring question

Hi Mariano,
From the documentation:

docCount = total number of documents containing this field, in the range [1 .. 
{@link #maxDoc()}]

In your debug the fields involved in the score computation are indeed different 
( nomUsageE, prenomE) .

Does this make sense ?

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Debugging/scoring question

2018-05-23 Thread Alessandro Benedetti
Hi Mariano,
From the documentation:

docCount = total number of documents containing this field, in the range [1
.. {@link #maxDoc()}]

In your debug the fields involved in the score computation are indeed
different ( nomUsageE, prenomE) .
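
For reference, plugging the numbers from the debug output in the original mail
into that formula (natural log) reproduces the quoted idf values, which shows
that docCount is counted per field rather than per index:

    nomUsageE: docFreq = 1, docCount = 5   ->  log(1 + (5 - 1 + 0.5) / (1 + 0.5))  = log(4)  = 1.3862944
    prenomE:   docFreq = 1, docCount = 20  ->  log(1 + (20 - 1 + 0.5) / (1 + 0.5)) = log(14) = 2.6390574

So docCount = 5 simply means that only 5 of the 20 documents have a value in
the nomUsageE field.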

Does this make sense ?

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sorting on pseudo field(The one which is added during doctransformer)

2018-05-23 Thread prateek . agarwal
Thanks, Mikhail

On 2018/05/18 11:33:09, Mikhail Khludnev  wrote: 
> Right
> https://wiki.apache.org/solr/SolrPlugins#ValueSourceParser
> 
> On Fri, May 18, 2018 at 8:04 AM, prateek.agar...@bigbasket.com <
> prateek.agar...@bigbasket.com> wrote:
> 
> > Hi Mikhail,
> >
> > I think you forgot to link the reference.
> >
> > Thanks
> >
> >
> > Regards,
> > Prateek
> >
> > On 2018/05/17 13:18:22, Mikhail Khludnev  wrote:
> > > Here is the reference I've found so far.
> > >
> > > On Thu, May 17, 2018 at 12:26 PM, prateek.agar...@bigbasket.com <
> > > prateek.agar...@bigbasket.com> wrote:
> > >
> > > >
> > > > Hi Mikhail,
> > > >
> > > > > You can either sort by a function, which requires turning the logic
> > > > > into a value source parser.
> > > >
> > > > But my requirement was to add a field dynamically, from a cache or an
> > > > external source, to the documents returned by Solr, and to perform
> > > > sorting in Solr itself if required, otherwise use the score to sort.
> > > > So how would you advise going about this?
> > > >
> > > > And how would I go about your suggestion to "turn the logic into a value
> > > > source parser"? How would I do that in this case?
> > > >
> > > >
> > > > > If you need to toss just result page, check rerank.
> > > >
> > > > I don't want to use it to rank the relevancy of results.
> > > >
> > > > Thanks for the response.
> > > >
> > > >
> > > >
> > > > Regards,
> > > > Prateek
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> 


Re: Zookeeper 3.4.12 with Solr 6.6.2?

2018-05-23 Thread Walter Underwood
Thanks for a detailed and clear answer.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 23, 2018, at 8:20 AM, Shawn Heisey  wrote:
> 
> On 5/22/2018 10:44 AM, Walter Underwood wrote:
>> Is anybody running Zookeeper 3.4.12 with Solr 6.6.2? Is that a recommended 
>> combination? Not recommended?
> 
> Solr 6.6.2 shipped with ZK 3.4.10, which the ZK project released
> 2017-Mar-30.
> 
> I asked the zk mailing list about any gotchas they're aware of when
> running mismatched versions, assuming the server is running a newer
> version than the client.  They replied with a document discussing their
> backward compatibility goals (scroll to the end):
> 
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> 
> Solr 4.0.0 (the first stable version with Cloud mode) shipped with ZK
> 3.3.6.  In 4.1.0, that was upgraded to ZK 3.4.5.  Solr 7.4.0 will most
> likely ship with ZK 3.4.12. (SOLR-12346)
> 
> ALL versions of SolrCloud should work with 3.4.12 on the ZK server side.
> 
> If you upgrade your ZK servers to the 3.5 beta, then no guarantees can
> be made for Solr 4.0, but 4.1 and later should work.
> 
> Thanks,
> Shawn
> 



Re: Zookeeper 3.4.12 with Solr 6.6.2?

2018-05-23 Thread Shawn Heisey
On 5/22/2018 10:44 AM, Walter Underwood wrote:
> Is anybody running Zookeeper 3.4.12 with Solr 6.6.2? Is that a recommended 
> combination? Not recommended?

Solr 6.6.2 shipped with ZK 3.4.10, which the ZK project released
2017-Mar-30.

I asked the zk mailing list about any gotchas they're aware of when
running mismatched versions, assuming the server is running a newer
version than the client.  They replied with a document discussing their
backward compatibility goals (scroll to the end):

https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

Solr 4.0.0 (the first stable version with Cloud mode) shipped with ZK
3.3.6.  In 4.1.0, that was upgraded to ZK 3.4.5.  Solr 7.4.0 will most
likely ship with ZK 3.4.12. (SOLR-12346)

ALL versions of SolrCloud should work with 3.4.12 on the ZK server side.

If you upgrade your ZK servers to the 3.5 beta, then no guarantees can
be made for Solr 4.0, but 4.1 and later should work.

Thanks,
Shawn



Re: Solr Dates TimeZone

2018-05-23 Thread Shawn Heisey
On 5/22/2018 9:26 AM, LOPEZ-CORTES Mariano-ext wrote:
> It's possible to configure Solr with a timezone other than GMT?

No, at least not in the way that you're thinking.

> It's possible to configure Solr Admin to view dates with a timezone other 
> than GMT?

As far as I know, this is not possible.  The information in search
results is not interpreted at all, it is shown exactly as it is received
from the server.  The server is going to send UTC, and it is going to
expect UTC at index time if the input is a string.  It is up to client
software to translate to the user's timezone.

It is possible to tell Solr what timezone it should use to determine day
boundaries for date math -- NOW/DAY, NOW/WEEK, etc.  But the actual data
will still be in UTC.
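
To illustrate with a small SolrJ sketch (the field name and zone are examples,
not anything from a real setup): the TZ parameter only changes how date math
like NOW/DAY is rounded, while indexed and returned values stay UTC.

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("*:*");
    // NOW/DAY is rounded to midnight in this zone instead of UTC midnight
    q.set("TZ", "America/Los_Angeles");
    q.addFilterQuery("timestamp:[NOW/DAY TO NOW/DAY+1DAY]");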

Thanks,
Shawn



Re: Index filename while indexing JSON file

2018-05-23 Thread Shawn Heisey
On 5/18/2018 1:47 PM, S.Ashwath wrote:
> I have 2 directories: 1 with txt files and the other with corresponding
> JSON (metadata) files (around 9 of each). There is one JSON file for
> each CSV file, and they share the same name (they don't share any other
> fields).
>
> The txt files just have plain text. I mapped each line to a field called
> 'sentence' and included the file name as a field using the data import
> handler. No problems here.
>
> The JSON file has metadata: 3 tags: a URL, author and title (for the
> content in the corresponding txt file).
> When I index the JSON file (I just used the _default schema, and posted the
> fields to the schema, as explained in the official solr tutorial),* I don't
> know how to get the file name into the index as a field.* As far as i know,
> that's no way to use the Data import handler for JSON files. I've read that
> I can pass a literal through the bin/post tool, but again, as far as I
> understand, I can't pass in the file name dynamically as a literal.
>
> I NEED to get the file name, it is the only way in which I can associate
> the metadata with each sentence in the txt files in my downstream Python
> code.
>
> So if anybody has a suggestion about how I should index the JSON file name
> along with the JSON content (or even some workaround), I'd be eternally
> grateful.

The indexing tools included with Solr are good for simple use cases. 
They're generic tools with limits.

The bin/post tool calls a class that is literally called
SimplePostTool.  It is never going to have a lot of capability.

The dataimport handler, while certainly capable of far more than the
simple post tool, is somewhat rigid in its operation. 

A sizable percentage of Solr users end up writing their own indexing
software because what's included with Solr isn't capable of adjusting to
their needs.  Your situation sounds like one that is going to require
custom indexing software that you or somebody in your company must write.
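
If it helps, such a custom indexer does not have to be a big job with SolrJ. A
rough sketch that records the file name as its own field (the directory, URL
and field names are made up, and the JSON parsing is left out):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    import java.nio.charset.StandardCharsets;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class JsonMetadataIndexer {
        public static void main(String[] args) throws Exception {
            SolrClient solr =
                    new HttpSolrClient.Builder("http://localhost:8983/solr/metadata").build();
            try (DirectoryStream<Path> files =
                         Files.newDirectoryStream(Paths.get("/data/json"), "*.json")) {
                for (Path p : files) {
                    String name = p.getFileName().toString();
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", name);
                    doc.addField("filename_s", name); // join key back to the txt sentences
                    // parse url/author/title out of the JSON with your parser of choice;
                    // the raw text is stored here only to keep the sketch short
                    doc.addField("raw_json_txt",
                            new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
                    solr.add(doc);
                }
            }
            solr.commit();
            solr.close();
        }
    }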

Thanks,
Shawn



Re: Trying to update Solrj in our app...

2018-05-23 Thread BlackIce
Thnx, but that doesn't compile either... lemme read up this...

On Wed, May 23, 2018 at 3:52 PM, Shawn Heisey  wrote:

> On 5/23/2018 7:25 AM, BlackIce wrote:
>
>> I've got an app here that posts data to Solr using Solrj...
>> I'm trying to update all our apps dependencies, and now I've reached Solrj
>>   Last known working version is 5.5.0; anything after that dies at compile
>> time with:
>>
> 
>
>> if (val instanceof Date) {
>>val2 =  DateUtil.getThreadLocalDateFormat().format(val);
>> }
>>
>
> Use this instead:
>
> val2 = DateTimeFormatter.ISO_INSTANT.format(val.toInstant());
>
> ISO_INSTANT is probably what you want, but there are other choices if
> that's not the correct format.
>
> This will require a new import -- java.time.format.DateTimeFormatter. And
> you will need JDK 8, which you should already have because SolrJ 6.0 and
> later requires it.
>
> Thanks,
> Shawn
>
>


Re: Question regarding TLS version for solr

2018-05-23 Thread Christopher Schultz

Anchal,

On 5/23/18 2:38 AM, Anchal Sharma2 wrote:
> Thank you for replying. But I checked the Java version Solr is using,
> and it is already version 1.8.
> 
> @Christopher, can you let me know what steps you followed for TLS
> authentication on Solr version 7.3.0?

Sure. Here are my deployment notes. You may have to adjust them
slightly for your environment. Note that we are using standalone Solr
without any Zookeeper, clustering, etc. This is just about configuring
a single instance. Also, this guide says 7.3.0, but 7.3.1 would be
better as it contains a fix for a CVE.

=== CUT ===


 Instructions for installing Solr and working with Cores


Installation
------------

Installing Solr is fairly simple. One can simply untar the distribution
tarball and work from that directory, but it is better to install it
in a somewhat more centralized place with a separate data directory
to facilitate upgrades, etc.

1. Obtain the distribution tarball
   Go to https://lucene.apache.org/solr/mirrors-solr-latest-redir.html
   and obtain the latest supported version of Solr.
   (7.3.0 as of this writing).

2. Untar the archive
   $ tar xzf solr-x.y.x.tgz

3. Install Solr
   $ cd solr-x.y.z
   $ sudo bin/install_solr_service.sh ../solr-x.y.z.tgz \
 -i /usr/local \
 -d /mnt/securefs/solr \
 -n
   (that last -n says "don't start Solr")

4. Configure Solr Settings
   Edit the file /etc/default/solr.in.sh

   Settings you may want to explicitly set:

   SOLR_JAVA_HOME=(java home)
   SOLR_HEAP="1024M"

5. Configure Solr for TLS
   Create a server key and certificate:
   $ sudo mkdir /etc/solr
   $ sudo keytool -genkey -keyalg EC -sigalg SHA256withECDSA -keysize 256 \
       -validity 730 -alias 'solr-ssl' -keystore /etc/solr/solr.p12 \
       -storetype PKCS12 -ext san=dns:localhost,ip:192.168.10.20
     Use the following information for the certificate:
     First and Last name: 192.168.10.20 (or "localhost", or your IP address)
     Org unit:  [whatever]
     Everything else should be obvious

   Now, export the public key from the keystore.

   $ sudo /usr/local/java-8/bin/keytool -list -rfc -keystore /etc/solr/solr.p12 \
       -storetype PKCS12 -alias solr-ssl

   Copy that certificate and paste it into this command's stdin:

   $ sudo keytool -importcert -keystore /etc/solr/solr-server.p12 \
       -storetype PKCS12 -alias 'solr-ssl'

   Now, fix the ownership and permissions on these files:

   $ sudo chown root:solr /etc/solr/solr.p12 /etc/solr/solr-server.p12
   $ sudo chmod 0640 /etc/solr/solr.p12

   Edit the file /etc/default/solr.in.sh

   Set the following settings:

   SOLR_SSL_KEY_STORE=/etc/solr/solr.p12
   SOLR_SSL_KEY_STORE_TYPE=PKCS12
   SOLR_SSL_KEY_STORE_PASSWORD=whatever

   # You MUST set the trust store for some reason.
   SOLR_SSL_TRUST_STORE=/etc/solr/solr-server.p12
   SOLR_SSL_TRUST_STORE_TYPE=PKCS12
   SOLR_SSL_TRUST_STORE_PASSWORD=whatever

   Then, patch the file bin/post; you are going to need this, later.

--- bin/post    2017-09-03 13:29:15.0 -0400
+++ /usr/local/solr/bin/post    2018-04-11 20:08:17.0 -0400
@@ -231,8 +231,8 @@
   PROPS+=('-Drecursive=yes')
 fi

-echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
-"$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
+echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
+"$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"

6. Configure Solr to Require Client TLS Certificates

  On each client, create a client key and certificate:

  $ keytool -genkey -keyalg EC -sigalg SHA256withECDSA -keysize 256 \
-validity 730 -alias 'solr-client-ssl'

  Now dump the certificate for the next step:

  $ keytool -exportcert -keystore [client-key-store] -storetype PKCS12 \
-alias 'solr-client-ssl'

  Don't forget that you might want to generate your own client certificate
  to use from your own web browser if you want to be able to connect to the
  server's dashboard.

  Use the output of that command on each client to put the cert(s) into this
  trust store on the server:

  $ sudo keytool -importcert -keystore /etc/solr/solr-trusted-clients.p12 \
      -storetype PKCS12 -alias '[client key alias]'

Edit /etc/default/solr.in.sh and add the following entries:

  SOLR_SSL_NEED_CLIENT_AUTH=true
  SOLR_SSL_TRUST_STORE=/etc/solr/solr-trusted-clients.p12
  SOLR_SSL_TRUST_STORE_TYPE=PKCS12
  SOLR_SSL_TRUST_STORE_PASSWORD=whatever

Summary of Files in /etc/solr
-----------------------------

solr-client.p12   Client keystore. Contains client key and certificate.
  Used by clients to 

Re: Trying to update Solrj in our app...

2018-05-23 Thread Shawn Heisey

On 5/23/2018 7:25 AM, BlackIce wrote:

I've got an app here that posts data to Solr using Solrj...
I'm trying to update all our apps dependencies, and now I've reached Solrj
  Last known working version is 5.5.0; anything after that dies at compile
time with:



if (val instanceof Date) {
   val2 =  DateUtil.getThreadLocalDateFormat().format(val);
}


Use this instead:

val2 = DateTimeFormatter.ISO_INSTANT.format(val.toInstant());

ISO_INSTANT is probably what you want, but there are other choices if 
that's not the correct format.


This will require a new import -- java.time.format.DateTimeFormatter. 
And you will need JDK 8, which you should already have because SolrJ 6.0 
and later requires it.
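
Putting that together with the fragment from the original message, the block
would look something like this (a sketch; val is assumed to be declared as
Object as in the snippet above, hence the cast):

    import java.time.format.DateTimeFormatter;
    import java.util.Date;

    // normalise the string representation for a Date
    Object val2 = val;

    if (val instanceof Date) {
        // SolrJ 6+ expects ISO-8601 instants in UTC, e.g. 2018-05-23T18:00:00Z
        val2 = DateTimeFormatter.ISO_INSTANT.format(((Date) val).toInstant());
    }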


Thanks,
Shawn



Trying to update Solrj in our app...

2018-05-23 Thread BlackIce
Hi,

I've got an app here that posts data to Solr using Solrj...
I'm trying to update all our apps dependencies, and now I've reached Solrj
 Last known working version is 5.5.0; anything after that dies at compile
time with:

 cannot find symbol
[javac] import org.apache.solr.common.util.DateUtil;
[javac] ^
[javac]   symbol:   class DateUtil
[javac]   location: package org.apache.solr.common.util

AND:


 error: cannot find symbol
[javac]   val2 =
DateUtil.getThreadLocalDateFormat().format(val);
[javac]   ^
[javac]   symbol:   variable DateUtil
[javac]   location: class SolrIndexWriter

Doing some research, it tells me that at some point DateUtil was deprecated
from the solr ... and it says something like to use Instant.format()
instead, if someone needs to format the date for some reason..
so I comment out..
import org.apache.solr.common.util.DateUtil;

AND this is being imported...

import java.util.Date;


and I'm only left with the second error.

So, the question is what does this code have to look like for a current
version of Solrj?:

// normalise the string representation for a Date
Object val2 = val;

if (val instanceof Date) {
  val2 =  DateUtil.getThreadLocalDateFormat().format(val);
}


If someone would be so kind to give me a hand here it would be greatly
apreciated..


thnx


Debugging/scoring question

2018-05-23 Thread LOPEZ-CORTES Mariano-ext
Hi all

I have a 20-document collection. In the debug output, we have:

"100051":"
20.794415 = max of:
  20.794415 = weight(nomUsageE:jean in 1) [SchemaSimilarity], result of:
20.794415 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
  15.0 = boost
  1.3862944 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
(docFreq + 0.5)) from:
1.0 = docFreq
5.0 = docCount
  1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * 
fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
1.0 = avgFieldLength
1.0 = fieldLength

  "100053":"
21.11246 = max of:
  21.11246 = weight(prenomE:jean in 3) [SchemaSimilarity], result of:
21.11246 = score(doc=3,freq=1.0 = termFreq=1.0
), product of:
  8.0 = boost
  2.6390574 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
(docFreq + 0.5)) from:
1.0 = docFreq
20.0 = docCount
  1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * 
fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
1.0 = avgFieldLength
1.0 = fieldLength

docCount = 5.0 for the document 100051. Why? docCount is the total number 
of documents, isn't it?

Thanks in advance!




Re: How to do parallel indexing on files (not on HDFS)

2018-05-23 Thread Rahul Singh
Enumerate the file locations (map), put them in a queue like RabbitMQ or Kafka
(persist the map), and have a bunch of threads, workers, containers, whatever,
pop items off the queue and process them (reduce).
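
A minimal, untested sketch of that approach with SolrJ, using a local directory
walk and an in-process queue instead of RabbitMQ/Kafka (paths, field names and
the Solr URL are assumptions for illustration):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.concurrent.*;
    import java.util.stream.Stream;

    public class ParallelFileIndexer {
        public static void main(String[] args) throws Exception {
            // "map": enumerate the file locations into a queue
            BlockingQueue<Path> queue = new LinkedBlockingQueue<>();
            try (Stream<Path> paths = Files.walk(Paths.get("/data/txt"))) {
                paths.filter(Files::isRegularFile).forEach(queue::add);
            }

            // the client batches and sends updates on its own background threads
            SolrClient solr = new ConcurrentUpdateSolrClient.Builder(
                    "http://localhost:8983/solr/mycollection")
                    .withQueueSize(100).withThreadCount(4).build();

            // "reduce": workers pop files off the queue and turn them into documents
            ExecutorService workers = Executors.newFixedThreadPool(8);
            for (int i = 0; i < 8; i++) {
                workers.submit(() -> {
                    Path p;
                    while ((p = queue.poll()) != null) {
                        try {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", p.toString());
                            doc.addField("sentence", new String(
                                    Files.readAllBytes(p), StandardCharsets.UTF_8));
                            solr.add(doc);
                        } catch (Exception e) {
                            e.printStackTrace(); // real code: retry or re-queue
                        }
                    }
                });
            }
            workers.shutdown();
            workers.awaitTermination(1, TimeUnit.HOURS);
            solr.commit();
            solr.close();
        }
    }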


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On May 20, 2018, 7:24 AM -0400, Raymond Xie , wrote:
> I know how to index from the file system, a single file or a folder, but
> how do I do that in parallel? The data I need to index is of huge
> volume and can't be put on HDFS.
>
> Thank you
>
> **
> *Sincerely yours,*
>
>
> *Raymond*


Configuring aliases in ZooKeeper first

2018-05-23 Thread Gael Jourdan-Weil
Hello everyone,

We are running a SolrCloud cluster with ZooKeeper.
This SolrCloud cluster is down most of the time (backup environment) but the 
ZooKeeper instances are always up so that we can easily update configuration.

This has been working fine for a long time with Solr 6.4.0 then 6.6.0, but 
since upgrading to 7.2.1, we ran into an issue where Solr ignores aliases.json 
stored in ZooKeeper.

Steps to reproduce the problem:
1/ SolrCloud cluster is down
2/ Direct update of aliases.json file in ZooKeeper with Solr ZkCLI *without 
using Collections API* :
java ... org.apache.solr.cloud.ZkCLI -zkhost ... -cmd clear /aliases.json
java ... org.apache.solr.cloud.ZkCLI -zkhost ... -cmd put /aliases.json "new 
content"
3/ SolrCloud cluster is started => aliases.json not taken into account

Digging a bit into the code, what is actually causing the issue is that, when
starting, Solr now checks the metadata of the aliases.json file, and if the
version from ZooKeeper is lower than or equal to the local version, it keeps
the local version.
When it starts, Solr has a local version of 0 for the aliases but ZooKeeper 
also has a version of 0 of the file because we just recreated it. So Solr 
ignores ZooKeeper configuration and never has a chance to load aliases.

Relevant parts of Solr code are:
- https://github.com/apache/lucene-solr/blob/branch_7_2/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java : line 4562 : method setIfNewer
- https://github.com/apache/lucene-solr/blob/branch_7_2/solr/solrj/src/java/org/apache/solr/common/cloud/Aliases.java : line 45 : the "empty" Aliases object with default version 0

Obviously, a workaround is to force ZooKeeper to have a version greater than 0 
for aliases.json file (for instance by not clearing the file and just 
overwriting it again and again).
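
A rough sketch of that workaround with SolrJ's SolrZkClient (untested; the
ZooKeeper hosts and the alias payload are placeholders):

    import java.nio.charset.StandardCharsets;
    import org.apache.solr.common.cloud.SolrZkClient;

    public class OverwriteAliases {
        public static void main(String[] args) throws Exception {
            SolrZkClient zk = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 30000);
            try {
                byte[] aliases = "{\"collection\":{\"search\":\"collection1\"}}"
                        .getBytes(StandardCharsets.UTF_8);
                // setData overwrites the existing znode and bumps its version,
                // unlike clear + put, which recreates it at version 0
                zk.setData("/aliases.json", aliases, true);
            } finally {
                zk.close();
            }
        }
    }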


But we were wondering, is this the intended behavior for Solr ?

Thanks for reading,

Gaël

Re: deletebyQuery vs deletebyId

2018-05-23 Thread Emir Arnautović
Hi Jay,
Solr does not handle it differently from any other DBQ. It will show fewer
issues than some other DBQs because it affects fewer documents, but the mechanics of
DBQ are the same and do not play well with concurrent changes to the index
(merges/updates) especially in SolrCloud mode. Here are some thoughts on DBQ: 
http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html 


Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 23 May 2018, at 02:35, Jay Potharaju  wrote:
> 
> Hi,
> I have a quick question about deletebyQuery vs deleteById. When using
> deleteByQuery, if the query is id:123, is that the same as deleteById in
> terms of performance?
> 
> 
> Thanks
> Jay



Re: Is it possible to index documents without storing their content?

2018-05-23 Thread Emir Arnautović
Hi Tom,
Yes, it is possible - see the field options:
https://lucene.apache.org/solr/guide/6_6/defining-fields.html#DefiningFields-OptionalFieldTypeOverrideProperties
There is a stored option.
If you are asking about the actual documents in their original format, it is not even
recommended to store them in Solr.
If you are asking whether someone will be able to reconstruct a document from Solr
even if it is not stored, then the answer is that it depends on how you index; one might
be able to partially reconstruct it.
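
For completeness, a rough SolrJ sketch of adding such an indexed-but-not-stored
field through the Schema API (field name, type and collection are made-up
examples):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.schema.SchemaRequest;

    public class AddIndexedOnlyField {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client =
                    new HttpSolrClient.Builder("http://localhost:8983/solr").build();
            Map<String, Object> attrs = new LinkedHashMap<>();
            attrs.put("name", "content");
            attrs.put("type", "text_general");
            attrs.put("indexed", true);   // searchable
            attrs.put("stored", false);   // not returned in results
            new SchemaRequest.AddField(attrs).process(client, "mycollection");
            client.close();
        }
    }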

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 23 May 2018, at 06:46, Thomas Lustig  wrote:
> 
> dear community,
> 
> Is it possible to index documents (e.g. PDF, Word, ...) for full-text search
> without storing their content (payload) inside the Solr server?
> 
> Thanking you in advance for your help
> 
> BR
> 
> Tom



Re: How to maintain fast query speed during heavy indexing?

2018-05-23 Thread Nguyen Nguyen
Great info!  Thanks, Erick!

Cheers,
Nguyen

On Tue, May 22, 2018 at 5:45 AM Erick Erickson 
wrote:

> There are two issues:
>
> 1> autowarming on the replicas
>
> 2> Until https://issues.apache.org/jira/browse/SOLR-11982 (Solr 7.4,
> unreleased), requests would go to the leaders along with the PULL and
> TLOG replicas. Since the leaders were busily indexing, the entire
> query would suffer speed-wise.
>
> So what I'd do is see if you can apply the patch there and adjust your
> autowarming. Solr 7.4 will be out in the not-too-distant future,
> perhaps over the summer. No real schedule has been agreed on though,
>
> Best,
> Erick
>
> On Mon, May 21, 2018 at 9:23 PM, Nguyen Nguyen
>  wrote:
> > Hello everyone,
> >
> > I'm running SolrCloud cluster of 5 nodes with 5 shards and 3 replicas per
> > shard.  I usually see spikes in query performance during high indexing
> > period. I would like to have stable query response time even during high
> > indexing period.  I recently upgraded to Solr 7.3 and running with 2 TLOG
> > replicas and 1 PULL replica.  Using a small maxWriteMBPerSec for
> > replication and only query PULL replicas during indexing period, I'm
> still
> > seeing long query time for some queries (although not as often as before
> > the change).
> >
> > My first question is 'Is it possible to control replication of non-leader
> > like in master/slave configuration (eg: disablepoll, fetchindex)?'.  This
> > way, I can disable replication on the followers until committing is
> > completed on the leaders while sending query requests to the followers
> (or
> > just PULL replica) only.  Then when data is committed on leaders, I would
> > send query requests back to only leaders and tell the followers to start
> to
> > fetch the newly updated index.
> >
> > If manual replication control isn't possible, I'm planning to have
> > duplicate collections and use an alias to switch between the two
> collection
> > at different times.  For example: while 'collection1' collection being
> > indexed, and alias 'search' would point to 'collection2' collection to
> > serve query request.  Once indexing is completed on 'collection1',
> 'search'
> > alias would now point to 'collection1', and 'collection2' will be updated
> > to be in sync with 'collection1'.  The cycle repeats for  next indexing
> > cycle.  My question for this method would be if there is any existing
> > method to sync one collection to another so that I don't have to send the
> > same update requests to the two collections.
> >
> > Also wondering if there are other better methods everyone is using?
> >
> > Thanks much!
> >
> > Cheers,
> >
> > -Nguyen
>
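
On the alias-switching idea quoted above: the flip itself is a single
Collections API call. A small sketch (host, alias and collection names are
examples):

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class FlipSearchAlias {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client =
                    new HttpSolrClient.Builder("http://solr-node1:8983/solr").build();
            // point the "search" alias at the freshly indexed collection
            CollectionAdminRequest.createAlias("search", "collection1").process(client);
            client.close();
        }
    }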


Re: Question regarding TLS version for solr

2018-05-23 Thread Anchal Sharma2
 Hi Christopher /Shawn ,

Thank you for replying. But I checked the Java version Solr is using, and it is
already version 1.8.

@Christopher, can you let me know what steps you followed for TLS
authentication on Solr version 7.3.0?

Thanks & Regards,
-
Anchal Sharma
e-Pricer Development
ES Team
Mobile: +9871290248

-Christopher Schultz  wrote: -
To: solr-user@lucene.apache.org
From: Christopher Schultz 
Date: 05/17/2018 06:29PM
Subject: Re: Question regarding TLS version for solr


Shawn,

On 5/17/18 4:23 AM, Shawn Heisey wrote:
> On 5/17/2018 1:53 AM, Anchal Sharma2 wrote:
>> We are using solr version 5.3.0 and  have been  trying to enable 
>> security on our solr .We followed steps mentioned on site 
>> -https://lucene.apache.org/solr/guide/6_6/enabling-ssl.html .But
>> by default it picks ,TLS version  1.0,which is causing an issue
>> as our application uses TLSv 1.2.We tried using online resources
>> ,but could not find anything regarding TLS enablement for solr .
>> 
>> It will be a huge help if anyone can provide some suggestions as
>> to how we can enable TLS v 1.2 for solr.
> 
> The choice of ciphers and encryption protocols is mostly made by
> Java. The servlet container might influence it as well. The only
> servlet container that is supported since Solr 5.0 is the Jetty
> that is bundled in the Solr download.
> 
> TLS 1.2 was added in Java 7, and it became default in Java 8. If
> you can install the latest version of Java 8 and make sure that it
> has the policy files for unlimited crypto strength installed,
> support for TLS 1.2 might happen automatically.

There is no "default" TLS version for either the client or the server:
the two endpoints always negotiate the highest mutual version they
both support. The key agreement, authentication, and cipher suites are
the items that are negotiated during the handshake.

> Solr 5.3.0 is running a fairly old version of Jetty -- 9.2.11. 
> Information for 9.2.x versions is hard to find, so although I think
> it probably CAN do TLS 1.2 if the Java version supports it, I can't
> be absolutely sure.  You'll need to upgrade Solr to get an upgraded
> Jetty.

I would be shocked if Jetty ships with its own crypto libraries; it
should be using JSSE.

Anchal,

Java 1.7 or later is an absolute requirement if you want to use
TLSv1.2 (and you SHOULD want to use it).

I have recently spent a lot of time getting Solr 7.3.0 running with
TLS mutual-authentication, but I haven't worked with the 5.3.x line. I
can tell you have I've done things for my version, but they may need
some adjustments for yours.

-chris