RE: Transactional Behavior

2015-05-13 Thread Amr Ali
Hello Emir,

But this is not a transaction, because if some of the bulk I need to add gets 
committed, those documents become searchable. In a transaction I need to insert a 
bulk of data (so that all of it becomes searchable at once) or roll it back, 
according to some business scenarios.

--
Regards,
Amr Ali

City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt
Ext: 278



-Original Message-
From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] 
Sent: Tuesday, May 12, 2015 10:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Transactional Behavior

Hi Amr,
One option is to include a transaction id in your documents and delete by that id in 
case of a failed transaction. It is not a cheap option - it costs an additional field if 
you don't have something you can already use to identify the transaction. Assuming 
rollbacks will not happen too often, deleting is not that big an issue.
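
A minimal SolrJ sketch of this approach (the txn_id field name, core URL, and 
client setup are illustrative assumptions, not from the original mails):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
    // ... index the bulk, each document carrying the id of its transaction ...
    // On a failed transaction, remove everything that transaction added:
    client.deleteByQuery("txn_id:42");
    client.commit();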

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 12.05.2015 22:37, Amr Ali wrote:
 Please check this

 https://lucene.apache.org/solr/4_1_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback()
 Note that this is not a true rollback as in databases. Content you have 
 previously added may have been committed due to autoCommit, buffer full, 
 other client performing a commit etc.

 It is not a real rollback if you have two threads T1 and T2 that are adding. 
 If T1 is adding 500 documents and T2 is adding 3, then T2 will commit its 3 documents 
 PLUS the documents added so far by T1 (because T2 will finish its add/commit before T1, 
 due to the number of documents). Solr transactions are server-side only.


 --
 Regards,
 Amr Ali

 City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt
 Ext: 278



 -Original Message-
 From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
 Sent: Tuesday, May 12, 2015 10:24 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Transactional Behavior

 Solr does have a <rollback/> command, but it is an expert feature and not so 
 clear how it works in SolrCloud.

 See:
 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers
 and
 https://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22


 -- Jack Krupansky

 On Tue, May 12, 2015 at 12:58 PM, Amr Ali amr_...@siliconexpert.com wrote:

 Hello,

 I have a business case in which I need to be able to roll back.
 When I tried add/commit I was not able to prevent other threads that
 write to a given Solr core from committing everything. I also tried
 using IndexWriter directly, but Solr did not see the changes until we restarted it.


 --
 Regards,
 Amr Ali

 City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt
 Ext: 278




-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Unable to identify why faceting is taking so much time

2015-05-13 Thread Toke Eskildsen
On Wed, 2015-05-13 at 09:22 +, Abhishek Gupta wrote:

 Yes we have that many documents (exact count: 522664425), but I am not
 sure why that matters because what I understood from documentation is
 that fc will only work on the documents filtered by filter query and
 query.

What the documentation does not mention explicitly is the UnInversion
that takes place on the first call. If you look in your Solr log for
UnInverted entries, the time parameter shows how many milliseconds
it took.

For example:
UnInverted multi-valued field {field=lsubject,memSize=216343445,
tindexSize=1037315,time=36620,phase1=35868,nTerms=4440544,bigTerms=1,
termInstances=55196823,uses=0}
took 36620 milliseconds to UnInvert 4.440.544 terms.

The number of references from documents to terms is 55.196.823. If we
assume you have approximately 1 reference/document, you will have half a
billion references or about 10 times my number. 10 times 37 seconds is
quite close to the 300 seconds you state below. Of course our numbers
cannot be compared directly, but it means that your measurements passed
the sanity check.

 For my query there are only 137 documents for fc to work on and to
 make FieldCache.

The mapping structure from your 522.664.425 documents to the values in
your field (also in the higher millions, as I understand it) is
independent of your search result.

After the structure has been created, it is used to look up the terms
used by your 137 hits.

 Also, subsequent calls are not fast:
 First call time: 297572 ms
 Second call time (made within 2 sec): 249287 ms

Are you indexing while searching? Each time the index is changed, the
UnInversion will have to be re-done. facet.method=fcs seems a better
choice with an often-changing index of your size.
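
For reference, a minimal SolrJ sketch of such a facet request with
facet.method=fcs (field and filter taken from your query; the client-side API
usage is illustrative):

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("msgType:38 AND snCreatedTime:[2015-04-15T00:00:00Z TO *]");
    q.setFacet(true);
    q.addFacetField("conversationId");
    q.set("facet.method", "fcs"); // per-segment faceting; cheaper with frequent commits
    q.setRows(0);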
 
- Toke Eskildsen, State and University Library, Denmark

 




Re: Unable to identify why faceting is taking so much time

2015-05-13 Thread Abhishek Gupta
Toke, thanks for the quick reply. I am still confused; please find the doubts I
have inline:

On Mon, May 11, 2015 at 1:22 PM Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 On Mon, 2015-05-11 at 05:48 +, Abhishek Gupta wrote:
  According to this there are 137 records. Now I am faceting over these 137
  records with facet.method=fc. Ideally it should just iterate over these 137
  records and sum up the facets.

 That is only the ideal method if you are not planning on issuing
 subsequent calls: facet.method=fc does more work up front to ensure that
 later calls are fast.


  http://localhost:9020/search/p1-umShard-1/select?q=*:*
  &fq=(msgType:38+AND+snCreatedTime:[2015-04-15T00:00:00Z%20TO%20*])
  &facet.field=conversationId&facet=true&indent=on&wt=json&rows=0
  &facet.method=fc&debug=timing

  {
    "responseHeader": {
      "status": 0,
      "QTime": 395103
    },

 [...]

  According to this, faceting is taking 395036 ms. Why is it taking *395
  seconds* to just calculate facets of 137 records?

 6½ minutes is a long time, even for a first call. Do you have tens to
 hundreds of millions of documents in your index? Or do you have a
 similar amount of unique values in your facet?


Yes, we have that many documents (exact count: 522664425), but I am not sure
why that matters, because what I understood from the documentation
https://wiki.apache.org/solr/SimpleFacetParameters#facet.method is that
*fc* will only work on the documents filtered by the filter query and query.
For my query there are only 137 documents for fc to work on and to build the
*FieldCache* from. But seeing the faceting result, it seems that faceting is
being applied to all the documents, which contradicts the documentation: *The
facet counts are calculated by iterating over documents that match the
query and summing the terms that appear in each document*. I am not able
to understand why fc is calculating facets over all the documents.

Just for your information, the cardinality of the field (conversationId) on
which I am faceting is very high, but there are only about 100 possible values
for this field matching my query and filter query.


 Either way, subsequent faceting calls should be much faster and a switch
 to DocValues should lower your first-call time significantly.


Also, subsequent calls are not fast:
First call time: 297572 ms
Second call time (made within 2 sec): 249287 ms

Yeah I agree docValues will reduce the time.


 Toke Eskildsen, State and University Library, Denmark





Re: How is the most relevant document of each group chosen when group.truncate is used?

2015-05-13 Thread Andrii Berezhynskyi
OK, figured it out myself. Research has shown that when group.truncate (or the
collapsing query parser) is used, only the head document of each group is picked.
That's why the results are different. However, group.facet gives the facet results I
would want. The only issue is that group.facet is very slow compared to the
collapsing query parser.
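
For anyone reading along, a minimal sketch of the collapsing-query-parser
variant mentioned above (SolrJ usage is illustrative; parent_sku and color are
the fields from the quoted mail below):

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("*:*");
    // CollapsingQParserPlugin keeps only the head document of each group
    q.addFilterQuery("{!collapse field=parent_sku}");
    q.setFacet(true);
    q.addFacetField("color");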

On Tue, May 12, 2015 at 6:15 PM, Andrii Berezhynskyi 
andrii.berezhyns...@home24.de wrote:

 Hi all,

 When I use group.truncate and filtering I'm getting strange faceting
 results. If I use just grouping without filtering:


 group=true&group.field=parent_sku&group.ngroups=true&group.truncate=true&facet=true&facet.field=color,

  then I get:

 "facet_fields": { "color": [ "white", 19742,

 19742 white items.

 However if I filter by white items:


 group=true&group.field=parent_sku&group.ngroups=true&group.truncate=true&facet=true&facet.field=color&fq=color:white,


 I'm getting 20543 items. The same happens when I use collapse query parser
 instead of grouping.

 I would expect those two numbers to be equal. So I assume the most
 relevant document of each group is chosen somehow differently when
 filtering is used. How can this be explained?

 Best regards,
 Andrii



Re: Beginner problems with solr.ICUCollationField

2015-05-13 Thread Björn Keil


Thank you for your help. That was only part of the problem, though. You also
need ${solr.install.dir}/dist/solr-analysis-extras-X.jar,
where X is the version.

The other two libraries are dependencies, but they do not contain the actual
ICUCollationField class. It might be helpful if that were mentioned in the
respective spots in the documentation and the README.txt file.


Re: Reading an index while it is being updated?

2015-05-13 Thread Shawn Heisey
On 5/13/2015 1:03 AM, Guy Thomas wrote:
 Up to now we’ve been using Lucene without Solr.
 
 The Lucene index is being updated and when the update is finished we
 notify a Hessian proxy service running on the web server that wants to
 read the index. When this proxy service is notified, the server knows it
 can read the updated index.
 
 Do we have to use a similar set-up when using Solr, that is:
 
 1. Create/update the index
 
 2. Notify the Solr client

In Solr, the Solr server has complete control of the Lucene index and
maintains the write lock at all times.

Generally you create or update the index via requests to Solr, through
the update handler.  As soon as you issue a commit with
openSearcher=true, and it completes, all clients can see the changes.
There is no need to do any kind of notification.  Commits may be fully
automated within the Solr configuration or they may be explicitly sent
by clients.
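
As a minimal illustration (SolrJ; the URL, core name, and document are
placeholder assumptions), the add-then-commit flow needs no notification step:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    client.add(doc);
    client.commit(); // hard commit; opens a new searcher by default
    // From here on, any client querying the core sees the new document.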

If you are creating the index in some other way, then you generally need
to reload the core.  Recently (5.x versions) at least one person has
been having trouble with loading a new index using RELOAD:

https://issues.apache.org/jira/browse/SOLR-7526

Thanks,
Shawn



Re: Beginner problems with solr.ICUCollationField

2015-05-13 Thread Shawn Heisey
On 5/13/2015 4:16 AM, Björn Keil wrote:
 Thank you for your help. That was only part of the problem, though. You also
 need ${solr.install.dir}/dist/solr-analysis-extras-X.jar,
 where X is the version.
 
 The other two libraries are dependencies, but they do not contain the actual
 ICUCollationField class. It might be helpful if that were mentioned in the
 respective spots in the documentation and the README.txt file.

I have not used that particular class.  I have used the ICU tokenizers
and filters, which are in the lucene jar.

The docs you quoted say this:

---
solr.ICUCollationField is included in the Solr analysis-extras contrib;
see solr/contrib/analysis-extras/README.txt for instructions on which
jars you need to add to your SOLR_HOME/lib in order to use it.
---

That sounds to me like an indication that you need the solr analysis
extras jar, which has the lucene-analyzers and icu4j jars as additional
dependencies.  The referenced README probably should mention that the
required jar can be found in the dist/ folder of the binary download.

Thanks,
Shawn



Re: Wiki new user

2015-05-13 Thread Erik Hatcher
Sergio - what is your wiki username?   We can add you as an editor once you 
provide the username. 

Erik

 On May 13, 2015, at 10:33, Sergio Velasco ser...@mitula.com wrote:
 
 Hi,
  
 I would like to become a member of the Solr wiki. I requested this on the
 Solr user list and they sent me to this list to request access to the
 wiki.
  
 I am the Mitula CTO and we have been using Solr from the very beginning, 6 
 years ago. I think I can contribute a lot to this wiki.
  
 Thank you.
  
  
  
  
 


Re: Is copyField a must?

2015-05-13 Thread Alessandro Benedetti
I think that with a proper configuration of the Edismax query parser and
proper management of field boosting, it's much more precise to use the list
of interesting fields than one big blob copyField.

Cheers

2015-05-13 15:54 GMT+01:00 Steven White swhite4...@gmail.com:

 Hi Everyone,

 In my search need, I will always be using df to specify the list of fields
 a search will be done in (the list of fields is group based which my
 application defines).

 Given this, is there any reason to use copyField to copy the data into a
 single master-field to search against?  Am I losing anything by not using
 copyField?

 Thanks,

 Steve




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Is copyField a must?

2015-05-13 Thread Steven White
Hi Everyone,

In my search need, I will always be using df to specify the list of fields
a search will be done in (the list of fields is group-based and defined by my
application).

Given this, is there any reason to use copyField to copy the data into a
single master-field to search against?  Am I losing anything by not using
copyField?

Thanks,

Steve


Wiki new user

2015-05-13 Thread Sergio Velasco
Hi,



I would like to become a member of the Solr wiki. I requested this on the
Solr user list and they sent me to this list to request access to the
wiki.



I am the Mitula CTO and we have been using Solr from the very beginning, 6
years ago. I think I can contribute a lot to this wiki.



Thank you.











Sergio Velasco | Development Dept.
www.mitula.com


Re: Setting system property

2015-05-13 Thread Erik Hatcher
Clemens -

For this particular property, it is only accessed as a system property
directly, so it must be set at JVM startup and cannot be set any other way.
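
For example, assuming the bin/solr start script's -a option for passing extra
JVM parameters (available in 5.x):

    bin/solr start -a "-Dsolr.allow.unsafe.resourceloading=true"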

Erik

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/




 On May 13, 2015, at 3:49 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 
 I'd like to make use of solr.allow.unsafe.resourceloading=true.
 Is the command line -Dsolr.allow.unsafe.resourceloading=true the only way
 to inject/set this property, or can it be done (e.g.) in solr.xml?
 
 Thx
 Clemens



Re: Is copyField a must?

2015-05-13 Thread Erik Hatcher
No, there is no requirement for having a copyField of any kind. 


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/




 On May 13, 2015, at 1:50 PM, Steven White swhite4...@gmail.com wrote:
 
 I don't have a need for Edismax.  That said, do I still have a need for
 copyField into a default-field?
 
 Steve
 
 On Wed, May 13, 2015 at 11:13 AM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:
 
 I think with a proper configuration of the Edismax query parser and a
 proper management of field boosting,
 
 it's much more precise to use the list of interesting fields than a big
 blob copy field.
 
 Cheers
 
 2015-05-13 15:54 GMT+01:00 Steven White swhite4...@gmail.com:
 
 Hi Everyone,
 
 In my search need, I will always be using df to specify the list of
 fields
 a search will be done in (the list of fields is group based which my
 application defines).
 
 Given this, is there any reason to use copyField to copy the data into a
 single master-field to search against?  Am I losing any thing by not
 using
 copyField?
 
 Thanks,
 
 Steve
 
 
 
 
 --
 --
 
 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti
 
 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?
 
 William Blake - Songs of Experience -1794 England
 



Re: Is copyField a must?

2015-05-13 Thread Steven White
Hmm, looks like I'm missing something here as I cannot get this to work.

My need is as follows.  From my application, I need to issue a generic
search which is limited to a set of fields based on the group the user
belongs to.  For example, user-1 is in group-A which has default fields of
F1, F2, F3.  User-2 is in group-B which has default fields of F2, F3, F5,
etc.  What I tried to do is create multiple request handlers in solrconfig.xml,
like so:

  <requestHandler name="/select_group_a" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="df">F1,F2,F3</str>
      <str name="fl">id,score</str>
    </lst>
  </requestHandler>

And

  <requestHandler name="/select_group_b" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="df">F2,F3,F5</str>
      <str name="fl">id,score</str>
    </lst>
  </requestHandler>

However, this isn't working because whatever is in df is being treated as a
single field name.

How can I achieve my need?

Note, I want to avoid a URL-based solution (sending the list of fields over
HTTP) because the list of fields could be large (1000+) and thus I would
exceed the GET limit quickly. (Does Solr support POST for searching? If so, then
I can use a URL-based solution.)

Thanks in advance.

Steve

On Wed, May 13, 2015 at 2:29 PM, Erik Hatcher erik.hatc...@gmail.com
wrote:

 No, there is no requirement for having a copyField of any kind.


 —
 Erik Hatcher, Senior Solutions Architect
 http://www.lucidworks.com http://www.lucidworks.com/




  On May 13, 2015, at 1:50 PM, Steven White swhite4...@gmail.com wrote:
 
  I don't have a need for Edismax.  That said, do I still have a need for
  copyField into a default-field?
 
  Steve
 
  On Wed, May 13, 2015 at 11:13 AM, Alessandro Benedetti 
  benedetti.ale...@gmail.com wrote:
 
  I think with a proper configuration of the Edismax query parser and a
  proper management of field boosting,
 
  it's much more precise to use the list of interesting fields than a big
  blob copy field.
 
  Cheers
 
  2015-05-13 15:54 GMT+01:00 Steven White swhite4...@gmail.com:
 
  Hi Everyone,
 
  In my search need, I will always be using df to specify the list of
  fields
  a search will be done in (the list of fields is group based which my
  application defines).
 
  Given this, is there any reason to use copyField to copy the data into
 a
  single master-field to search against?  Am I losing any thing by not
  using
  copyField?
 
  Thanks,
 
  Steve
 
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 




Re: Is copyField a must?

2015-05-13 Thread Steven White
Looks like I got it working (however I still have an outstanding issue, see
end of my email).

Here is what I have done:

1) In my solrconfig.xml, I created:

  <requestHandler name="/select_group_a" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="defType">edismax</str>
      <str name="qf">F1 F2 F3</str>
      <str name="fl">type,id,score</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>

And

  <requestHandler name="/select_group_b" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="defType">edismax</str>
      <str name="qf">F2 F3 F5</str>
      <str name="fl">type,id,score</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>

2) My search URL is now:
http://localhost:8983/solr/db/select_group_a?q.op=OR&q=search string  and
 http://localhost:8983/solr/db/select_group_b?q.op=OR&q=search string

This all works, BUT when I use q=type:(PDF OR DOC OR TXT) so that I can
further narrow down the search to within, for example, file extensions, this
doesn't seem to work.  Is this because using qf with edismax doesn't
parse the string the same way as the default defType?

Steve

On Wed, May 13, 2015 at 6:11 PM, Steven White swhite4...@gmail.com wrote:

 Thanks for the quick reply Shawn.  I will dig into dismax and edismax and
 come back with questions if I cannot figure it out.  I avoided them
 thinking they are for faceting use only, my need is generic search (all the
 features I get via solr.SearchHandler) but limited to a set of fields.

 Steve

 On Wed, May 13, 2015 at 5:58 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/13/2015 3:36 PM, Steven White wrote:
    <requestHandler name="/select_group_a" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">20</int>
        <str name="df">F2,F3,F5</str>
        <str name="fl">id,score</str>
      </lst>
    </requestHandler>
 
  However, this isn't working because whatever is in df is being treated as a
  single field name.

 The df parameter is shorthand for default field.  It is, by
 definition, a single field -- it is the field searched by default when
 you don't specify a field directly in a query handled by the default
 (lucene) query parser.  The default parser doesn't search multiple
 fields for your search terms.

 What you're going to want to do here is use a different query parser --
 dismax or edismax -- and put your field list in the qf field, separated
 by spaces rather than commas.  The qf parameter means query fields and
 is specific to the dismax/edismax parsers.  Depending on your exact
 needs, you may also want to define the pf parameter as well (phrase
 fields).

 There is a LOT of detail on these parsers, so I'll give you the
 documentation links rather than try and explain everything:

 https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser

 https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

 Thanks,
 Shawn





Re: Is copyField a must?

2015-05-13 Thread Shawn Heisey
On 5/13/2015 3:36 PM, Steven White wrote:
   <requestHandler name="/select_group_a" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">20</int>
       <str name="df">F2,F3,F5</str>
       <str name="fl">id,score</str>
     </lst>
   </requestHandler>

 However, this isn't working because whatever is in df is being treated as a
 single field name.

The df parameter is shorthand for "default field".  It is, by
definition, a single field -- it is the field searched by default when
you don't specify a field directly in a query handled by the default
(lucene) query parser.  The default parser doesn't search multiple
fields for your search terms.

What you're going to want to do here is use a different query parser --
dismax or edismax -- and put your field list in the qf parameter, separated
by spaces rather than commas.  The qf parameter means "query fields" and
is specific to the dismax/edismax parsers.  Depending on your exact
needs, you may also want to define the pf parameter as well (phrase fields).
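
A minimal SolrJ sketch of such a query (F2/F3/F5 are the field names from your
handler; the API usage is illustrative):

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("search terms");
    q.set("defType", "edismax");
    q.set("qf", "F2 F3 F5");   // space-separated, not comma-separated
    q.set("pf", "F2 F3 F5");   // optional: boost docs where the terms form a phrase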

There is a LOT of detail on these parsers, so I'll give you the
documentation links rather than try and explain everything:

https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

Thanks,
Shawn



Re: Is copyField a must?

2015-05-13 Thread Erick Erickson
Two things:


1) There's really no need to define two request handlers here. The
defaults section is exactly that: defaults, which can be overridden
by the URL. So rather than have select_group_b, use something like

... solr/collection/select_group_a?q=whatever&qf=F2+F3+F5

2) When you add a field qualifier to an edismax query, it overrides
all the qf definitions. Consider using an fq (filter query) clause by
adding &fq=type:(PDF OR DOC OR TXT)


Best,
Erick

On Wed, May 13, 2015 at 6:15 PM, Steven White swhite4...@gmail.com wrote:
 Looks like I got it working (however I still have an outstanding issue, see
 end of my email).

 Here is what I have done:

 1) In my solrconfig.xml, I created:

    <requestHandler name="/select_group_a" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">20</int>
        <str name="defType">edismax</str>
        <str name="qf">F1 F2 F3</str>
        <str name="fl">type,id,score</str>
        <str name="wt">xml</str>
        <str name="indent">true</str>
      </lst>
    </requestHandler>

  And

    <requestHandler name="/select_group_b" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">20</int>
        <str name="defType">edismax</str>
        <str name="qf">F2 F3 F5</str>
        <str name="fl">type,id,score</str>
        <str name="wt">xml</str>
        <str name="indent">true</str>
      </lst>
    </requestHandler>

 2) My search URL is now:
 http://localhost:8983/solr/db/select_group_a?q.op=OR&q=search string  and
  http://localhost:8983/solr/db/select_group_b?q.op=OR&q=search string

  This all works, BUT when I use q=type:(PDF OR DOC OR TXT) so that I can
  further narrow down the search to within, for example, file extensions, this
  doesn't seem to work.  Is this because using qf with edismax doesn't
  parse the string the same way as the default defType?

 Steve

 On Wed, May 13, 2015 at 6:11 PM, Steven White swhite4...@gmail.com wrote:

 Thanks for the quick reply Shawn.  I will dig into dismax and edismax and
 come back with questions if I cannot figure it out.  I avoided them
 thinking they are for faceting use only, my need is generic search (all the
 features I get via solr.SearchHandler) but limited to a set of fields.

 Steve

 On Wed, May 13, 2015 at 5:58 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/13/2015 3:36 PM, Steven White wrote:
     <requestHandler name="/select_group_a" class="solr.SearchHandler">
       <lst name="defaults">
         <str name="echoParams">explicit</str>
         <int name="rows">20</int>
         <str name="df">F2,F3,F5</str>
         <str name="fl">id,score</str>
       </lst>
     </requestHandler>
 
  However, this isn't working because whatever is in df is being
 treated as
  single field name.

 The df parameter is shorthand for default field.  It is, by
 definition, a single field -- it is the field searched by default when
 you don't specify a field directly in a query handled by the default
 (lucene) query parser.  The default parser doesn't search multiple
 fields for your search terms.

 What you're going to want to do here is use a different query parser --
 dismax or edismax -- and put your field list in the qf field, separated
 by spaces rather than commas.  The qf parameter means query fields and
 is specific to the dismax/edismax parsers.  Depending on your exact
 needs, you may also want to define the pf parameter as well (phrase
 fields).

 There is a LOT of detail on these parsers, so I'll give you the
 documentation links rather than try and explain everything:

 https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser

 https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

 Thanks,
 Shawn





Re: Is copyField a must?

2015-05-13 Thread Shawn Heisey
On 5/13/2015 3:36 PM, Steven White wrote:
 Note, I want to avoid a URL base solution (sending the list of fields over
 HTTP) because the list of fields could be large (1000+) and thus I will
 exceed GET limit quickly (does Solr support POST for searching, if so, than
 I can use URL base solution?)

Solr does indeed support a query sent as the body in a POST request. 
I'm not completely positive, but I think you'd use the same format as
you put on the URL:

q=foo&rows=1&fq=bar

If anyone knows for sure what should be in the POST body, please let me
and Steven know.  In particular, should the content be URL escaped, as
might be required for a GET?
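
For what it's worth, SolrJ can already issue the query as a POST, and it sends
the parameters form-encoded (i.e. URL-escaped) in the request body. A minimal
sketch (the client setup is an illustrative assumption):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.ModifiableSolrParams;

    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "foo");
    params.set("rows", 1);
    params.set("fq", "bar");
    QueryResponse rsp = client.query(params, SolrRequest.METHOD.POST);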

Thanks,
Shawn



Re: Is copyField a must?

2015-05-13 Thread Steven White
Thanks for the quick reply, Shawn.  I will dig into dismax and edismax and
come back with questions if I cannot figure it out.  I had avoided them,
thinking they were for faceting use only; my need is generic search (all the
features I get via solr.SearchHandler) but limited to a set of fields.

Steve

On Wed, May 13, 2015 at 5:58 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/13/2015 3:36 PM, Steven White wrote:
    <requestHandler name="/select_group_a" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">20</int>
        <str name="df">F2,F3,F5</str>
        <str name="fl">id,score</str>
      </lst>
    </requestHandler>
 
  However, this isn't working because whatever is in df is being treated
 as
  single field name.

 The df parameter is shorthand for default field.  It is, by
 definition, a single field -- it is the field searched by default when
 you don't specify a field directly in a query handled by the default
 (lucene) query parser.  The default parser doesn't search multiple
 fields for your search terms.

 What you're going to want to do here is use a different query parser --
 dismax or edismax -- and put your field list in the qf field, separated
 by spaces rather than commas.  The qf parameter means query fields and
 is specific to the dismax/edismax parsers.  Depending on your exact
 needs, you may also want to define the pf parameter as well (phrase
 fields).

 There is a LOT of detail on these parsers, so I'll give you the
 documentation links rather than try and explain everything:

 https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser

 https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

 Thanks,
 Shawn




Re: Is copyField a must?

2015-05-13 Thread Steven White
Hi Erick,

The fq did the trick.  This basically solved my need, and I can call it a
day (now that it is late Friday).

The reason why I'm using two (and there will be more) handlers vs. qf in
the URL is the GET limit.  The list of fields will be large
(nearing 1000) and each field name can be long (up to 40 characters).  If
Solr will accept POST, then I can pass the list via qf and call it a day
(I might have to worry a bit about the larger-than-normal network traffic
per search request, but I can deal with that).  So, do you or does anyone
know if Solr supports POST requests?  If so, what's the body format that
I need to send (Shawn asked this question too)?

Thanks

Steve


On Wed, May 13, 2015 at 8:13 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Two things:


 1) There's really no need to define two request handlers here. The
 defaults section is exactly that: defaults, which can be overridden
 by the URL. So rather than have select_group_b, use something like

 ... solr/collection/select_group_a?q=whatever&qf=F2+F3+F5

 2) When you add a field qualifier to an edismax query, it overrides
 all the qf definitions. Consider using an fq (filter query) clause by
 adding &fq=type:(PDF OR DOC OR TXT)


 Best,
 Erick

 On Wed, May 13, 2015 at 6:15 PM, Steven White swhite4...@gmail.com
 wrote:
  Looks like I got it working (however I still have an outstanding issue,
 see
  end of my email).
 
  Here is what I have done:
 
  1) In my solrconfig.xml, I created:
 
    <requestHandler name="/select_group_a" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">20</int>
        <str name="defType">edismax</str>
        <str name="qf">F1 F2 F3</str>
        <str name="fl">type,id,score</str>
        <str name="wt">xml</str>
        <str name="indent">true</str>
      </lst>
    </requestHandler>
  
   And
  
    <requestHandler name="/select_group_b" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">20</int>
        <str name="defType">edismax</str>
        <str name="qf">F2 F3 F5</str>
        <str name="fl">type,id,score</str>
        <str name="wt">xml</str>
        <str name="indent">true</str>
      </lst>
    </requestHandler>
 
  2) My search URL is now:
  http://localhost:8983/solr/db/select_group_a?q.op=ORq=search string
 and
   http://localhost:8983/solr/db/select_group_b?q.op=ORq=search string
 
  This all works, BUT when I use q=type:(PDF OR DOC OR TXT) so that I can
  further narrow down the search to within, for example, file extensions, this
  doesn't seem to work.  Is this because using qf with edismax doesn't
  parse the string the same way as the default defType?
 
  Steve
 
  On Wed, May 13, 2015 at 6:11 PM, Steven White swhite4...@gmail.com
 wrote:
 
  Thanks for the quick reply Shawn.  I will dig into dismax and edismax
 and
  come back with questions if I cannot figure it out.  I avoided them
  thinking they are for faceting use only, my need is generic search (all
 the
  features I get via solr.SearchHandler) but limited to a set of fields.
 
  Steve
 
  On Wed, May 13, 2015 at 5:58 PM, Shawn Heisey apa...@elyograg.org
 wrote:
 
  On 5/13/2015 3:36 PM, Steven White wrote:
     <requestHandler name="/select_group_a" class="solr.SearchHandler">
       <lst name="defaults">
         <str name="echoParams">explicit</str>
         <int name="rows">20</int>
         <str name="df">F2,F3,F5</str>
         <str name="fl">id,score</str>
       </lst>
     </requestHandler>
  
   However, this isn't working because whatever is in df is being
  treated as
   single field name.
 
  The df parameter is shorthand for default field.  It is, by
  definition, a single field -- it is the field searched by default when
  you don't specify a field directly in a query handled by the default
  (lucene) query parser.  The default parser doesn't search multiple
  fields for your search terms.
 
  What you're going to want to do here is use a different query parser --
  dismax or edismax -- and put your field list in the qf field, separated
  by spaces rather than commas.  The qf parameter means query fields
 and
  is specific to the dismax/edismax parsers.  Depending on your exact
  needs, you may also want to define the pf parameter as well (phrase
  fields).
 
  There is a LOT of detail on these parsers, so I'll give you the
  documentation links rather than try and explain everything:
 
 
 https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
 
 
 https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
 
  Thanks,
  Shawn
 
 
 



Re: QQ on segments during indexing.

2015-05-13 Thread Manohar Sripada
Thanks Shawn. In my case the document size is small, so it will certainly
reach 50k docs before the 100MB buffer size.

Thanks,
Manohar

On Thu, May 14, 2015 at 10:49 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/13/2015 10:01 PM, Manohar Sripada wrote:
  I have a question on segment creation on disk during indexing.
 
  In my solrconfig.xml, I have commented maxBufferedDocs and
 ramBufferSizeMB.
  I am controlling the flushing of data to disk using autoCommit's maxDocs
  and maxTime.
 
  Here, maxDocs is set to 50000 and will be hit first, so that a commit of data
  to disk happens every 50000 docs. So, my question here is: will it create a
  new segment when this commit happens?
 
  In the wiki
  https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor, it is
  mentioned that new segment creation is determined based on the
  maxBufferedDocs parameter. As I have commented out this parameter, how is new
  segment creation determined?

 In recent Solr versions, the ramBufferSizeMB setting defaults to 100 and
 maxBufferedDocs defaults to -1.  A setting of -1 on maxBufferedDocs
 means that the number of docs doesn't matter, it will use
 ramBufferSizeMB unless a commit happens before the buffer fills up.  A
 commit does trigger a segment flush, although if it's a soft commit, the
 situation might be more complicated.

 Unless the docs are very small, I would expect a 100MB buffer to fill up
 before you reach 50000 docs.  It's been a while since I watched index
 segments get created, but if I remember correctly, the amount of space
 required in the RAM buffer to index documents is more than the size of
 the segment that eventually gets flushed to disk.

 Thanks,
 Shawn




Re: QQ on segments during indexing.

2015-05-13 Thread Shawn Heisey
On 5/13/2015 10:01 PM, Manohar Sripada wrote:
 I have a question on segment creation on disk during indexing.
 
 In my solrconfig.xml, I have commented maxBufferedDocs and ramBufferSizeMB.
 I am controlling the flushing of data to disk using autoCommit's maxDocs
 and maxTime.
 
 Here, maxDocs is set to 50000 and will be hit first, so that a commit of data
 to disk happens every 50000 docs. So, my question here is: will it create a
 new segment when this commit happens?
 
 In the wiki
 https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor, it is
 mentioned that new segment creation is determined based on the
 maxBufferedDocs parameter. As I have commented out this parameter, how is new
 segment creation determined?

In recent Solr versions, the ramBufferSizeMB setting defaults to 100 and
maxBufferedDocs defaults to -1.  A setting of -1 on maxBufferedDocs
means that the number of docs doesn't matter, it will use
ramBufferSizeMB unless a commit happens before the buffer fills up.  A
commit does trigger a segment flush, although if it's a soft commit, the
situation might be more complicated.

Unless the docs are very small, I would expect a 100MB buffer to fill up
before you reach 50000 docs.  It's been a while since I watched index
segments get created, but if I remember correctly, the amount of space
required in the RAM buffer to index documents is more than the size of
the segment that eventually gets flushed to disk.

Thanks,
Shawn



Confusion about zkcli.sh and solr.war

2015-05-13 Thread Jim . Musil
I'm trying to use zkcli.sh to upload configurations to ZooKeeper with Solr 5.1.

It's throwing an error because it references webapps/solr.war which no longer 
exists.

Do I have to build my own solr.war in order to use zkcli.sh?

Please forgive me if I'm missing something here.

Jim Musil


Re: Is copyField a must?

2015-05-13 Thread Steven White
I don't have a need for Edismax.  That said, do I still have a need for
copyField into a default-field?

Steve

On Wed, May 13, 2015 at 11:13 AM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 I think with a proper configuration of the Edismax query parser and a
 proper management of field boosting,

 it's much more precise to use the list of interesting fields than a big
 blob copy field.

 Cheers

 2015-05-13 15:54 GMT+01:00 Steven White swhite4...@gmail.com:

  Hi Everyone,
 
  In my search need, I will always be using df to specify the list of
 fields
  a search will be done in (the list of fields is group based which my
  application defines).
 
  Given this, is there any reason to use copyField to copy the data into a
  single master-field to search against?  Am I losing any thing by not
 using
  copyField?
 
  Thanks,
 
  Steve
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England



QQ on segments during indexing.

2015-05-13 Thread Manohar Sripada
I have a question on segment creation on disk during indexing.

In my solrconfig.xml, I have commented out maxBufferedDocs and ramBufferSizeMB.
I am controlling the flushing of data to disk using autoCommit's maxDocs
and maxTime.
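
A sketch of that kind of autoCommit configuration (values are illustrative,
not taken from my actual config):

    <autoCommit>
      <maxDocs>50000</maxDocs>
      <maxTime>60000</maxTime> <!-- ms -->
      <openSearcher>false</openSearcher>
    </autoCommit>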

Here, maxDocs is set to 50000 and will be hit first, so that a commit of data
to disk happens every 50000 docs. So, my question here is: will it create a
new segment when this commit happens?

In the wiki
https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor, it is
mentioned that new segment creation is determined based on the
maxBufferedDocs parameter. As I have commented out this parameter, how is new
segment creation determined?

Thanks,
Manohar


Re: utility methods to get field values from index

2015-05-13 Thread Shalin Shekhar Mangar
In Solr 5.0+ you can use Lucene's DocValues API to read the indexed
information. This is a unifying API over field cache and doc values so it
can be used on all indexed fields.

e.g. for single-valued field use
searcher.getLeafReader().getSortedDocValues(fieldName);
and for multi-valued fields
use searcher.getLeafReader().getSortedSetDocValues(fieldName);
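
A minimal sketch of reading values this way inside a component (field names
are illustrative; for a single-valued long field the numeric counterpart,
getNumericDocValues, applies):

    import org.apache.lucene.index.LeafReader;
    import org.apache.lucene.index.NumericDocValues;
    import org.apache.lucene.index.SortedSetDocValues;
    import org.apache.lucene.util.BytesRef;

    LeafReader reader = searcher.getLeafReader();

    // single-valued long field
    NumericDocValues groupIds = reader.getNumericDocValues("group_id");
    long groupId = groupIds.get(docId);

    // multi-valued field
    SortedSetDocValues vals = reader.getSortedSetDocValues("tags");
    vals.setDocument(docId);
    long ord;
    while ((ord = vals.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
        BytesRef term = vals.lookupOrd(ord);
        // decode the term as needed
    }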

On Wed, May 13, 2015 at 11:11 AM, Parvesh Garg parv...@zettata.com wrote:

 Hi All,

 Was wondering if there is any class in Solr that provides utility methods
 to fetch indexed field values for documents using docId. Something simple
 like

 getMultiLong(String field, int docId)

 getLong(String field, int docId)

 We have written a solr component to return group level stats like avg
 score, max score etc over a large number of documents (say 5000+) against a
 query executed using edismax. Need to get the group id field's value to do
 that; this is a single-valued long field.

 This component also looks at one more field that is a multivalued long
 field for each document and compute a score based on frequency + document
 score for each value.

 Currently we are using stored fields and was wondering if this approach
 would be faster.

 Apologies if this is too much to ask for.

 Parvesh Garg,




-- 
Regards,
Shalin Shekhar Mangar.


Reading an index while it is being updated?

2015-05-13 Thread Guy Thomas
Up to now we've been using Lucene without Solr.

The Lucene index is being updated and when the update is finished we notify a 
Hessian proxy service running on the web server that wants to read the index. 
When this proxy service is notified, the server knows it can read the updated 
index.

Do we have to use a similar set-up when using Solr, that is:

1. Create/update the index

2. Notify the Solr client



Guy Thomas
Provincie Vlaams-Brabant
www.vlaamsbrabant.be



Re: utility methods to get field values from index

2015-05-13 Thread Parvesh Garg
Hi Shalin,

Thanks for your answer. Forgot to mention that we are using Solr 4.10.
Also, I tried using docValues and the performance was worse than getting it
from stored values. Time taken to retrieve data for 2000 docs for 2 fields
was 120 ms with stored values vs 230 ms with docValues.

May be there is something wrong in my code.

The code used for retrieving docValues is:

  public static long getSingleLong(SolrIndexSearcher searcher, int docId,
      String field) throws IOException {
    NumericDocValues sdv = DocValues.getNumeric(searcher.getAtomicReader(), field);
    return sdv.get(docId);
  }

and

  public static List<Long> getMultiLong(SolrIndexSearcher searcher,
      int docId, String field) throws IOException {
    SortedSetDocValues ssdv = DocValues.getSortedSet(
        searcher.getAtomicReader(), field);
    ssdv.setDocument(docId);
    long l;
    List<Long> retval = new ArrayList<Long>(40);
    while ((l = ssdv.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
      BytesRef bytes = ssdv.lookupOrd(l);
      retval.add(NumericUtils.prefixCodedToLong(bytes));
    }
    return retval;
  }



Parvesh Garg

On Wed, May 13, 2015 at 11:36 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 In Solr 5.0+ you can use Lucene's DocValues API to read the indexed
 information. This is a unifying API over field cache and doc values so it
 can be used on all indexed fields.

 e.g. for single-valued field use
 searcher.getLeafReader().getSortedDocValues(fieldName);
 and for multi-valued fields
 use searcher.getLeafReader().getSortedSetDocValues(fieldName);

 On Wed, May 13, 2015 at 11:11 AM, Parvesh Garg parv...@zettata.com
 wrote:

  Hi All,
 
  Was wondering if there is any class in Solr that provides utility methods
  to fetch indexed field values for documents using docId. Something simple
  like
 
  getMultiLong(String field, int docId)
 
  getLong(String field, int docId)
 
  We have written a solr component to return group level stats like avg
  score, max score etc over a large number of documents (say 5000+)
 against a
  query executed using edismax. Need to get the group id fields value to do
  that, this is a single valued long field.
 
  This component also looks at one more field that is a multivalued long
  field for each document and compute a score based on frequency + document
  score for each value.
 
  Currently we are using stored fields and was wondering if this approach
  would be faster.
 
  Apologies if this is too much to ask for.
 
  Parvesh Garg,
 



 --
 Regards,
 Shalin Shekhar Mangar.



Setting system property

2015-05-13 Thread Clemens Wyss DEV
I'd like to make use of solr.allow.unsafe.resourceloading=true.
Is the commandline -D solr.allow.unsafe.resourceloading=true the only way to 
inject/set this property or can it be done (e.g.) in solr.xml ?

Thx
Clemens


Upgrading from Solr 5.0.0 to Solr 5.1.0

2015-05-13 Thread Zheng Lin Edwin Yeo
Hi,

As this is my first time planning an upgrade between Solr versions, I
would like to check how we should go about the upgrade so
that I can start up Solr 5.1.0 with the config and index built on Solr
5.0.0.

What files do I need to copy, and what are the things to take note of?

I'm also using an external ZooKeeper 3.4.6, so my config files are loaded
into ZooKeeper.

Regards,
Edwin


Block Join Query update documents, how to do it correctly?

2015-05-13 Thread Tom Devel
I am using the Block Join Query Parser with success, following the example
on:

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers

As this example shows, each parent document can have a number of documents
embedded, and each document, be it a parent or a child, has its own unique
identifier.

Now I would like to update some of the parent documents, and I have read that
there are horror stories with duplicate documents, scrambled data, etc. The
two prominent JIRA entries for this are:

https://issues.apache.org/jira/browse/SOLR-6700
https://issues.apache.org/jira/browse/SOLR-6096

My question is, how do you usually update such documents, for example to
update a value for the parent or a value for one of its children?

I tried to repost the whole modified document (the parent and ALL of its
children as one file), and it seems to work on a small toy example, but of
course I cannot be sure for a larger instance with thousands of documents,
and I would like to know if this is the correct way to go or not.

To make it clear, if originally I used bin/solr post on the following
file:

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>

Now I could do bin/solr post on a file:

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Updated field: Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">Updated field: SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>

Will this avoid the inconsistent, scrambled, or duplicate data on Solr
instances discussed in the JIRAs? How do you usually do this?

Thanks for any help or hints.

Tom