where should I check for solrj SolrServerException

2014-08-05 Thread Lee Chunki
Hi,

I am using SolrJ and sometimes get connection problems like:

org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at

Caused by: java.net.SocketException: Connection reset 
Caused by: org.apache.http.NoHttpResponseException: The target server failed to 
respond 
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to 
10.10.68.183:8983 timed out 

It seems to be a resource problem, but the system load is not high.

Which parameter or log should I check to find the reason?

Thanks,
Chunki.

Re: Modify/add/remove params at search component

2014-08-05 Thread Lee Chunki

Hi Umesh Prasad,

It makes my code simpler than before.

Thank you,
Chunki.

On Aug 4, 2014, at 9:48 PM, Umesh Prasad umesh.i...@gmail.com wrote:

 Use ModifiableSolrParams:
 
 SolrParams params = rb.req.getParams();
 ModifiableSolrParams modifiableSolrParams = new ModifiableSolrParams(params);
 modifiableSolrParams.set(paramName, paramValue);
 rb.req.setParams(modifiableSolrParams);
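 
 A fuller sketch of where such a call might live, assuming a minimal custom
 SearchComponent (in a SearchHandler, prepare() runs on every component
 before any process(), so params set there are seen by QueryComponent);
 the class name and the overridden value are illustrative:
 
 import java.io.IOException;
 
 import org.apache.solr.common.params.ModifiableSolrParams;
 import org.apache.solr.handler.component.ResponseBuilder;
 import org.apache.solr.handler.component.SearchComponent;
 
 public class ParamRewriteComponent extends SearchComponent {
 
   @Override
   public void prepare(ResponseBuilder rb) throws IOException {
     // Request params are immutable; copy them into a mutable view,
     // override what you need, and hand them back to the request.
     ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
     params.set("rows", 20); // example override
     rb.req.setParams(params);
   }
 
   @Override
   public void process(ResponseBuilder rb) throws IOException {
     // no-op: this component only rewrites params
   }
 
   @Override
   public String getDescription() {
     return "Rewrites request parameters";
   }
 
   @Override
   public String getSource() {
     return "";
   }
 }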
 
 
 
 
 On 4 August 2014 12:47, Lee Chunki lck7...@coupang.com wrote:
 
 Hi,
 
 I am building a new search component and it runs after QueryComponent.
 What I want to do is set params like start, rows, query and so on in the new
 search component.
 
 I could set/get query by using
 setQueryString()
 
 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/handler/component/ResponseBuilder.html#setQueryString(java.lang.String)
 getQueryString()
 
 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/handler/component/ResponseBuilder.html#getQueryString()
 
 and get params by using
 rb.req.getParams()
 
 but how can I set params in a search component?
 
 Thanks,
 Chunki.
 
 
 
 
 -- 
 Thanks & Regards
 Umesh Prasad
 Search l...@flipkart.com
 
 in.linkedin.com/pub/umesh-prasad/6/5bb/580/



Re: where should I check for solrj SolrServerException

2014-08-05 Thread Alexandre Rafalovitch
Does it by any chance happen after a period of inactivity? And you are
holding on to the client? If so, check you don't have a firewall in
between that times out and drops the assumed dead connection.
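
If that turns out to be the cause, a minimal SolrJ 4.x sketch of client-side
settings that make stale pooled connections less painful (the host and port
are from the stack trace above; the core name and values are illustrative):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ClientSetup {
  public static HttpSolrServer newClient() {
    HttpSolrServer solr = new HttpSolrServer("http://10.10.68.183:8983/solr/core1");
    solr.setConnectionTimeout(2000); // ms to establish the TCP connection
    solr.setSoTimeout(10000);        // ms of socket inactivity before giving up
    solr.setMaxRetries(1);           // retry once if a pooled connection is dead
    return solr;
  }
}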

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Tue, Aug 5, 2014 at 8:15 AM, Lee Chunki lck7...@coupang.com wrote:
 Hi,

 I am using SolrJ and sometimes get connection problems like:

 org.apache.solr.client.solrj.SolrServerException: IOException occured when 
 talking to server at

 Caused by: java.net.SocketException: Connection reset
 Caused by: org.apache.http.NoHttpResponseException: The target server failed 
 to respond
 Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to 
 10.10.68.183:8983 timed out

 It seems to be a resource problem, but the system load is not high.

 Which parameter or log should I check to find the reason?

 Thanks,
 Chunki.


Re: where should I check for solrj SolrServerException

2014-08-05 Thread Lee Chunki
The system architecture is:

SolrJ client —— L4 ——— three Solr servers

It works well most of the time, but the error occurs fewer than 20 times a day.

Thanks,
Chunki

On Aug 5, 2014, at 3:35 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 Does it by any chance happen after a period of inactivity? And you are
 holding on to the client? If so, check you don't have a firewall in
 between that times out and drops the assumed dead connection.
 
 Regards,
   Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
 
 
 On Tue, Aug 5, 2014 at 8:15 AM, Lee Chunki lck7...@coupang.com wrote:
 Hi,
 
 I am using SolrJ and sometimes get connection problems like:
 
 org.apache.solr.client.solrj.SolrServerException: IOException occured when 
 talking to server at
 
 Caused by: java.net.SocketException: Connection reset
 Caused by: org.apache.http.NoHttpResponseException: The target server failed 
 to respond
 Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to 
 10.10.68.183:8983 timed out
 
 It seems to be a resource problem, but the system load is not high.
 
 Which parameter or log should I check to find the reason?
 
 Thanks,
 Chunki.



AUTO: Nicholas M. Wertzberger is out of the office (returning 08/08/2014)

2014-08-05 Thread Nicholas M. Wertzberger


I am out of the office until 08/08/2014.

I am in a training class until Friday.
Please contact Jason Brown for anything JAS Team related.


Note: This is an automated response to your message  Re: Solr Faceting
issue sent on 8/5/2014 12:12:04 AM.

This is the only notification you will receive while this person is away.


Re: Paging bug in ReRankingQParserPlugin?

2014-08-05 Thread Joel Bernstein
The comment in the code reads slightly different:

// This ensures that reRankDocs >= docs needed to satisfy the result set.
reRankDocs = Math.max(start+rows, reRankDocs);

I think you're right though that this is confusing. The way the
ReRankingQParserPlugin works is that it grabs the top X documents
(reRankDocs) and reRanks them. If the top X (reRankDocs) isn't large enough
to satisfy the page then the result won't have enough documents.

The intended use of this was actually to stop using query re-ranking when
you paged past the reRanked results. So if you re-rank the top 200
documents, you would drop the re-ranking parameter when you page to
documents 201-220.

So the line:
reRankDocs = Math.max(start+rows, reRankDocs);

Saves you from an unexpected shortfall in documents if you do page beyond
the reRankDocs. At the very least the expected use should be documented and
if we can figure out better behavior here that would be great.
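
As a concrete sketch of that intended use, a client (SolrJ here; the query
strings are placeholders) could drop the re-rank parameter once the requested
page falls outside the re-ranked window:

import org.apache.solr.client.solrj.SolrQuery;

public class RerankPaging {

  // Re-rank only while the whole page fits inside the reRankDocs window;
  // past it, fall back to the plain ranking so results stay stable.
  public static SolrQuery build(String mainQuery, String rerankQuery,
                                int start, int rows, int reRankDocs) {
    SolrQuery q = new SolrQuery(mainQuery);
    q.setStart(start);
    q.setRows(rows);
    if (start + rows <= reRankDocs) {
      q.set("rq", "{!rerank reRankQuery=$rqq reRankDocs=" + reRankDocs
          + " reRankWeight=3}");
      q.set("rqq", rerankQuery);
    }
    return q;
  }
}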














Joel Bernstein
Search Engineer at Heliosearch


On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac adairko...@gmail.com wrote:

 Looking at this line in the code:

 // This ensures that reRankDocs >= docs needed to satisfy the result set.
 reRankDocs = Math.max(start+rows, reRankDocs);

 This looks like it would cause skips and duplicates while paging through
 the results, since if you exceed the reRankDocs parameter and keep finding
 things that match the re-ranking query, they'll get boosted earlier
 (skipped), thus pushing down items you already saw (causing duplicates).

 It's obviously intentional behavior, but there's no documentation I can
 see of why, if you request fewer documents to be re-ranked than you're
 asking to view, it goes ahead and ignores the number you asked for. What if
 I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
 to make the client choose whether to increase the reRankDocs or leave it
 the same?

 If no one replies and I have time, I might check out 4.9 and see if I can
 confirm or disprove the bug, but figured I'd bring it up now in case I
 don't end up having time. It would be good to document the reason for this
 behavior if it turns out it's necessary.

 Thanks. I'm excited about this feature btw.

 --Adair



no of request count in solr

2014-08-05 Thread rockstar007
Is there any way to get the request count per hour or per day in Solr?

Thanks,
RR




solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Ali Nazemian
Dear all,
Hi,
I configured Solr 4.9 to write its index and data on HDFS. Now I want to
connect to that data from outside Solr to change some of the values. Could
somebody please tell me how that is possible? Suppose I am using HBase over
HDFS to make these changes.
Best regards.

-- 
A.Nazemian


Re: Auto Complete

2014-08-05 Thread benjelloun
hello,

did you find any solution to this problem?

regards


2014-08-04 16:16 GMT+02:00 Michael Della Bitta-2 [via Lucene] 
ml-node+s472066n4150990...@n3.nabble.com:

 How are you implementing autosuggest? I'm assuming you're querying an
 indexed field and getting a stored value back. But there are a wide
 variety
 of ways of doing it.

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 “The Science of Influence Marketing”

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts

 w: appinions.com http://www.appinions.com/


 On Mon, Aug 4, 2014 at 10:10 AM, benjelloun [hidden email] wrote:

  Hello, you didn't understand my problem well.
 
  An example: I have a document containing "genève" (with accent).
  When I do q=gene --> autosuggest returns "geneve", because of
  ASCIIFoldingFilterFactory with preserveOriginal="true".
  When I do q=genè --> autosuggest returns "genève".
  But what I need is:
  q=gene (without accent) should return "genève" (with accent).
 
 
 
 








Re: Auto Complete

2014-08-05 Thread Michael Della Bitta
Unless I'm mistaken, it seems like you've created this index specifically
for autocomplete? Or is this index used for general search also?

The easy way to understand this question: Is there one entry in your index
for each term you want to autocomplete? Or are there multiple entries that
might contain the same term?

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Tue, Aug 5, 2014 at 9:10 AM, benjelloun anass@gmail.com wrote:

 hello,

 did you find any solution to this problem?

 regards


 2014-08-04 16:16 GMT+02:00 Michael Della Bitta-2 [via Lucene] 
 ml-node+s472066n4150990...@n3.nabble.com:

  How are you implementing autosuggest? I'm assuming you're querying an
  indexed field and getting a stored value back. But there are a wide
  variety
  of ways of doing it.
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062
 
  appinions inc.
 
  “The Science of Influence Marketing”
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
  
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 
  w: appinions.com http://www.appinions.com/
 
 
  On Mon, Aug 4, 2014 at 10:10 AM, benjelloun [hidden email] wrote:
 
   Hello, you didn't understand my problem well.
  
   An example: I have a document containing "genève" (with accent).
   When I do q=gene --> autosuggest returns "geneve", because of
   ASCIIFoldingFilterFactory with preserveOriginal="true".
   When I do q=genè --> autosuggest returns "genève".
   But what I need is:
   q=gene (without accent) should return "genève" (with accent).
  
  
  
  
 
 
 







Delta Import - Cleaning Index

2014-08-05 Thread Jako de Wet
Hi everyone

I have a Solr index with 20+ million products; the core is about 70GB.

What I would like to do is a weekly delta-import, but the index seems to be
growing in size each week. (Currently it's running a full-import with
clean=false.)

Shouldn't the delta-import with the clean=true option import the records
and update the old records in the core? It should result in roughly the same
size?

When I do a delta-import with clean=true via the Solr Dashboard, it cleans the
whole 20+ million and only the updated records are left.

Any ideas?

Thank you.

-- 

*Jako de Wet*


*Business Development Specialist*

*SAPnet*
98 Beach Rd, 1st Floor
Metropole Plaza
Strand
7140
South Africa
Phone: +27-21-853-3564
Fax: +27-21-853-3479
Website: www.sapnet.co.za
E-mail: j...@sapnet.co.za


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Shawn Heisey
On 8/5/2014 7:04 AM, Ali Nazemian wrote:
 I configured Solr 4.9 to write its index and data on HDFS. Now I want to
 connect to that data from outside Solr to change some of the values. Could
 somebody please tell me how that is possible? Suppose I am using HBase over
 HDFS to make these changes.

I don't know how you could safely modify the index without a Lucene
application or another instance of Solr, but if you do manage to modify
the index, simply reloading the core or restarting Solr should cause it
to pick up the changes. Either you would need to make sure that Solr
never modifies the index, or you would need some way of coordinating
updates so that Solr and the other application would never try to modify
the index at the same time.
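
For the reload step, a minimal SolrJ sketch (URL and core name are
illustrative):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ReloadCore {
  public static void main(String[] args) throws Exception {
    // CoreAdmin requests go to the container URL, not to a specific core.
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    CoreAdminRequest.reloadCore("collection1", solr); // pick up external changes
    solr.shutdown();
  }
}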

Thanks,
Shawn



Re: Delta Import - Cleaning Index

2014-08-05 Thread Shawn Heisey
On 8/5/2014 7:20 AM, Jako de Wet wrote:
 I have a Solr Index that has 20+ million products, the core is about 70GB.
 
 What I would like to do, is a weekly delta-import, but it seems to be
 growing in size each week. (Currently its running a full-import +
 clean=false)
 
 Shouldn't the Delta-Import with the Clean=True option import the records
 and update the old records in the core? It should result in +- the same
 size?
 
 When I do a delta-import + clean=true via the Solr Dashboard, it cleans the
 whole 20+million and only the update records are left.

The clean parameter refers to the whole index.  You asked it to clean
the index, so it did -- it deleted all documents.

Deleted documents are not actually deleted, they are marked as deleted
-- they still take up disk space.  In order to actually get rid of them,
they need to be merged out.  When segments are merged, only the
non-deleted documents are copied to the new segment.  A full optimize
(which is a forced merge down to one segment) is the only way to be
absolutely sure that all deleted documents are gone.  A full optimize
will completely rewrite the index, which is a lot of disk I/O.  That can
lead to query performance issues while the optimize is happening and for
a short time afterwards.

Note that when you index a document with the same value in the uniqueKey
field as an existing document, the old document is deleted before the
new one is indexed.
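
If you do want to reclaim the space, a minimal SolrJ sketch of a forced merge
(URL and core name are illustrative; schedule it off-peak given the I/O cost
described above):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class OptimizeCore {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/products");
    solr.optimize(); // forced merge down to one segment; rewrites the whole index
    solr.shutdown();
  }
}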

Thanks,
Shawn



Re: Delta Import - Cleaning Index

2014-08-05 Thread Jako de Wet
Hi Shawn

Thanks for the insight. Why the size increase when not specifying the clean
parameter then? The PK for the documents remains the same throughout the
whole import process.

Should a full optimize combine all the results into one and decrease the
physical size of the core?


On Tue, Aug 5, 2014 at 3:28 PM, Shawn Heisey s...@elyograg.org wrote:

 On 8/5/2014 7:20 AM, Jako de Wet wrote:
  I have a Solr Index that has 20+ million products, the core is about
 70GB.
 
  What I would like to do, is a weekly delta-import, but it seems to be
  growing in size each week. (Currently its running a full-import +
  clean=false)
 
  Shouldn't the Delta-Import with the Clean=True option import the records
  and update the old records in the core? It should result in +- the same
  size?
 
  When I do a delta-import + clean=true via the Solr Dashboard, it cleans
 the
  whole 20+million and only the update records are left.

 The clean parameter refers to the whole index.  You asked it to clean
 the index, so it did -- it deleted all documents.

 Deleted documents are not actually deleted, they are marked as deleted
 -- they still take up disk space.  In order to actually get rid of them,
 they need to be merged out.  When segments are merged, only the
 non-deleted documents are copied to the new segment.  A full optimize
 (which is a forced merge down to one segment) is the only way to be
 absolutely sure that all deleted documents are gone.  A full optimize
 will completely rewrite the index, which is a lot of disk I/O.  That can
 lead to query performance issues while the optimize is happening and for
 a short time afterwards.

 Note that when you index a document with the same value in the uniqueKey
 field as an existing document, the old document is deleted before the
 new one is indexed.

 Thanks,
 Shawn




-- 

*Jako de Wet*


*Business Development Specialist*

*SAPnet*
98 Beach Rd, 1st Floor
Metropole Plaza
Strand
7140
South Africa
Phone: +27-21-853-3564
Fax: +27-21-853-3479
Website: www.sapnet.co.za
E-mail: j...@sapnet.co.za


Re: no of request count in solr

2014-08-05 Thread Shawn Heisey
On 8/5/2014 6:06 AM, rockstar007 wrote:
 Is there any way to get the request count per hour or per day in Solr?
 
 Thanks,
 RR

There is no information about requests per hour or day, but the cumulative
number of requests is available; if you track it yourself on an hourly basis,
you can calculate it.

It's in the admin UI under Plugins/Stats, or you can use the same
handler the admin UI does at the following URL:

/solr/corename/admin/mbeans?stats=true
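
A minimal SolrJ sketch of sampling that handler yourself (the response layout
noted in the comments is as of Solr 4.x; verify against your version):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.util.NamedList;

public class RequestCountSampler {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/corename");
    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/admin/mbeans");
    q.set("stats", "true");
    NamedList<Object> rsp = solr.query(q).getResponse();
    // Navigate solr-mbeans > QUERYHANDLER > <handler name> > stats > requests;
    // the "requests" counter is cumulative, so sample hourly and subtract
    // consecutive readings to get requests per hour.
    NamedList<?> mbeans = (NamedList<?>) rsp.get("solr-mbeans");
    System.out.println(mbeans);
    solr.shutdown();
  }
}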

Thanks,
Shawn



Re: Auto Complete

2014-08-05 Thread benjelloun
Yeah, that's true, I created this index just for autocomplete.
Here is my schema:

<dynamicField name="*_en" type="text_en" indexed="true" stored="false"
 required="false" multiValued="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="false"
 required="false" multiValued="true"/>
<dynamicField name="*_ar" type="text_ar" indexed="true" stored="false"
 required="false" multiValued="true"/>

<copyField source="*_en" dest="suggestField"/>
<copyField source="*_fr" dest="suggestField"/>
<copyField source="*_ar" dest="suggestField"/>

Then I use suggestField for autocomplete, as I mentioned above.
Do you have any other configuration which can do what I need?



2014-08-05 15:19 GMT+02:00 Michael Della Bitta-2 [via Lucene] 
ml-node+s472066n4151216...@n3.nabble.com:

 Unless I'm mistaken, it seems like you've created this index specifically
 for autocomplete? Or is this index used for general search also?

 The easy way to understand this question: Is there one entry in your index
 for each term you want to autocomplete? Or are there multiple entries that
 might contain the same term?

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 “The Science of Influence Marketing”

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts

 w: appinions.com http://www.appinions.com/


 On Tue, Aug 5, 2014 at 9:10 AM, benjelloun [hidden email] wrote:

  hello,
 
  did you find any solution to this problem?
 
  regards
 
 
  2014-08-04 16:16 GMT+02:00 Michael Della Bitta-2 [via Lucene] 
  [hidden email]:
 
   How are you implementing autosuggest? I'm assuming you're querying an
   indexed field and getting a stored value back. But there are a wide
   variety
   of ways of doing it.
  
   Michael Della Bitta
  
   Applications Developer
  
   o: +1 646 532 3062
  
   appinions inc.
  
   “The Science of Influence Marketing”
  
   18 East 41st Street
  
   New York, NY 10017
  
   t: @appinions https://twitter.com/Appinions | g+:
   plus.google.com/appinions
   
  
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
  
  
   w: appinions.com http://www.appinions.com/
  
  
   On Mon, Aug 4, 2014 at 10:10 AM, benjelloun [hidden email] wrote:
  
    Hello, you didn't understand my problem well.
   
    An example: I have a document containing "genève" (with accent).
    When I do q=gene --> autosuggest returns "geneve", because of
    ASCIIFoldingFilterFactory with preserveOriginal="true".
    When I do q=genè --> autosuggest returns "genève".
    But what I need is:
    q=gene (without accent) should return "genève" (with accent).
   
   
   
   
  
  

  
  
 
 
 
 
 








ExternalFileFieldReloader and commit

2014-08-05 Thread Peter Keegan
When there are multiple 'external file field' files available, Solr will
reload the last one (lexicographically) with a commit, but only if changes
were made to the index. Otherwise, it skips the reload and logs "No
uncommitted changes. Skipping IW.commit."  Has anyone else noticed this? It
seems like a bug to me. (yes, I do have firstSearcher and newSearcher event
listeners in solrconfig.xml)

Peter


Re: Getting Solr 4 to index the simple names of files

2014-08-05 Thread jrusnak
Solution found:

I was using the SimplePostTool utility to crawl and post documents to Solr
on the default example settings (except for having added a few file types to
be indexed).

Instead of finding a field that exactly passed the name of the document, I
used the resourcename text field that was already being parsed.

In my javascript in the AJAX interface I then cut the resourcename down into
simply a file name (and linked it to the corresponding file) with:

var fullFileName = doc.resourcename;
var afterLastBackslash = fullFileName.lastIndexOf('\\') + 1;
var output = '<div>' + fullFileName.substring(afterLastBackslash) + '</div>';
// (the original snippet also wrapped the name in a link to the file)

Thank you for the help!





Re: Delta Import - Cleaning Index

2014-08-05 Thread Shawn Heisey
On 8/5/2014 7:31 AM, Jako de Wet wrote:
 Thanks for the insight. Why the size increase when not specifying the clean
 parameter then? The PK for the documents remain the same throughout the
 whole import process.
 
 Should a full optimize combine all the results into one and decrease the
 physical size of the core?

When you delete all documents, all of the original segments have no
undeleted documents in them, so Lucene knows it can completely remove
those segments even when there is no merging.  I don't know what
situations will trigger such automatic removal, but Lucene is smart
enough to know that it can do it.

If you simply rely on uniqueKey replacement, the space taken up by
deleted documents cannot be automatically recovered, because there are
good documents in those segments.  Only a merge can recover the space,
and only an optimize can guarantee any specific document's segment will
be merged.

Thanks,
Shawn



Re: Implementing custom analyzer for multi-language stemming

2014-08-05 Thread Rich Cariens
I've started a GitHub project to try out some cross-lingual analysis ideas (
https://github.com/whateverdood/cross-lingual-search). I haven't played
over there for about 3 months, but plan on restarting work there shortly.
In a nutshell, the interesting component
(SimplePolyGlotStemmingTokenFilter) relies on ICU4J ScriptAttributes:
each token is inspected for its script, e.g. Latin or Arabic, and then
a ScriptStemmer recruits the appropriate stemmer to handle the token.

Of course this is extremely primitive and basic, but I think it would be
possible to write a CharFilter or TokenFilter that inspects the entire
TokenStream to guess the language(s), perhaps even noting where languages
change. Language and position information could be tracked, the TokenStream
rewound and then Tokens emitted with LanguageAttributes for downstream
Token stemmers to deal with.

Or is that a crazy idea?
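
For reference, a minimal sketch of that per-script dispatch (this assumes the
ICU module's ICUTokenizer upstream to populate ScriptAttribute; ScriptStemmer
and its stem() signature are hypothetical stand-ins for the registry in the
GitHub project above):

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.icu.tokenattributes.ScriptAttribute;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class PolyGlotStemmingTokenFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final ScriptAttribute scriptAtt = addAttribute(ScriptAttribute.class);
  private final ScriptStemmer stemmer = new ScriptStemmer(); // hypothetical registry

  public PolyGlotStemmingTokenFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // scriptAtt.getCode() is a UScript constant (LATIN, ARABIC, ...) set by
    // ICUTokenizer; recruit the matching stemmer and rewrite the term in place.
    int newLength = stemmer.stem(scriptAtt.getCode(), termAtt.buffer(), termAtt.length());
    termAtt.setLength(newLength);
    return true;
  }
}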


On Tue, Aug 5, 2014 at 12:10 AM, TK kuros...@sonic.net wrote:

 On 7/30/14, 10:47 AM, Eugene wrote:

  Hello, fellow Solr and Lucene users and developers!

  In our project we receive text from users in different languages. We
 detect language automatically and use Google Translate APIs a lot (so
 having arbitrary number of languages in our system doesn't concern us).
 However we need to be able to search using stemming. Having nearly hundred
 of fields (several fields for each language with language-specific
 stemmers) listed in our search query is not an option. So we need a way to
 have a single index which has stemmed tokens for different languages.


 Do you mean to have a Tokenizer that switches among supported languages
 depending on the lang field? This is something I thought about when I
 started working on Solr/Lucene, and I soon realized it is not possible
 because of the way Lucene is designed: the Tokenizer in an analyzer chain
 cannot peek at another field's value, and there is no way to control which
 field is processed first.

 If that's not what you are trying to achieve, could you tell us what
 it is? If you have different language text in a single field, and if
 someone searches for a word common to many languages,
 such as sports (or Lucene for that matter), Solr will return
 the documents of different languages, most of which the user
 doesn't understand. Would that be useful? If you have
 a special use case, would you like to share it?

 --
 Kuro



Re: Auto Complete

2014-08-05 Thread benjelloun
I found this solution, but when I test it, nothing comes back in the suggestions:

<searchComponent class="solr.SpellCheckComponent" name="fuzzySuggest">
  <lst name="spellchecker">
    <str name="name">fuzzySuggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
    <str name="field">suggestField</str>
    <str name="storeDir">suggestFolders</str>
    <str name="buildOnCommit">true</str>
    <bool name="exactMatchFirst">true</bool>
    <str name="suggestAnalyzerFieldType">texts</str>
    <bool name="preserveSep">false</bool>
    <int name="maxEdits">2</int>
    <str name="sourceLocation">suggestFolders/fuzzysuggest.txt</str>
  </lst>
  <str name="queryAnalyzerFieldType">phrase_suggest</str>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/fuzzySuggest">
  <lst name="defaults">
    <str name="name">fuzzySuggest</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">fuzzySuggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">10</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="components">
    <str>fuzzySuggest</str>
  </arr>
</requestHandler>
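
One thing worth checking with a config like this: with buildOnCommit=true the
suggester dictionary is only (re)built when a commit happens, so on an
existing index with no new commits the lookup stays empty. You can force a
build once with something like /fuzzySuggest?q=gen&spellcheck.build=true and
then retry the suggestion request.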


2014-08-05 15:32 GMT+02:00 anass benjelloun anass@gmail.com:

 Yeah, that's true, I created this index just for autocomplete.
 Here is my schema:

 <dynamicField name="*_en" type="text_en" indexed="true" stored="false"
  required="false" multiValued="true"/>
 <dynamicField name="*_fr" type="text_fr" indexed="true" stored="false"
  required="false" multiValued="true"/>
 <dynamicField name="*_ar" type="text_ar" indexed="true" stored="false"
  required="false" multiValued="true"/>

 <copyField source="*_en" dest="suggestField"/>
 <copyField source="*_fr" dest="suggestField"/>
 <copyField source="*_ar" dest="suggestField"/>

 Then I use suggestField for autocomplete, as I mentioned above.
 Do you have any other configuration which can do what I need?



 2014-08-05 15:19 GMT+02:00 Michael Della Bitta-2 [via Lucene] 
 ml-node+s472066n4151216...@n3.nabble.com:

  Unless I'm mistaken, it seems like you've created this index specifically
 for autocomplete? Or is this index used for general search also?

 The easy way to understand this question: Is there one entry in your
 index
 for each term you want to autocomplete? Or are there multiple entries
 that
 might contain the same term?

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 “The Science of Influence Marketing”

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts

 w: appinions.com http://www.appinions.com/


 On Tue, Aug 5, 2014 at 9:10 AM, benjelloun [hidden email] wrote:

  hello,
 
  did you find any solution to this problem?
 
  regards
 
 
  2014-08-04 16:16 GMT+02:00 Michael Della Bitta-2 [via Lucene] 
  [hidden email]:

 
   How are you implementing autosuggest? I'm assuming you're querying an
   indexed field and getting a stored value back. But there are a wide
   variety
   of ways of doing it.
  
   Michael Della Bitta
  
   Applications Developer
  
   o: +1 646 532 3062
  
   appinions inc.
  
   “The Science of Influence Marketing”
  
   18 East 41st Street
  
   New York, NY 10017
  
   t: @appinions https://twitter.com/Appinions | g+:
   plus.google.com/appinions
   
  
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
  
  
   w: appinions.com http://www.appinions.com/
  
  
   On Mon, Aug 4, 2014 at 10:10 AM, benjelloun [hidden email] wrote:
  
    Hello, you didn't understand my problem well.
   
    An example: I have a document containing "genève" (with accent).
    When I do q=gene --> autosuggest returns "geneve", because of
    ASCIIFoldingFilterFactory with preserveOriginal="true".
    When I do q=genè --> autosuggest returns "genève".
    But what I need is:
    q=gene (without accent) should return "genève" (with accent).
   
   
   
   
  
  

Re: Auto Complete

2014-08-05 Thread Michael Della Bitta
In this case, I recommend using the approach that this tutorial uses:

http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Basically the idea is you index the data a few different ways and then use
edismax to query them all with different boosts. You'd use the stored
version of your field for display, so your accented characters would not get
stripped.
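
A minimal SolrJ sketch of the query side of that approach (the field names
are hypothetical stand-ins for the exact, edge-ngram and folded variants the
tutorial describes):

import org.apache.solr.client.solrj.SolrQuery;

public class AutocompleteQuery {
  public static SolrQuery build(String userPrefix) {
    SolrQuery q = new SolrQuery(userPrefix);
    q.set("defType", "edismax");
    // One source value indexed several ways, each queried with its own boost.
    q.set("qf", "suggest_exact^10 suggest_edge^5 suggest_folded^2");
    q.setFields("suggest_display"); // stored, unfolded value (keeps "genève")
    q.setRows(10);
    return q;
  }
}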

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Tue, Aug 5, 2014 at 9:32 AM, benjelloun anass@gmail.com wrote:

 Yeah, that's true, I created this index just for autocomplete.
 Here is my schema:

 <dynamicField name="*_en" type="text_en" indexed="true" stored="false"
  required="false" multiValued="true"/>
 <dynamicField name="*_fr" type="text_fr" indexed="true" stored="false"
  required="false" multiValued="true"/>
 <dynamicField name="*_ar" type="text_ar" indexed="true" stored="false"
  required="false" multiValued="true"/>

 <copyField source="*_en" dest="suggestField"/>
 <copyField source="*_fr" dest="suggestField"/>
 <copyField source="*_ar" dest="suggestField"/>

 Then I use suggestField for autocomplete, as I mentioned above.
 Do you have any other configuration which can do what I need?



 2014-08-05 15:19 GMT+02:00 Michael Della Bitta-2 [via Lucene] 
 ml-node+s472066n4151216...@n3.nabble.com:

  Unless I'm mistaken, it seems like you've created this index specifically
  for autocomplete? Or is this index used for general search also?
 
  The easy way to understand this question: Is there one entry in your
 index
  for each term you want to autocomplete? Or are there multiple entries
 that
  might contain the same term?
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062
 
  appinions inc.
 
  “The Science of Influence Marketing”
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
  
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 
  w: appinions.com http://www.appinions.com/
 
 
  On Tue, Aug 5, 2014 at 9:10 AM, benjelloun [hidden email] wrote:
 
   hello,
  
   did you find any solution to this problem?
  
   regards
  
  
   2014-08-04 16:16 GMT+02:00 Michael Della Bitta-2 [via Lucene] 
   [hidden email]:
  
How are you implementing autosuggest? I'm assuming you're querying an
indexed field and getting a stored value back. But there are a wide
variety
of ways of doing it.
   
Michael Della Bitta
   
Applications Developer
   
o: +1 646 532 3062
   
appinions inc.
   
“The Science of Influence Marketing”
   
18 East 41st Street
   
New York, NY 10017
   
t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions

   
  
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
   
   
w: appinions.com http://www.appinions.com/
   
   
    On Mon, Aug 4, 2014 at 10:10 AM, benjelloun [hidden email] wrote:
   
 Hello, you didn't understand my problem well.

 An example: I have a document containing "genève" (with accent).
 When I do q=gene --> autosuggest returns "geneve", because of
 ASCIIFoldingFilterFactory with preserveOriginal="true".
 When I do q=genè --> autosuggest returns "genève".
 But what I need is:
 q=gene (without accent) should return "genève" (with accent).




   
   
  
 
 

Re: Paging bug in ReRankingQParserPlugin?

2014-08-05 Thread Adair Kovac
Thanks, great explanation! Yeah, if it keeps the current behavior added
documentation would be great.

Are there any other features that expect parameters to change as one pages?
If not, I'm concerned that it might be hard to support for clients that
assume only the index params will change. It also makes it harder to work
if we want to add re-ranking on a strict small set of results on the first
page, because then we'd have to stitch together two result sets. We don't
currently want to do that, though.

For what it's worth, what my colleague who linked me the feature and I both
assumed the behavior would be is that it would get all the results and
return the ones past the re-ranking point as-is. Is that possible?

Thanks,

Adair




On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein joels...@gmail.com wrote:

 The comment in the code reads slightly different:

  // This ensures that reRankDocs >= docs needed to satisfy the result set.
 reRankDocs = Math.max(start+rows, reRankDocs);

 I think you're right though that this is confusing. The way the
 ReRankingQParserPlugin works is that it grabs the top X documents
 (reRankDocs) and reRanks them. If the top X (reRankDocs) isn't large enough
 to satisfy the page then the result won't have enough documents.

 The intended use of this was actually to stop using query re-ranking when
 you paged past the reRanked results. So if you re-rank the top 200
 documents, you would drop the re-ranking parameter when you page to
 documents 201-220.

 So the line:
 reRankDocs = Math.max(start+rows, reRankDocs);

 Saves you from an unexpected shortfall in documents if you do page beyond
 the reRankDocs. At the very least the expected use should be documented and
 if we can figure out better behavior here that would be great.














 Joel Bernstein
 Search Engineer at Heliosearch


 On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac adairko...@gmail.com wrote:

 Looking at this line in the code:

 // This ensures that reRankDocs >= docs needed to satisfy the result set.
 reRankDocs = Math.max(start+rows, reRankDocs);

 This looks like it would cause skips and duplicates while paging through
 the results, since if you exceed the reRankDocs parameter and keep finding
 things that match the re-ranking query, they'll get boosted earlier
 (skipped), thus pushing down items you already saw (causing duplicates).

 It's obviously intentional behavior, but there's no documentation I can
 see of why, if you request fewer documents to be re-ranked than you're
 asking to view, it goes ahead and ignores the number you asked for. What if
 I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
 to make the client choose whether to increase the reRankDocs or leave it
 the same?

 If no one replies and I have time, I might check out 4.9 and see if I can
 confirm or disprove the bug, but figured I'd bring it up now in case I
 don't end up having time. It would be good to document the reason for this
 behavior if it turns out it's necessary.

 Thanks. I'm excited about this feature btw.

 --Adair





Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Michael Della Bitta
Probably the most correct way to modify the index would be to use the
Solr REST API to push your changes out.

Another thing you might want to look at is Lily. Basically it's a way to
set up a Solr collection as an HBase replication target, so changes to your
HBase table would automatically propagate over to Solr.

http://www.ngdata.com/on-lily-hbase-hadoop-and-solr/

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Tue, Aug 5, 2014 at 9:04 AM, Ali Nazemian alinazem...@gmail.com wrote:

 Dear all,
 Hi,
 I configured Solr 4.9 to write its index and data on HDFS. Now I want to
 connect to that data from outside Solr to change some of the values. Could
 somebody please tell me how that is possible? Suppose I am using HBase over
 HDFS to make these changes.
 Best regards.

 --
 A.Nazemian



Re: Paging bug in ReRankingQParserPlugin?

2014-08-05 Thread Joel Bernstein
I updated the docs for now. But I agree this paging issue needs to be
handled transparently. Feel free to create a jira issue for this or I can
create one when I have time to start looking into it.

Joel Bernstein
Search Engineer at Heliosearch


On Tue, Aug 5, 2014 at 12:04 PM, Adair Kovac adairko...@gmail.com wrote:

 Thanks, great explanation! Yeah, if it keeps the current behavior added
 documentation would be great.

 Are there any other features that expect parameters to change as one
 pages? If not I'm concerned that it might be hard to support for clients
 that assume only the index params will change. It also makes it harder to
 work if we want to add re-ranking on a strict small set of results on the
 first page, because then we'd have to stitch together two result sets. We
 don't currently want to do that, though.

 For what it's worth, what my colleague who linked me the feature and I
 both assumed the behavior would be is that it would get all the results and
 return the ones past the re-ranking point as-is. Is that possible?

 Thanks,

 Adair




 On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein joels...@gmail.com wrote:

 The comment in the code reads slightly different:

 // This ensures that reRankDocs >= docs needed to satisfy the result set.
 reRankDocs = Math.max(start+rows, reRankDocs);

 I think you're right though that this is confusing. The way the
 ReRankingQParserPlugin works is that it grabs the top X documents
 (reRankDocs) and reRanks them. If the top X (reRankDocs) isn't large enough
 to satisfy the page then the result won't have enough documents.

 The intended use of this was actually to stop using query re-ranking when
 you paged past the reRanked results. So if you re-rank the top 200
 documents, you would drop the re-ranking parameter when you page to
 documents 201-220.

 So the line:
 reRankDocs = Math.max(start+rows, reRankDocs);

 Saves you from an unexpected shortfall in documents if you do page beyond
 the reRankDocs. At the very least the expected use should be documented and
 if we can figure out better behavior here that would be great.














 Joel Bernstein
 Search Engineer at Heliosearch


 On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac adairko...@gmail.com wrote:

 Looking at this line in the code:

 // This ensures that reRankDocs >= docs needed to satisfy the result set.
 reRankDocs = Math.max(start+rows, reRankDocs);

 This looks like it would cause skips and duplicates while paging through
 the results, since if you exceed the reRankDocs parameter and keep finding
 things that match the re-ranking query, they'll get boosted earlier
 (skipped), thus pushing down items you already saw (causing duplicates).

 It's obviously intentional behavior, but there's no documentation I can
 see of why, if you request fewer documents to be re-ranked than you're
 asking to view, it goes ahead and ignores the number you asked for. What if
 I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
 to make the client choose whether to increase the reRankDocs or leave it
 the same?

 If no one replies and I have time, I might check out 4.9 and see if I
 can confirm or disprove the bug, but figured I'd bring it up now in case I
 don't end up having time. It would be good to document the reason for this
 behavior if it turns out it's necessary.

 Thanks. I'm excited about this feature btw.

 --Adair






Re: Paging bug in ReRankingQParserPlugin?

2014-08-05 Thread Walter Underwood
You can also have a sliding re-ranking horizon. That is how we did it in 
Ultraseek.

http://observer.wunderwood.org/2007/04/04/progressive-reranking/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Aug 5, 2014, at 9:38 AM, Joel Bernstein joels...@gmail.com wrote:

 I updated the docs for now. But I agree this paging issue needs to be
 handled transparently. Feel free to create a jira issue for this or I can
 create one when I have time to start looking into it.
 
 Joel Bernstein
 Search Engineer at Heliosearch
 
 
 On Tue, Aug 5, 2014 at 12:04 PM, Adair Kovac adairko...@gmail.com wrote:
 
 Thanks, great explanation! Yeah, if it keeps the current behavior added
 documentation would be great.
 
 Are there any other features that expect parameters to change as one
 pages? If not I'm concerned that it might be hard to support for clients
 that assume only the index params will change. It also makes it harder to
 work if we want to add re-ranking on a strict small set of results on the
 first page, because then we'd have to stitch together two result sets. We
 don't currently want to do that, though.
 
 For what it's worth, what my colleague who linked me the feature and I
 both assumed the behavior would be is that it would get all the results and
 return the ones past the re-ranking point as-is. Is that possible?
 
 Thanks,
 
 Adair
 
 
 
 
 On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein joels...@gmail.com wrote:
 
 The comment in the code reads slightly different:
 
  // This ensures that reRankDocs >= docs needed to satisfy the result set.
 reRankDocs = Math.max(start+rows, reRankDocs);
 
 I think you're right though that this is confusing. The way the
 ReRankingQParserPlugin works is that it grabs the top X documents
 (reRankDocs) and reRanks them. If the top X (reRankDocs) isn't large enough
 to satisfy the page then the result won't have enough documents.
 
 The intended use of this was actually to stop using query re-ranking when
 you paged past the reRanked results. So if you re-rank the top 200
 documents, you would drop the re-ranking parameter when you page to
 documents 201-220.
 
 So the line:
 reRankDocs = Math.max(start+rows, reRankDocs);
 
 Saves you from an unexpected shortfall in documents if you do page beyond
 the reRankDocs. At the very least the expected use should be documented and
 if we can figure out better behavior here that would be great.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 Joel Bernstein
 Search Engineer at Heliosearch
 
 
 On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac adairko...@gmail.com wrote:
 
 Looking at this line in the code:
 
  // This ensures that reRankDocs >= docs needed to satisfy the result set.
 reRankDocs = Math.max(start+rows, reRankDocs);
 
 This looks like it would cause skips and duplicates while paging through
 the results, since if you exceed the reRankDocs parameter and keep finding
 things that match the re-ranking query, they'll get boosted earlier
 (skipped), thus pushing down items you already saw (causing duplicates).
 
 It's obviously intentional behavior, but there's no documentation I can
 see of why, if you request fewer documents to be re-ranked than you're
 asking to view, it goes ahead and ignores the number you asked for. What if
 I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
 to make the client choose whether to increase the reRankDocs or leave it
 the same?
 
 If no one replies and I have time, I might check out 4.9 and see if I
 can confirm or disprove the bug, but figured I'd bring it up now in case I
 don't end up having time. It would be good to document the reason for this
 behavior if it turns out it's necessary.
 
 Thanks. I'm excited about this feature btw.
 
 --Adair
 
 
 
 



Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Ali Nazemian
Actually, I am going to do some analysis on the Solr data using MapReduce.
For this purpose I might need to change some parts of the data or add new
fields from outside Solr.


On Tue, Aug 5, 2014 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote:

 On 8/5/2014 7:04 AM, Ali Nazemian wrote:
  I configured Solr 4.9 to write its index and data on HDFS. Now I want to
  connect to that data from outside Solr to change some of the values. Could
  somebody please tell me how that is possible? Suppose I am using HBase over
  HDFS to make these changes.

 I don't know how you could safely modify the index without a Lucene
 application or another instance of Solr, but if you do manage to modify
 the index, simply reloading the core or restarting Solr should cause it
 to pick up the changes. Either you would need to make sure that Solr
 never modifies the index, or you would need some way of coordinating
 updates so that Solr and the other application would never try to modify
 the index at the same time.

 Thanks,
 Shawn




-- 
A.Nazemian


Re: solr update dynamic field generates multiValued error

2014-08-05 Thread Franco Giacosa
Hey Erick, I think you were right; there was a mix-up in the schemas, and
that was generating the error on some of the documents.

Thanks for the help guys!


2014-08-05 1:28 GMT-03:00 Erick Erickson erickerick...@gmail.com:

 Hmmm, I just tried this with a 4.x build and I can update the document
 multiple times without a problem. I just indexed the standard exampledocs
 and then updated a doc like this (vidcard.xml was the base):

 <add>
   <doc>
     <field name="id">EN7800GTX/2DHTV/256M</field>
 
     <field name="manu_id_s" update="set">eoe changed this puppy</field>
   </doc>
   <!-- yes, you can add more than one document at a time -->
 </add>

 I'm not getting any multiple values in the _coordinate fields. However, I
 _do_ get the error if my dynamic *_coordinate field is set to
 stored=true.

 Did you perhaps change this at some point? Whenever I change the schema, I
 try to 'rm -rf solr/collection/data' just to be sure I've purged all traces
 of the former schema definition.

 Best,
 Erick


 On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa fgiac...@gmail.com wrote:

  No, they are not declarad explicitly.
 
  This is how they are created:
 
   <field name="latLong" type="location" indexed="true" stored="true"/>
  
   <dynamicField name="*_coordinate" type="tdouble" indexed="true"
    stored="false"/>
  
   <fieldType name="location" class="solr.LatLonType"
    subFieldSuffix="_coordinate"/>
 
 
 
 
  2014-08-04 22:28 GMT-03:00 Michael Ryan mr...@moreover.com:
 
   Are the latLong_0_coordinate and latLong_1_coordinate fields populated
   using copyField? If so, this sounds like it could be
   https://issues.apache.org/jira/browse/SOLR-3502.
  
   -Michael
  
   -Original Message-
   From: Franco Giacosa [mailto:fgiac...@gmail.com]
   Sent: Monday, August 04, 2014 9:05 PM
   To: solr-user@lucene.apache.org
   Subject: solr update dynamic field generates multiValued error
  
   Hello everyone, this is my first time posting a question, so forgive me
  if
   i'm missing something.
  
   This is my problem:
  
    I have a schema.xml that has the following latLong information:
   
    <field name="latLong" type="location" indexed="true" stored="true"/>
    <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
   
   The dynamicField generates 2 dynamic fields that have the lat and the
  long
   (latLong_0_coordinate and latLong_1_coordinate)
  
   So for example a document will have
  
   latLong_0_coordinate: 40.4114, latLong_1_coordinate: -74.1031,
   latLong: 40.4114,-74.1031,
  
    Now when I try to update a document (I don't update the latLong field,
    I just update other parts of the document using atomic update), Solr
    re-creates the dynamicField and adds the same value again, like it's
    using "add" instead of "set". So when I do an update, the fields of the
    doc look like this:
  
   latLong_0_coordinate: [40.4114,40.4114] latLong_1_coordinate:
   [-74.1031,-74.1031] latLong: 40.4114,-74.1031,
  
    So the dynamicFields now have 2 values, and the next time I want to
    update the document a schema error is thrown, because I'm trying to
    store a collection into a non-multiValued field.
  
  
   Thanks in advanced.
  
 



Re: Paging bug in ReRankingQParserPlugin?

2014-08-05 Thread Adair Kovac
Thanks, Joel. I created SOLR-6323.


On Tue, Aug 5, 2014 at 10:38 AM, Joel Bernstein joels...@gmail.com wrote:

 I updated the docs for now. But I agree this paging issue needs to be
 handled transparently. Feel free to create a jira issue for this or I can
 create one when I have time to start looking into it.

 Joel Bernstein
 Search Engineer at Heliosearch


 On Tue, Aug 5, 2014 at 12:04 PM, Adair Kovac adairko...@gmail.com wrote:

 Thanks, great explanation! Yeah, if it keeps the current behavior added
 documentation would be great.

 Are there any other features that expect parameters to change as one
 pages? If not, I'm concerned that it might be hard to support for clients
 that assume only the index params will change. It also makes things harder
 if we want to re-rank only a strict, small set of results on the first
 page, because then we'd have to stitch together two result sets. We don't
 currently want to do that, though.

 For what it's worth, my colleague who linked me to the feature and I both
 assumed the behavior would be to fetch all the results and return the ones
 past the re-ranking point as-is. Is that possible?

 Thanks,

 Adair




 On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein joels...@gmail.com
 wrote:

 The comment in the code reads slightly differently:

 // This ensures that reRankDocs >= docs needed to satisfy the result set.
 reRankDocs = Math.max(start+rows, reRankDocs);

 I think you're right though that this is confusing. The way the
 ReRankingQParserPlugin works is that it grabs the top X documents
 (reRankDocs) and reRanks them. If the top X (reRankDocs) isn't large enough
 to satisfy the page then the result won't have enough documents.

 The intended use of this was actually to stop using query re-ranking
 when you paged past the reRanked results. So if you re-rank the top 200
 documents, you would drop the re-ranking parameter when you page to
 documents 201-220.
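
 To make that concrete, a first-page request might look like this (the
 rerank local-params syntax is the 4.9 parser's; the queries and values
 here are made up):

 q=memory&start=0&rows=20&rq={!rerank reRankQuery=$rqq reRankDocs=200
 reRankWeight=3}&rqq=(manu:asus)

 and once you page past the top 200 you would drop the rq/rqq parameters:

 q=memory&start=200&rows=20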

 So the line:
 reRankDocs = Math.max(start+rows, reRankDocs);

 Saves you from an unexpected shortfall in documents if you do page
 beyond the reRankDocs. At the very least the expected use should be
 documented and if we can figure out better behavior here that would be
 great.

 Joel Bernstein
 Search Engineer at Heliosearch


 On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac adairko...@gmail.com
 wrote:

 Looking at this line in the code:

 // This ensures that reRankDocs >= docs needed to satisfy the result set.
 reRankDocs = Math.max(start+rows, reRankDocs);

 This looks like it would cause skips and duplicates while paging
 through the results, since if you exceed the reRankDocs parameter and keep
 finding things that match the re-ranking query, they'll get boosted earlier
 (skipped), thus pushing down items you already saw (causing duplicates).

 It's obviously intentional behavior, but there's no documentation I can
 see of why, if you request fewer documents to be re-ranked than you're
 asking to view, it goes ahead and ignores the number you asked for. What if
 I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
 to make the client choose whether to increase the reRankDocs or leave it
 the same?

 If no one replies and I have time, I might check out 4.9 and see if I
 can confirm or disprove the bug, but figured I'd bring it up now in case I
 don't end up having time. It would be good to document the reason for this
 behavior if it turns out it's necessary.

 Thanks. I'm excited about this feature btw.

 --Adair







Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Erick Erickson
What you haven't told us is what you mean by "modify the
index outside Solr". SolrJ? Using raw Lucene? Trying to modify
things by writing your own codec? Standard Java I/O operations?
Other?

You could use SolrJ to connect to an existing Solr server and
both read and modify at will from your M/R jobs. But if you're
thinking of trying to write/modify the segment files by raw I/O
operations, good luck! I'm 99.99% certain that's going to cause
you endless grief.
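
For example, a minimal SolrJ sketch of that read-then-update pattern
from an M/R task might look like this (the URL, query, and field names
are made-up placeholders):

import java.util.Collections;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class MapReduceSolrSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr =
        new HttpSolrServer("http://solr-host:8983/solr/collection1");

    // read a batch of documents to analyze
    QueryResponse rsp =
        solr.query(new SolrQuery("category:news").setRows(100));

    for (SolrDocument d : rsp.getResults()) {
      // write the analysis result back as an atomic "set" update
      SolrInputDocument update = new SolrInputDocument();
      update.addField("id", d.getFieldValue("id"));
      update.addField("analysis_result_s",
          Collections.singletonMap("set", "some-value"));
      solr.add(update);
    }
    solr.commit();
    solr.shutdown();
  }
}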

Best,
Erick


On Tue, Aug 5, 2014 at 9:55 AM, Ali Nazemian alinazem...@gmail.com wrote:

 Actually I am going to do some analysis on the Solr data using map reduce.
 For this purpose it might be necessary to change some parts of the data or
 add new fields from outside Solr.


 On Tue, Aug 5, 2014 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote:

  On 8/5/2014 7:04 AM, Ali Nazemian wrote:
    I changed Solr 4.9 to write index and data on HDFS. Now I am going to
    connect to that data from outside Solr to change some of the values.
    Could somebody please tell me how that is possible? Suppose I am
    using HBase over HDFS to make these changes.
 
  I don't know how you could safely modify the index without a Lucene
  application or another instance of Solr, but if you do manage to modify
  the index, simply reloading the core or restarting Solr should cause it
  to pick up the changes. Either you would need to make sure that Solr
  never modifies the index, or you would need some way of coordinating
  updates so that Solr and the other application would never try to modify
  the index at the same time.
 
  Thanks,
  Shawn
 
 


 --
 A.Nazemian



Re: ExternalFileFieldReloader and commit

2014-08-05 Thread Koji Sekiguchi

Hi Peter,

It seems like a bug to me, too. Please file a JIRA ticket if you can
so that someone can pick it up.

Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

(2014/08/05 22:34), Peter Keegan wrote:

When there are multiple 'external file field' files available, Solr will
reload the last one (lexicographically) with a commit, but only if changes
were made to the index. Otherwise, it skips the reload and logs "No
uncommitted changes. Skipping IW.commit." Has anyone else noticed this? It
seems like a bug to me. (yes, I do have firstSearcher and newSearcher event
listeners in solrconfig.xml)

Peter


Re: Implementing custom analyzer for multi-language stemming

2014-08-05 Thread TK


On 8/5/14, 8:36 AM, Rich Cariens wrote:

Of course this is extremely primitive and basic, but I think it would be
possible to write a CharFilter or TokenFilter that inspects the entire
TokenStream to guess the language(s), perhaps even noting where languages
change. Language and position information could be tracked, the TokenStream
rewound and then Tokens emitted with LanguageAttributes for downstream
Token stemmers to deal with.


I'm curious how you are planning to handle the languageAttribute.
Would each token have this attribute denoting a span of Tokens
with a language? But then how would you search
English documents that include the term "die" while skipping
all the German documents, which are most likely to contain "die"?

Automatic language detection works OK for long text with regular
kinds of content, but it doesn't work well on short text.
What strategy would you use to deal with short text?
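
For concreteness, a per-token variant of that idea (not the whole-stream
rewind Rich describes) could tag each token with a guessed language
through the existing TypeAttribute. This is only a sketch, and detect()
is a placeholder for a real language detector:

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public final class LanguageTaggingFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

  public LanguageTaggingFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // tag the token so a downstream stemmer could branch on the type
    typeAtt.setType(detect(termAtt.toString()));
    return true;
  }

  private String detect(String term) {
    // toy heuristic; a real detector would look at much more context
    return term.matches(".*[äöüß].*") ? "de" : "en";
  }
}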

--
TK



Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Ali Nazemian
Dear Erick,
Hi,
Thank you for your reply. Yeah, I am aware that SolrJ is my last option. I
was thinking about raw I/O operations, so according to your reply that is
probably not workable. What about the Lily project that Michael mentioned?
Is that considered SolrJ too? Are you aware of Cloudera Search? I know they
provide an integrated Hadoop ecosystem. Do you know what their suggestion
is?
Best regards.



On Wed, Aug 6, 2014 at 12:28 AM, Erick Erickson erickerick...@gmail.com
wrote:

 What you haven't told us is what you mean by "modify the
 index outside Solr". SolrJ? Using raw Lucene? Trying to modify
 things by writing your own codec? Standard Java I/O operations?
 Other?

 You could use SolrJ to connect to an existing Solr server and
 both read and modify at will from your M/R jobs. But if you're
 thinking of trying to write/modify the segment files by raw I/O
 operations, good luck! I'm 99.99% certain that's going to cause
 you endless grief.

 Best,
 Erick


 On Tue, Aug 5, 2014 at 9:55 AM, Ali Nazemian alinazem...@gmail.com
 wrote:

  Actually I am going to do some analysis on the Solr data using map
  reduce. For this purpose it might be necessary to change some parts of
  the data or add new fields from outside Solr.
 
 
  On Tue, Aug 5, 2014 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote:
 
   On 8/5/2014 7:04 AM, Ali Nazemian wrote:
I changed Solr 4.9 to write index and data on HDFS. Now I am going to
connect to that data from outside Solr to change some of the values.
Could somebody please tell me how that is possible? Suppose I am
using HBase over HDFS to make these changes.
  
   I don't know how you could safely modify the index without a Lucene
   application or another instance of Solr, but if you do manage to modify
   the index, simply reloading the core or restarting Solr should cause it
   to pick up the changes. Either you would need to make sure that Solr
   never modifies the index, or you would need some way of coordinating
   updates so that Solr and the other application would never try to
 modify
   the index at the same time.
  
   Thanks,
   Shawn
  
  
 
 
  --
  A.Nazemian
 




-- 
A.Nazemian


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Ali Nazemian
Dear Erick,
I remember that some time ago somebody asked what the point is of
modifying Solr to use HDFS for storing indexes. As far as I remember, the
answer was that integrating Solr with HDFS has two advantages: 1) getting
Hadoop replication and HA, and 2) being able to use the indexes and Solr
documents for other purposes such as analysis. So why go for HDFS in the
analysis case if we have to fall back to SolrJ for that purpose? What is
the point?
Regards.


On Wed, Aug 6, 2014 at 8:59 AM, Ali Nazemian alinazem...@gmail.com wrote:

 Dear Erick,
 Hi,
 Thank you for your reply. Yeah, I am aware that SolrJ is my last option. I
 was thinking about raw I/O operations, so according to your reply that is
 probably not workable. What about the Lily project that Michael mentioned?
 Is that considered SolrJ too? Are you aware of Cloudera Search? I know they
 provide an integrated Hadoop ecosystem. Do you know what their suggestion
 is?
 Best regards.



 On Wed, Aug 6, 2014 at 12:28 AM, Erick Erickson erickerick...@gmail.com
 wrote:

 What you haven't told us is what you mean by "modify the
 index outside Solr". SolrJ? Using raw Lucene? Trying to modify
 things by writing your own codec? Standard Java I/O operations?
 Other?

 You could use SolrJ to connect to an existing Solr server and
 both read and modify at will from your M/R jobs. But if you're
 thinking of trying to write/modify the segment files by raw I/O
 operations, good luck! I'm 99.99% certain that's going to cause
 you endless grief.

 Best,
 Erick


 On Tue, Aug 5, 2014 at 9:55 AM, Ali Nazemian alinazem...@gmail.com
 wrote:

  Actually I am going to do some analysis on the Solr data using map
  reduce. For this purpose it might be necessary to change some parts of
  the data or add new fields from outside Solr.
 
 
  On Tue, Aug 5, 2014 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote:
 
   On 8/5/2014 7:04 AM, Ali Nazemian wrote:
I changed Solr 4.9 to write index and data on HDFS. Now I am going to
connect to that data from outside Solr to change some of the values.
Could somebody please tell me how that is possible? Suppose I am
using HBase over HDFS to make these changes.
  
   I don't know how you could safely modify the index without a Lucene
   application or another instance of Solr, but if you do manage to
 modify
   the index, simply reloading the core or restarting Solr should cause
 it
   to pick up the changes. Either you would need to make sure that Solr
   never modifies the index, or you would need some way of coordinating
   updates so that Solr and the other application would never try to
 modify
   the index at the same time.
  
   Thanks,
   Shawn
  
  
 
 
  --
  A.Nazemian
 




 --
 A.Nazemian




-- 
A.Nazemian