RE: Unable to perform search query after changing uniqueKey

2015-04-01 Thread steve
Gently walking into rough waters here, but if you use any API with GET, you're
sending a URI which must be properly encoded. This has nothing to do with
the programming language that generates the key/value pairs on the browser or
the one(s) used on the server. Lots and lots of good folks have tripped over
this one: http://www.w3schools.com/tags/ref_urlencode.asp
Play hard, but play safe!
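
As a concrete illustration (a minimal Python 3 sketch; the field name "Item No" is
taken from this thread and is otherwise hypothetical), building the GET URL with a
proper encoder instead of string concatenation side-steps the encoding problem,
although, as Erick notes later in the thread, encoding alone still doesn't make a
field name with a space a good idea:

    # Percent-encode a Solr query whose field name contains a space.
    from urllib.parse import urlencode

    params = {"q": 'Item No:"ABC-123"', "wt": "json"}
    url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
    print(url)
    # -> .../select?q=Item+No%3A%22ABC-123%22&wt=json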

 Date: Wed, 1 Apr 2015 13:58:55 +0800
 Subject: Re: Unable to perform search query after changing uniqueKey
 From: edwinye...@gmail.com
 To: solr-user@lucene.apache.org
 
 Thanks Erick.
 
 Yes, it is able to work correct if I do not use spaces for the field names,
 especially for the uniqueKey.
 
 Regards,
 Edwin
 
 
 On 31 March 2015 at 13:58, Erick Erickson erickerick...@gmail.com wrote:
 
  I would never put spaces in my field names! Frankly I have no clue
  what Solr does with that, but it can't be good. Solr explicitly
  supports Java naming conventions, camel case, underscores and numbers.
  Special symbols are frowned upon, I never use anything but upper case,
  lower case and underscores. Actually, I don't use upper case either
  but that's a personal preference. Other things might work, but only by
  chance.
 
  Best,
  Erick
 
  On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Latest information that I've found for this is that the error only occurs
   for shard2.
  
   If I do a search for just shard1, those records that are assigned to
  shard1
   will be able to be displayed. Only when I search for shard2 will the
   NullPointerException error occurs. Previously I was doing a search for
  both
   shards.
  
   Is there any settings that I required to do for shard2 in order to solve
   this issue? Currently I have not made any changes to the shards since I
   created it using
  
   http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
  
  
   Regards,
   Edwin
  
   On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo edwinye...@gmail.com
  wrote:
  
   Hi Erick,
  
   I've changed the uniqueKey from id to Item No.
  
    <uniqueKey>Item No</uniqueKey>
  
  
   Below are my definitions for both the id and Item No.
  
    <field name="id" type="string" indexed="true" stored="true"
    required="false" multiValued="false" />
    <field name="Item No" type="text_general" indexed="true" stored="true"/>
  
   Regards,
   Edwin
  
  
   On 30 March 2015 at 23:05, Erick Erickson erickerick...@gmail.com
  wrote:
  
   Well, let's see the definition of your ID field, 'cause I'm puzzled.
  
   It's definitely A Bad Thing to have it be any kind of tokenized field
   though, but that's a shot in the dark.
  
   Best,
   Erick
  
   On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Hi Mostafa,
   
Yes, I've defined all the fields in schema.xml. It is able to work on
   the
version without SolrCloud, but it is not working for the one with
   SolrCloud.
Both of them are using the same schema.xml.
   
Regards,
Edwin
   
   
   
On 30 March 2015 at 14:34, Mostafa Gomaa mostafa.goma...@gmail.com
   wrote:
   
Hi Zheng,
   
It's possible that there's a problem with your schema.xml. Are all
   fields
defined and have appropriate options enabled?
   
Regards,
   
Mostafa.
   
On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo 
   edwinye...@gmail.com

wrote:
   
 Hi Erick,

 I've tried that, and removed the data directory from both the
   shards. But
 the same problem still occurs, so we probably can rule out the
   memory
 issue.

 Regards,
 Edwin

 On 30 March 2015 at 12:39, Erick Erickson 
  erickerick...@gmail.com
wrote:

  I meant shut down Solr and physically remove the entire data
  directory. Not saying this is the cure, but it can't hurt to
  rule
   out
  the index having memory...
 
  Best,
  Erick
 
  On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Hi Erick,
  
   I used the following query to delete all the index.
  
    http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
    http://localhost:8983/solr/update?stream.body=<commit/>
  
  
   Or is it better to physically delete the entire data
  directory?
  
  
   Regards,
   Edwin
  
  
   On 28 March 2015 at 02:27, Erick Erickson 
   erickerick...@gmail.com
  wrote:
  
   You say you re-indexed, did you _completely_ remove the data
directory
   first, i.e. the parent of the index and, maybe, tlog
directories?
   I've occasionally seen remnants of old definitions pollute
   the new
   one, and since the uniqueKey key is so fundamental I can
  see
   it
   being a problem.
  
   Best,
   Erick
  
   On Fri, Mar 27, 2015 at 1:42 AM, Andrea Gazzarini 

RE: Spark-Solr in python

2015-04-01 Thread Chaushu, Shani
There is a package of python with solr-cloud
https://pypi.python.org/pypi/solrcloudpy

but I don't know if there is possibility to connect it to spark


-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com] 
Sent: Tuesday, March 31, 2015 23:15
To: solr-user@lucene.apache.org
Subject: Re: Spark-Solr in python

You'll need a python lib that uses a python ZooKeeper client to be 
SolrCloud-aware so that you can do RDD like things, such as reading from all 
shards in a collection in parallel. I'm not aware of any Solr py libs that are 
cloud-aware yet, but it would be a good contribution to upgrade 
https://github.com/toastdriven/pysolr to be SolrCloud-aware
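
For anyone exploring this, a rough Python sketch of the ZooKeeper side (using the
kazoo client; the /live_nodes layout is the standard SolrCloud one, but treat the
paths and the URL reconstruction as assumptions to verify against your cluster):

    # Discover live Solr nodes from ZooKeeper so a driver could fan out
    # per-shard requests. Requires: pip install kazoo
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="localhost:2181")
    zk.start()
    # SolrCloud registers nodes under /live_nodes, e.g. "10.0.0.1:8983_solr"
    nodes = zk.get_children("/live_nodes")
    urls = ["http://" + n.replace("_", "/") for n in nodes]
    print(urls)
    zk.stop()

The collection-to-shard-to-replica mapping lives alongside it in clusterstate.json,
which is what a truly cloud-aware client would read before deciding which cores to
query in parallel.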

On Mon, Mar 30, 2015 at 11:31 PM, Chaushu, Shani shani.chau...@intel.com 
wrote:
 Hi,
 I saw there is a tool for reading solr into Spark RDD in JAVA I want 
 to do something like this in python, is there any package in python for 
 reading solr into spark RDD?

 Thanks ,
 Shani




Solr Cloud Security not working for internal authentication

2015-04-01 Thread Swaraj Kumar
I am trying to use Solr Security on Solr 5.0 Cloud. Following process I
have used :-

 1. Modifying web.xml:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>AdminAllowedQueries</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>

<security-role>
  <description>Admin</description>
  <role-name>admin</role-name>
</security-role>


 2. Changes in jetty.xml:

<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">Solr Realm</Set>
      <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      <Set name="refreshInterval">0</Set>
    </New>
  </Arg>
</Call>

 3. Creating realm.properties:

    solradmin: solradmin,admin

 4. Set SOLR_OPTS in solr.in.sh:

    SOLR_OPTS="$SOLR_OPTS -DinternalAuthCredentialsBasicAuthUsername=solradmin"
    SOLR_OPTS="$SOLR_OPTS -DinternalAuthCredentialsBasicAuthPassword=solradmin"

I am getting Unauthorized error while creating collection using following
command:-

curl -i -X GET \
   -H 'Authorization: Basic c29scmFkbWluOnNvbHJhZG1pbg==' \
   'http://localhost:8080/solr/admin/collections?action=CREATE&name=test&collection.configName=testconf&numShards=1'
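
For what it's worth, the Basic header is just base64("user:password"), so a quick
sketch to confirm the header value matches the realm.properties entry (Python 3
shown, but any language works):

    import base64

    # "solradmin:solradmin" are the credentials from realm.properties above.
    print(base64.b64encode(b"solradmin:solradmin").decode())
    # -> c29scmFkbWluOnNvbHJhZG1pbg==  (the value used in the curl call above)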

Kindly help or suggest the best to get this done.

Thanx in advance.


Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
✆ +91-9811774497


Re: Unable to perform search query after changing uniqueKey

2015-04-01 Thread Zheng Lin Edwin Yeo
Thanks Erick.

Yes, it is able to work correctly if I do not use spaces for the field names,
especially for the uniqueKey.

Regards,
Edwin


On 31 March 2015 at 13:58, Erick Erickson erickerick...@gmail.com wrote:

 I would never put spaces in my field names! Frankly I have no clue
 what Solr does with that, but it can't be good. Solr explicitly
 supports Java naming conventions, camel case, underscores and numbers.
 Special symbols are frowned upon, I never use anything but upper case,
 lower case and underscores. Actually, I don't use upper case either
 but that's a personal preference. Other things might work, but only by
 chance.

 Best,
 Erick

 On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  Latest information that I've found for this is that the error only occurs
  for shard2.
 
  If I do a search for just shard1, those records that are assigned to
 shard1
  will be able to be displayed. Only when I search for shard2 will the
  NullPointerException error occurs. Previously I was doing a search for
 both
  shards.
 
  Is there any settings that I required to do for shard2 in order to solve
  this issue? Currently I have not made any changes to the shards since I
  created it using
 
  http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
 
 
  Regards,
  Edwin
 
  On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo edwinye...@gmail.com
 wrote:
 
  Hi Erick,
 
  I've changed the uniqueKey from id to Item No.
 
   <uniqueKey>Item No</uniqueKey>
 
 
  Below are my definitions for both the id and Item No.
 
   <field name="id" type="string" indexed="true" stored="true"
   required="false" multiValued="false" />
   <field name="Item No" type="text_general" indexed="true" stored="true"/>
 
  Regards,
  Edwin
 
 
  On 30 March 2015 at 23:05, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Well, let's see the definition of your ID field, 'cause I'm puzzled.
 
  It's definitely A Bad Thing to have it be any kind of tokenized field
  though, but that's a shot in the dark.
 
  Best,
  Erick
 
  On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Hi Mostafa,
  
   Yes, I've defined all the fields in schema.xml. It is able to work on
  the
   version without SolrCloud, but it is not working for the one with
  SolrCloud.
   Both of them are using the same schema.xml.
  
   Regards,
   Edwin
  
  
  
   On 30 March 2015 at 14:34, Mostafa Gomaa mostafa.goma...@gmail.com
  wrote:
  
   Hi Zheng,
  
   It's possible that there's a problem with your schema.xml. Are all
  fields
   defined and have appropriate options enabled?
  
   Regards,
  
   Mostafa.
  
   On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo 
  edwinye...@gmail.com
   
   wrote:
  
Hi Erick,
   
I've tried that, and removed the data directory from both the
  shards. But
the same problem still occurs, so we probably can rule out the
  memory
issue.
   
Regards,
Edwin
   
On 30 March 2015 at 12:39, Erick Erickson 
 erickerick...@gmail.com
   wrote:
   
 I meant shut down Solr and physically remove the entire data
 directory. Not saying this is the cure, but it can't hurt to
 rule
  out
 the index having memory...

 Best,
 Erick

 On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  Hi Erick,
 
  I used the following query to delete all the index.
 
   http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
   http://localhost:8983/solr/update?stream.body=<commit/>
 
 
  Or is it better to physically delete the entire data
 directory?
 
 
  Regards,
  Edwin
 
 
  On 28 March 2015 at 02:27, Erick Erickson 
  erickerick...@gmail.com
 wrote:
 
  You say you re-indexed, did you _completely_ remove the data
   directory
  first, i.e. the parent of the index and, maybe, tlog
   directories?
  I've occasionally seen remnants of old definitions pollute
  the new
  one, and since the uniqueKey key is so fundamental I can
 see
  it
  being a problem.
 
  Best,
  Erick
 
  On Fri, Mar 27, 2015 at 1:42 AM, Andrea Gazzarini 
 a.gazzar...@gmail.com
  wrote:
   Hi Edwin,
   please provide some other detail about your context, (e.g.
   complete
   stacktrace, query you're issuing)
  
   Best,
   Andrea
  
  
   On 03/27/2015 09:38 AM, Zheng Lin Edwin Yeo wrote:
  
   Hi everyone,
  
   I've changed my uniqueKey to another name, instead of
 using
  id,
   on
 the
   schema.xml.
  
   However, after I have done the indexing (the indexing is
successful),
  I'm
   not able to perform a search query on it. I gives the
 error
   java.lang.NullPointerException.
  
   Is there other place which I need to configure, besides
  changing
the
   uniqueKey field 

Re: Collapse and Expand behaviour on result with 1 document.

2015-04-01 Thread Derek Poh

Hi Joel

Correct me if my understanding is wrong.
Using supplier id as the field to collapse on.

- If the collapse group heads in the main result set each have only 1 document
in their group, the expanded section will be empty since there are no
documents to expand for each collapse group.
- To render the page, I need to iterate the main result set. For each
document I have to check if there is an expanded group with the same
supplier id.
- The facet counts are based on the number of collapse groups in the main
result set (<result maxScore="6.470696" name="response" numFound="27"
start="0">).


-Derek

On 3/31/2015 7:43 PM, Joel Bernstein wrote:

The way that collapse/expand is designed to be used is as follows:

The main result set will contain the collapsed group heads.

The expanded section will contain the expanded groups for the page of
results.

To render the page you iterate the main result set. For each document check
to see if there is an expanded group.
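
A hedged Python sketch of that rendering loop (the field and parameter names are
the ones from this thread; the collection URL and the assumption that P_SupplierId
is stored on each document are mine):

    # Collapse on P_SupplierId, then stitch the expanded groups back in while
    # rendering. Assumes the JSON response writer.
    import requests

    params = {
        "q": "*:*",
        "fq": "{!collapse field=P_SupplierId}",
        "expand": "true",
        "expand.rows": 5,
        "wt": "json",
    }
    rsp = requests.get("http://localhost:8983/solr/collection1/select",
                       params=params).json()

    expanded = rsp.get("expanded", {})        # keyed by the collapsed field value
    for head in rsp["response"]["docs"]:      # the collapsed group heads
        group = expanded.get(head.get("P_SupplierId"), {}).get("docs", [])
        # render head, then render `group` (empty when the head is alone)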




Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com wrote:


You should be able to use collapse/expand with one result.

Does the document in the main result set have group members that aren't
being expanded?



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com wrote:


If I want to group the results (by a certain field) even if there is only
1 document, I should use the group parameter instead?
The requirement is to group the result of product documents by their
supplier id.
group=true&group.field=P_SupplierId&group.limit=5

Is it true that the performance of collapse is better than group
parameter on large data set, say 10-20 million documents?

-Derek


On 3/31/2015 10:03 AM, Joel Bernstein wrote:


The expanded section will only include groups that have expanded
documents.

So, if the document that in the main result set has no documents to
expand,
then this is working as expected.



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com
wrote:

  Hi

I have a query which return 1 document.
When I add the collapse and expand parameters to it,
expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}, the
expanded section is empty (<lst name="expanded"/>).

Is this the behaviour of collapse and expand parameters on result which
contain only 1 document?

-Derek








solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Dominique Bejean
Hi,

Is it normal with Solr 4.10.3 that the data directory of replicas still
contains directories like

index.3636365667474747
index.999080980976

and files

index.properties
replica.properties

If yes, why and in which circumstances ?

Regards

Dominique


Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread avinash09

<entity name="test1"
        processor="LineEntityProcessor"
        dataSource="fds"
        url="test.csv"
        rootEntity="true"
        transformer="RegexTransformer,TemplateTransformer">
  <field column="rawLine"
         regex="^(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)$"
         groupNames="test,,,,,is_frequency_cap_enabled,,,daily_spend_limit,,," />
  <field column="table_name" name="table_name" template="test1" />
</entity>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collapse and Expand behaviour on result with 1 document.

2015-04-01 Thread Joel Bernstein
Exactly correct.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 1, 2015 at 5:44 AM, Derek Poh d...@globalsources.com wrote:

 Hi Joel

 Correct me if my understanding is wrong.
 Using supplier id as the field to collapse on.

 - If the collapse group heads in the main result set each have only 1 document in
 their group, the expanded section will be empty since there are no documents
 to expand for each collapse group.
 - To render the page, I need to iterate the main result set. For each
 document I have to check if there is an expanded group with the same
 supplier id.
 - The facet counts are based on the number of collapse groups in the main
 result set (<result maxScore="6.470696" name="response" numFound="27"
 start="0">).

 -Derek


 On 3/31/2015 7:43 PM, Joel Bernstein wrote:

 The way that collapse/expand is designed to be used is as follows:

 The main result set will contain the collapsed group heads.

 The expanded section will contain the expanded groups for the page of
 results.

 To render the page you iterate the main result set. For each document
 check
 to see if there is an expanded group.




 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com
 wrote:

  You should be able to use collapse/expand with one result.

 Does the document in the main result set have group members that aren't
 being expanded?



 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com
 wrote:

  If I want to group the results (by a certain field) even if there is
 only
 1 document, I should use the group parameter instead?
 The requirement is to group the result of product documents by their
 supplier id.
 group=true&group.field=P_SupplierId&group.limit=5

 Is it true that the performance of collapse is better than group
 parameter on large data set, say 10-20 million documents?

 -Derek


 On 3/31/2015 10:03 AM, Joel Bernstein wrote:

  The expanded section will only include groups that have expanded
 documents.

 So, if the document that in the main result set has no documents to
 expand,
 then this is working as expected.



 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com
 wrote:

   Hi

 I have a query which return 1 document.
 When I add the collapse and expand parameters to it,
 expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}, the
 expanded section is empty (<lst name="expanded"/>).

 Is this the behaviour of collapse and expand parameters on result
 which
 contain only 1 document?

 -Derek








Customzing Solr Dedupe

2015-04-01 Thread thakkar.aayush
I'm facing a challenge using de-duplication of Solr documents.

De-duplication is done using TextProfileSignature with the following parameters:
<str name="fields">field1, field2, field3</str>
<str name="quantRate">0.5</str>
<str name="minTokenLen">3</str>

Here field3 is normal text with a few lines of data.
field1 and field2 can contain up to 5 or 6 words of data.

I want to de-duplicate when the data in field1 and field2 are exactly the same
and 90% of the lines in field3 match those in another document.

Is there anyway to achieve this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina
Sorry to disturb you with this follow-up, but does nobody else use, or have
problems with, multi-term highlighting?


regards,

On 29/03/2015 21:15, Bruno Mannina wrote:

Dear Solr User,

I am trying to work with highlighting. It works well, but only if I have
a single keyword in my query.

If my request is "plastic AND bicycle" then only "plastic" is highlighted.

My request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5



Could you help me please to understand ? I read doc, google, without 
success...

so I post here...

my result is:



<lst name="DE202010012045U1">
  <arr name="aben">
    <str>(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal body (10) made from <em>plastic</em> material</str>
    <str>, particularly for touring bike. #CMT#ADVANTAGE : #/CMT# The bicycle pedal has a pedal body made from <em>plastic</em></str>
  </arr>
</lst>
<lst name="JP2014091382A">
  <arr name="aben">
    <str> between <em>plastic</em> tapes 3 and 3 having two heat fusion layers, and the two <em>plastic</em> tapes 3 and 3 are stuck</str>
  </arr>
</lst>
<lst name="DE10201740A1">
  <arr name="aben">
    <str> elements. A connecting element is formed as a hinge, a flexible foil or a flexible <em>plastic</em> part. #CMT#USE</str>
  </arr>
</lst>
<lst name="US2008276751A1">
  <arr name="aben">
    <str>A bicycle handlebar grip includes an inner fiber layer and an outer <em>plastic</em> layer. Thus, the fiber</str>
    <str> handlebar grip, while the <em>plastic</em> layer is soft and has an adjustable thickness to provide a comfortable</str>
    <str> sensation to a user. In addition, the <em>plastic</em> layer includes a holding portion coated on the outer surface</str>
    <str> layer to enhance the combination strength between the fiber layer and the <em>plastic</em> layer and to enhance</str>
  </arr>
</lst>











shard splitting (solr 4.4.0)

2015-04-01 Thread Ashwin Kumar
 Hello Solr Community,
 
Greetings ! This is my first post to this group.
 
I am very new to solr, so please do not mind if some of my questions below 
sound dumb :)
 
Let me explain my present setup:
 
Solr version : Solr_4.4.0 
Zookeeper version: zookeeper-3.4.5
-
 
Present Setup
Unix_box_1
One Solr instance (Collection 1 : contains around 24 million indexed documents) 
running on port 8983
 

 
Target setup
 
Now, as the number of users is going to increase and we are also looking for
high availability, I am thinking of setting up SolrCloud with the following
setup:
 
Unix box 1
zookeeper 1(master)
Solr instance 1(Shard 1 - leader node)

 
Unix_box_2
zookeeper 2
Solr instance 2  (Shard 2)

 
Unix_box_3
zookeeper 3
Solr instance 3  (Replica for Shard 1)

 
Unix_box_4
Solr instance 4 (Replica for Shard 2)

 

 
Now following are my queries:
 
1) Is it possible for me to split the present solr running on one node with 24 
million docs under Collection1 into 2 shards as shown above ?
2) If yes how can I achieve this, and approximately how long does it take ?
3) For my application to fetch the result from solr, I need to give one solr 
url meaning http://Unix_box_1:8983/solr   . In this case if I have some docs on 
shard2 (which is on Unix_box_2) and some on shard1 (Unix_box_1), will my search 
result in the application fetch docs from both the shards and combine the 
result ? 
 
=
 
 
Thank you for your patience and time.
 
Regards,
Ashwin
  

Re: Customzing Solr Dedupe

2015-04-01 Thread Jack Krupansky
Solr dedupe is based on the concept of a signature - some fields and rules
that reduce a document into a discrete signature, and then checking if that
signature exists as a document key that can be looked up quickly in the
index. That's the conceptual basis. It is not based on any kind of field by
field comparison to all existing documents.
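
A tiny Python illustration of that signature idea (conceptual only; this is not
Solr's TextProfileSignature, and the field names are the ones from the question):

    # Reduce a document to a signature key; dedupe is then an exact lookup on
    # that key, not a fuzzy field-by-field comparison.
    import hashlib

    def signature(doc, fields=("field1", "field2", "field3")):
        joined = "|".join((doc.get(f) or "").strip().lower() for f in fields)
        return hashlib.md5(joined.encode("utf-8")).hexdigest()

    seen = set()
    for doc in ({"field1": "A", "field2": "B", "field3": "some text"},
                {"field1": "a", "field2": "B", "field3": "Some text"}):
        sig = signature(doc)
        print(sig, "duplicate" if sig in seen else "new")
        seen.add(sig)

Anything that hashes to a different key is simply a different document, which is
why a rule like "90% of the lines in field3 match" is hard to express with this
mechanism.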

-- Jack Krupansky

On Wed, Apr 1, 2015 at 6:35 AM, thakkar.aayush thakkar.aay...@gmail.com
wrote:

 I'm facing a challenge using de-duplication of Solr documents.

 De-duplication is done using TextProfileSignature with the following parameters:
 <str name="fields">field1, field2, field3</str>
 <str name="quantRate">0.5</str>
 <str name="minTokenLen">3</str>

 Here Field3 is normal text with few lines of data.
 Field1 and Field2 can contain upto 5 or 6 words of data.

 I want to de-duplicate when data in field1 and field2 are exactly the same
 and 90% of the lines in field3 is matched to that in another document.

 Is there anyway to achieve this?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread Alexandre Rafalovitch
Solr actually has CSV update handler. You could send file to that directly.

Have you tried that?

Regards,
Alex
On 1 Apr 2015 11:56 pm, avinash09 avinash.i...@gmail.com wrote:


   entity name=test1
 processor=LineEntityProcessor
 dataSource=fds
 url=test.csv
 rootEntity=true
 transformer=RegexTransformer,TemplateTransformer 
   field column=rawLine


 regex=^(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)$
  groupNames=test,,

 ,,,is_frequency_cap_enabled,,,daily_spend_limit,,, /
  field column=table_name name=table_name template=test1 /
 /entity



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Suspicious message with attachment

2015-04-01 Thread help
The following message addressed to you was quarantined because it likely 
contains a virus:

Subject: Error while reading index
From: Moshe Recanati mos...@kmslh.com

However, if you know the sender and are expecting an attachment, please reply 
to this message, and we will forward the quarantined message to you.


RE: Error while reading index

2015-04-01 Thread Moshe Recanati
Hi,
I uploaded the log to drive.
https://drive.google.com/file/d/0B0GR0M-lL5QHX1B2a2NZZXh3a1E/view?usp=sharing



Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile  + 972-52-6194481
Skype:  recanati
More at:  www.kmslh.comhttp://www.kmslh.com/ | 
LinkedInhttp://www.linkedin.com/company/kms-lighthouse | 
FBhttps://www.facebook.com/pages/KMS-lighthouse/123774257810917


From: Moshe Recanati [mailto:mos...@kmslh.com]
Sent: Wednesday, April 01, 2015 5:22 PM
To: solr-user@lucene.apache.org
Subject: Error while reading index

Hi,
We're running on production environment with Solr 4.7.1 master and slave with 
replication every 1 minute.
During regular activity and index delta build we got the following error:
ERROR - 2015-03-30 04:06:12.318; java.lang.RuntimeException: [was class 
java.net.SocketException] Connection reset
at 
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at 
com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)

After additional 2 minutes we got the following error:
ERROR - 2015-03-30 04:07:39.875; Unable to get file names for indexCommit 
generation: 638
java.io.FileNotFoundException: _tu.fdt
at 
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
at 
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:178)

And since then Solr did not recover until we did a full rebuild of all documents.
Detailed log attached.

Let me know if you are familiar with such an issue,
and what can create an issue that prevents recovery and requires rebuilding the
index. This is a major issue for us.

Thank you in advance,


Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile  + 972-52-6194481
Skype:  recanati
More at:  www.kmslh.comhttp://www.kmslh.com/ | 
LinkedInhttp://www.linkedin.com/company/kms-lighthouse | 
FBhttps://www.facebook.com/pages/KMS-lighthouse/123774257810917




Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread avinash09
no could you please share an example



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196928.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Reitzel, Charles
Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1. Do you have a typo in your query? Shouldn't it be q=aben:(plastic and
bicycle)? Note the field name: aben, not ab.

2. Try removing the word "and" from the query. There may be some interaction
with a stop word filter. If you want a phrase query, wrap it in quotes.

3. Also, be sure that the query and indexing analyzers for the aben field are
compatible with each other.
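
For comparison, a hedged sketch of issuing the same request with every parameter
passed separately, so nothing is lost to a missing "&" in a hand-built URL (the
field names are copied from the original mail; the base URL is an assumption):

    import requests

    params = {
        "q": "aben:(plastic AND bicycle)",
        "fl": "pn",
        "rows": 10,
        "hl": "true",
        "hl.fl": "tien,aben",
        "f.aben.hl.snippets": 5,
        "wt": "json",
    }
    rsp = requests.get("http://localhost:8983/solr/select", params=params).json()
    print(rsp.get("highlighting", {}))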

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr] 
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

 On 29/03/2015 21:15, Bruno Mannina wrote:
 Dear Solr User,

 I try to work with highlight, it works well but only if I have only 
 one keyword in my query?!
 If my request is plastic AND bicycle then only plastic is highlight.

 my request is:

 ./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


 Could you help me please to understand ? I read doc, google, without 
 success...
 so I post here...

 my result is:

 

  lst  name=DE202010012045U1
 arr  name=aben
   str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal 
 body (10) made fromlt;emgt;plasticlt;/emgt; material/str
   str, particularly for touring bike. #CMT#ADVANTAGE : #/CMT# 
 The bicycle pedal has a pedal body made 
 fromlt;emgt;plasticlt;/emgt;/str
 /arr
   /lst
   lst  name=JP2014091382A
 arr  name=aben
   str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3 having 
 two heat fusion layers, and the twolt;emgt;plasticlt;/emgt;  tapes
 3 and 3 are stuck/str
 /arr
   /lst
   lst  name=DE10201740A1
 arr  name=aben
   str  elements. A connecting element is formed as a hinge, a 
 flexible foil or a flexiblelt;emgt;plasticlt;/emgt;  part.
 #CMT#USE/str
 /arr
   /lst
   lst  name=US2008276751A1
 arr  name=aben
   strA bicycle handlebar grip includes an inner fiber layer and 
 an outerlt;emgt;plasticlt;/emgt; layer. Thus, the fiber/str
   str  handlebar grip, while thelt;emgt;plasticlt;/emgt; 
 layer is soft and has an adjustable thickness to provide a 
 comfortable/str
   str  sensation to a user. In addition, 
 thelt;emgt;plasticlt;/emgt;  layer includes a holding portion 
 coated on the outer surface/str
   str  layer to enhance the combination strength between the 
 fiber layer and thelt;emgt;plasticlt;/emgt;  layer and to 
 enhance/str
 /arr
   /lst


*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*


Re: shard splitting (solr 4.4.0)

2015-04-01 Thread Erick Erickson
Ashwin:

First, if at all possible I would simply set up my new SolrCloud
structure (2 shards, a leader and follower each) and re-index the
entire corpus. 24M docs isn't really very many, and you'll have to
have this capability sometime since somone, somewhere will want to
change the schema in ways that require it.

But to answer your questions:
1: Certainly. There's the SPLITSHARD command, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API (a minimal
request is sketched after this list). That said, Solr 4.4 used a relatively
early version of SPLITSHARD and there have been many improvements since, so
make sure to back up first.

2: Not quite sure how long it takes, but I wouldn't expect it to take
hours. A lot depends on what the docs are like.

3: Yes, sending a query (or update for that matter) to any node in the
cluster will do the right thing. In a production environment, and
assuming you're not using SolrJ, I'd put a load balancer in front of
the cluster for queries. If you _are_ querying through SolrJ from the
application, you only need to use the CloudSolrServer class as it
includes a software load balancer by default. Otherwise, if you
hard-code a single machine that machine becomes a single point of
failure.
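
A minimal sketch of that SPLITSHARD call (the collection and shard names here are
placeholders; on 4.4 the call runs synchronously, so give the HTTP client a long
timeout):

    import requests

    rsp = requests.get(
        "http://localhost:8983/solr/admin/collections",
        params={"action": "SPLITSHARD", "collection": "collection1",
                "shard": "shard1", "wt": "json"},
        timeout=3600,   # shard splitting can take a while on 24M docs
    )
    print(rsp.json())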

Best,
Erick

On Wed, Apr 1, 2015 at 4:55 AM, Ashwin Kumar ashwins...@outlook.de wrote:
  Hello Solr Community,

 Greetings ! This is my first post to this group.

 I am very new to solr, so please do not mind if some of my questions below 
 sound dumb :)

 Let me explain my present setup:

 Solr version : Solr_4.4.0
 Zookeeper version: zookeeper-3.4.5
 -

 Present Setup
 Unix_box_1
 One Solr instance (Collection 1 : contains around 24 million indexed 
 documents) running on port 8983

 

 Target setup

 Now as the number of users are going to increase and also we are looking for 
 high availability, I am thinking of setting up solr cloud with the following 
 setup:

 Unix box 1
 zookeeper 1(master)
 Solr instance 1(Shard 1 - leader node)
 

 Unix_box_2
 zookeeper 2
 Solr instance 2  (Shard 2)
 

 Unix_box_3
 zookeeper 3
 Solr instance 3  (Replica for Shard 1)
 

 Unix_box_4
 Solr instance 4 (Replica for Shard 2)
 

 

 Now following are my queries:

 1) Is it possible for me to split the present solr running on one node with 
 24 million docs under Collection1 into 2 shards as shown above ?
 2) If yes how can I achieve this, and approximately how long does it take ?
 3) For my application to fetch the result from solr, I need to give one solr 
 url meaning http://Unix_box_1:8983/solr   . In this case if I have some docs 
 on shard2 (which is on Unix_box_2) and some on shard1 (Unix_box_1), will my 
 search result in the application fetch docs from both the shards and combine 
 the result ?

 =


 Thank you for your patience and time.

 Regards,
 Ashwin



Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread avinash09
sir , a silly  question m confuse here what is difference between data import
handler and update csv



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196940.html
Sent from the Solr - User mailing list archive at Nabble.com.


Information regarding This conf directory is not valid SolrException.

2015-04-01 Thread Bar Weiner
Hi,

I'm working on upgrading a project from solr-4.10.3 to solr-5.0.0.
As part of our JUnit tests we have a few tests for deleting/creating
collections. Each test createdelete a collection with a different name,
but they all share the same config in ZK.
When running these tests in Eclipse everything works fine, but when running
the same tests through Maven we get the following error so I suspect this
is a timing related issue :

INFO  org.apache.solr.rest.ManagedResourceStorage  – Setting up
ZooKeeper-based storage for the RestManager with znodeBase:
/configs/SIMPLE_CONFIG
INFO  org.apache.solr.rest.ManagedResourceStorage  – Configured
ZooKeeperStorageIO with znodeBase: /configs/SIMPLE_CONFIG
INFO  org.apache.solr.rest.RestManager  – Initializing RestManager with
initArgs: {}
INFO  org.apache.solr.rest.ManagedResourceStorage  – Reading
_rest_managed.json using ZooKeeperStorageIO:path=/configs/SIMPLE_CONFIG
INFO  org.apache.solr.rest.ManagedResourceStorage  – No data found for
znode /configs/SIMPLE_CONFIG/_rest_managed.json
INFO  org.apache.solr.rest.ManagedResourceStorage  – Loaded null at path
_rest_managed.json using ZooKeeperStorageIO:path=/configs/SIMPLE_CONFIG
INFO  org.apache.solr.rest.RestManager  – Initializing 0 registered
ManagedResources
INFO  org.apache.solr.handler.ReplicationHandler  – Commits will be
reserved for  1
INFO  org.apache.solr.core.SolrCore  – [mycollection1] Registered new
searcher Searcher@3208a6c4[mycollection1]
main{ExitableDirectoryReader(UninvertingDirectoryReader())}
ERROR org.apache.solr.core.CoreContainer  – Error creating core
[mycollection1]: This conf directory is not valid
org.apache.solr.common.SolrException: This conf directory is not valid
at
org.apache.solr.cloud.ZkController.registerConfListenerForCore(ZkController.java:2229)
at
org.apache.solr.core.SolrCore.registerConfListener(SolrCore.java:2633)
at org.apache.solr.core.SolrCore.init(SolrCore.java:936)
at org.apache.solr.core.SolrCore.init(SolrCore.java:662)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at

Re: Unable to perform search query after changing uniqueKey

2015-04-01 Thread Erick Erickson
Steve:

Totally agree. Even if you _do_ correctly escape the URL though,
there's no guarantee that Solr will do the right thing with field
names with spaces. Plus endless chances for you to get it wrong when
constructing the URL

Best,
Erick

On Wed, Apr 1, 2015 at 1:01 AM, steve sc_shep...@hotmail.com wrote:
 Gently walking into rough waters here, but if you use any API with GET, 
 you're sending a URI which must be properly encoded. This has nothing to do 
 with with the programming language that generates key and store pairs on the 
 browser or the one(s) used on the server. Lots and lots of good folks have 
 tripped over this one.http://www.w3schools.com/tags/ref_urlencode.asp
 Play hard, but play safe!

 Date: Wed, 1 Apr 2015 13:58:55 +0800
 Subject: Re: Unable to perform search query after changing uniqueKey
 From: edwinye...@gmail.com
 To: solr-user@lucene.apache.org

 Thanks Erick.

 Yes, it is able to work correct if I do not use spaces for the field names,
 especially for the uniqueKey.

 Regards,
 Edwin


 On 31 March 2015 at 13:58, Erick Erickson erickerick...@gmail.com wrote:

  I would never put spaces in my field names! Frankly I have no clue
  what Solr does with that, but it can't be good. Solr explicitly
  supports Java naming conventions, camel case, underscores and numbers.
  Special symbols are frowned upon, I never use anything but upper case,
  lower case and underscores. Actually, I don't use upper case either
  but that's a personal preference. Other things might work, but only by
  chance.
 
  Best,
  Erick
 
  On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Latest information that I've found for this is that the error only occurs
   for shard2.
  
   If I do a search for just shard1, those records that are assigned to
  shard1
   will be able to be displayed. Only when I search for shard2 will the
   NullPointerException error occurs. Previously I was doing a search for
  both
   shards.
  
   Is there any settings that I required to do for shard2 in order to solve
   this issue? Currently I have not made any changes to the shards since I
   created it using
  
  http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
  
  
   Regards,
   Edwin
  
   On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo edwinye...@gmail.com
  wrote:
  
   Hi Erick,
  
   I've changed the uniqueKey from id to Item No.
  
    <uniqueKey>Item No</uniqueKey>
  
  
   Below are my definitions for both the id and Item No.
  
    <field name="id" type="string" indexed="true" stored="true"
    required="false" multiValued="false" />
    <field name="Item No" type="text_general" indexed="true" stored="true"/>
  
   Regards,
   Edwin
  
  
   On 30 March 2015 at 23:05, Erick Erickson erickerick...@gmail.com
  wrote:
  
   Well, let's see the definition of your ID field, 'cause I'm puzzled.
  
   It's definitely A Bad Thing to have it be any kind of tokenized field
   though, but that's a shot in the dark.
  
   Best,
   Erick
  
   On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Hi Mostafa,
   
Yes, I've defined all the fields in schema.xml. It is able to work on
   the
version without SolrCloud, but it is not working for the one with
   SolrCloud.
Both of them are using the same schema.xml.
   
Regards,
Edwin
   
   
   
On 30 March 2015 at 14:34, Mostafa Gomaa mostafa.goma...@gmail.com
   wrote:
   
Hi Zheng,
   
It's possible that there's a problem with your schema.xml. Are all
   fields
defined and have appropriate options enabled?
   
Regards,
   
Mostafa.
   
On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo 
   edwinye...@gmail.com

wrote:
   
 Hi Erick,

 I've tried that, and removed the data directory from both the
   shards. But
 the same problem still occurs, so we probably can rule out the
   memory
 issue.

 Regards,
 Edwin

 On 30 March 2015 at 12:39, Erick Erickson 
  erickerick...@gmail.com
wrote:

  I meant shut down Solr and physically remove the entire data
  directory. Not saying this is the cure, but it can't hurt to
  rule
   out
  the index having memory...
 
  Best,
  Erick
 
  On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Hi Erick,
  
   I used the following query to delete all the index.
  
    http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
    http://localhost:8983/solr/update?stream.body=<commit/>
  
  
   Or is it better to physically delete the entire data
  directory?
  
  
   Regards,
   Edwin
  
  
   On 28 March 2015 at 02:27, Erick Erickson 
   erickerick...@gmail.com
  wrote:
  
   You say you re-indexed, did you _completely_ remove the data
directory
   first, i.e. the parent of 

Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread Erick Erickson
Data Import Handler is a process in Solr that reaches out, grabs
something external and indexes it. Something external can be a
database, files on the server etc. Along the way, you can do many
transformations of the data. The point is that the source can be
anything.

The update handler is an end-point in Solr that expects certain
specific formats and puts them in the index. For instance, if you
index XML, it _must_ be in a very specific form to throw at the update
handler, something like
<add>
   <doc>
     <field ...>...</field>
     <field ...>...</field>
   </doc>
   <doc>
     <field ...>...</field>
     <field ...>...</field>
   </doc>
</add>

The csv update handler is just an update handler that expects CSV
files. The headers are usually the field names although you can map
them from the column header in your csv file to your Solr schema.

Importing csv files should be very fast. I suspect your regex is costly.

As Alexandre says, though, it would be a good idea to go through the
CSV import tutorial. The Solr reference guide has the details:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates
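
A hedged sketch of pushing the file straight at the CSV handler from Python (the
collection URL and column names are placeholders; separator/fieldnames/header are
standard CSV-handler parameters, and the separator is also how a control character
such as Ctrl-A can be passed):

    import requests

    params = {
        "commit": "true",
        "header": "false",                 # the file has no header row
        "separator": "\x01",               # Ctrl-A delimiter, sent as %01
        "fieldnames": "id,table_name,daily_spend_limit",   # placeholder columns
    }
    with open("test.csv", "rb") as f:
        rsp = requests.post(
            "http://localhost:8983/solr/collection1/update",
            params=params,
            data=f,
            headers={"Content-Type": "text/csv; charset=utf-8"},
        )
    print(rsp.status_code, rsp.text)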

Best,
Erick

On Wed, Apr 1, 2015 at 8:04 AM, avinash09 avinash.i...@gmail.com wrote:
 sir , a silly  question m confuse here what is difference between data import
 handler and update csv



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196940.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread Alexandre Rafalovitch
Well, I believe the tutorial has an example. Always a good thing -
going through the tutorial.

And the reference guide has the details:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates
.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 2 April 2015 at 01:37, avinash09 avinash.i...@gmail.com wrote:
 no could you please share an example



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196928.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Shawn Heisey
On 4/1/2015 6:35 AM, Dominique Bejean wrote:
 Is it normal with Solr 4.10.3 that the data directory of replicas still
 contains directories like

 index.3636365667474747
 index.999080980976

 and files

 index.properties
 replica.properties

 If yes, why and in which circumstances ?

The index.xxxxxxxxxxx directories are created during master/slave
index replication.  If you're running SolrCloud, then replication is
only used for index recovery.  Index recovery is only required in
situations where the replicas are so far behind that the transaction log
cannot be used to synchronize them, and sometimes happens when a Solr
node is restarted.  If SolrCloud index recovery is actually required
when you are NOT restarting Solr instances, your index might be having
problems.

Regardless of whether you're running SolrCloud or not, normally when one
of those directories with a numeric suffix is created, it will be
changed to index with no suffix after the replication is complete, but
if Solr is unable to change the directories for some reason, it will
simply keep and use the new directory with the suffix.  Do you see any
ERROR or WARN entries in your solr logfile that would indicate why Solr
cannot change the directory name?  Are you on Windows?  Problems like
this are more common on Windows, because Windows prevents a lot of file
operations when files/directories are open.

The long-term existence of directories with this naming convention
indicates that *something* went wrong, but you would need to consult
your logs to find out what happened.  There have been several bugs over
Solr's history that cause this problem.

Thanks,
Shawn



How to recover a Shard

2015-04-01 Thread Matt Kuiper
Hello,

I have a SolrCloud (4.10.1) where for one of the shards, both replicas are in a 
Recovery Failed state per the Solr Admin Cloud page.  The logs contains the 
following type of entries for the two Solr nodes involved, including statements 
that it will retry.

Is there a way to recover from this state?

Maybe bring down one replica, and then somehow declare that the remaining 
replica is to be the leader?  Understand this would not be ideal as the new 
leader may be missing documents that were sent its way to be indexed while it 
was down, but would be better than having to rebuild the whole cloud.

Any tips or suggestions would be appreciated.

Thanks,
Matt

Solr node .65
Error while trying to recover. 
core=kla_collection_shard6_replica5:org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: 
kla_collection slice: shard6
 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
 at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
 at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Solr node .64

Error while trying to recover. 
core=kla_collection_shard6_replica2:org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: 
kla_collection slice: shard6

 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)

 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)

 at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)

 at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)



RE: How to recover a Shard

2015-04-01 Thread Matt Kuiper
Maybe I have been working too many long hours as I missed the obvious solution 
of bringing down/up one of the Solr nodes backing one of the replicas, and then 
the same for the second node.  This did the trick.

Since I brought this topic up, I will narrow the question a bit:  Would there 
be a way to recover without restarting the Solr node?  Basically to delete one 
replica and then somehow declare the other replica the leader and break it out 
of its recovery process?

Thanks,
Matt


From: Matt Kuiper
Sent: Wednesday, April 01, 2015 8:43 PM
To: solr-user@lucene.apache.org
Subject: How to recover a Shard

Hello,

I have a SolrCloud (4.10.1) where for one of the shards, both replicas are in a 
Recovery Failed state per the Solr Admin Cloud page.  The logs contains the 
following type of entries for the two Solr nodes involved, including statements 
that it will retry.

Is there a way to recover from this state?

Maybe bring down one replica, and then somehow declare that the remaining 
replica is to be the leader?  Understand this would not be ideal as the new 
leader may be missing documents that were sent its way to be indexed while it 
was down, but would be better than having to rebuild the whole cloud.

Any tips or suggestions would be appreciated.

Thanks,
Matt

Solr node .65
Error while trying to recover. 
core=kla_collection_shard6_replica5:org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: 
kla_collection slice: shard6
 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
 at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
 at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Solr node .64

Error while trying to recover. 
core=kla_collection_shard6_replica2:org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: 
kla_collection slice: shard6

 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)

 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)

 at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)

 at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)



Re: Solr went on recovery multiple time.

2015-04-01 Thread William Bell
I would give it 32GB of RAM. And try to use SSD.

On Tue, Mar 31, 2015 at 12:50 AM, sthita sthit...@gmail.com wrote:

 Hi Bill, My index size is around 48GB and contains around 8 million
 documents.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-went-on-recovery-multiple-time-tp4196249p4196504.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SolrCloud 5.0 cluster RAM requirements

2015-04-01 Thread Shawn Heisey

On 4/1/2015 3:22 PM, Ryan Steele wrote:
Does a SolrCloud 5.0 cluster need enough RAM across the cluster to 
load all the collections into RAM at all times?


Need is too strong a word.  If you want the best possible performance, 
then you would have enough RAM across the cluster to cache the entire 
index.  That's not required for a *functional* system, ignoring 
performance.  For an index on that scale, caching the entire index is 
usually an unrealistically expensive goal.


Are you the person who mentioned a terabyte scale SolrCloud index on the 
#solr IRC channel that's hosted on Amazon?


Here's a general wiki page on performance problems with Solr that has a 
large amount of focus on RAM:


http://wiki.apache.org/solr/SolrPerformanceProblems

The unfortunate fact about this is that the only way you'll figure out 
what you actually need is to prototype, and prototyping on the scale of 
your index is difficult and expensive.


https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn



Re: Unable to perform search query after changing uniqueKey

2015-04-01 Thread Zheng Lin Edwin Yeo
Hi Steve,

Thanks for the link and the information.

Regards,
Edwin


On 1 April 2015 at 23:17, Erick Erickson erickerick...@gmail.com wrote:

 Steve:

 Totally agree. Even if you _do_ correctly escape the URL though,
 there's no guarantee that Solr will do the right thing with field
 names with spaces. Plus endless chances for you to get it wrong when
 constructing the URL

 Best,
 Erick

 On Wed, Apr 1, 2015 at 1:01 AM, steve sc_shep...@hotmail.com wrote:
  Gently walking into rough waters here, but if you use any API with GET,
 you're sending a URI which must be properly encoded. This has nothing to do
 with with the programming language that generates key and store pairs on
 the browser or the one(s) used on the server. Lots and lots of good folks
 have tripped over this one.http://www.w3schools.com/tags/ref_urlencode.asp
  Play hard, but play safe!
 
  Date: Wed, 1 Apr 2015 13:58:55 +0800
  Subject: Re: Unable to perform search query after changing uniqueKey
  From: edwinye...@gmail.com
  To: solr-user@lucene.apache.org
 
  Thanks Erick.
 
  Yes, it is able to work correct if I do not use spaces for the field
 names,
  especially for the uniqueKey.
 
  Regards,
  Edwin
 
 
  On 31 March 2015 at 13:58, Erick Erickson erickerick...@gmail.com
 wrote:
 
   I would never put spaces in my field names! Frankly I have no clue
   what Solr does with that, but it can't be good. Solr explicitly
   supports Java naming conventions, camel case, underscores and numbers.
   Special symbols are frowned upon, I never use anything but upper case,
   lower case and underscores. Actually, I don't use upper case either
   but that's a personal preference. Other things might work, but only by
   chance.
  
   Best,
   Erick
  
   On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Latest information that I've found for this is that the error only
 occurs
for shard2.
   
If I do a search for just shard1, those records that are assigned to
   shard1
will be able to be displayed. Only when I search for shard2 will the
NullPointerException error occurs. Previously I was doing a search
 for
   both
shards.
   
Is there any settings that I required to do for shard2 in order to
 solve
this issue? Currently I have not made any changes to the shards
 since I
created it using
   
  
  http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
   
   
Regards,
Edwin
   
On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
   wrote:
   
Hi Erick,
   
I've changed the uniqueKey from id to Item No.
   
 <uniqueKey>Item No</uniqueKey>
   
   
Below are my definitions for both the id and Item No.
   
 <field name="id" type="string" indexed="true" stored="true"
 required="false" multiValued="false" />
 <field name="Item No" type="text_general" indexed="true"
  stored="true"/>
   
Regards,
Edwin
   
   
On 30 March 2015 at 23:05, Erick Erickson erickerick...@gmail.com
 
   wrote:
   
Well, let's see the definition of your ID field, 'cause I'm
 puzzled.
   
It's definitely A Bad Thing to have it be any kind of tokenized
 field
though, but that's a shot in the dark.
   
Best,
Erick
   
On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 Hi Mostafa,

 Yes, I've defined all the fields in schema.xml. It is able to
 work on
the
 version without SolrCloud, but it is not working for the one
 with
SolrCloud.
 Both of them are using the same schema.xml.

 Regards,
 Edwin



 On 30 March 2015 at 14:34, Mostafa Gomaa 
 mostafa.goma...@gmail.com
wrote:

 Hi Zheng,

 It's possible that there's a problem with your schema.xml. Are
 all
fields
 defined and have appropriate options enabled?

 Regards,

 Mostafa.

 On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo 
edwinye...@gmail.com
 
 wrote:

  Hi Erick,
 
  I've tried that, and removed the data directory from both the
shards. But
  the same problem still occurs, so we probably can rule out
 the
memory
  issue.
 
  Regards,
  Edwin
 
  On 30 March 2015 at 12:39, Erick Erickson 
   erickerick...@gmail.com
 wrote:
 
   I meant shut down Solr and physically remove the entire
 data
   directory. Not saying this is the cure, but it can't hurt
 to
   rule
out
   the index having memory...
  
   Best,
   Erick
  
   On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Hi Erick,
   
I used the following query to delete all the index.
   
 http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
    http://localhost:8983/solr/update?stream.body=<commit/>
   
   
Or is it better 

Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread avinash09
Thanks Erick and Alexandre Rafalovitch.

One more doubt: how do I pass the Ctrl-A (^A) separator while uploading the CSV?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196998.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

Dear Charles,

Thanks for your answer, please find below my answers.

OK, it works if I use aben as the field in my query, as you say in answer 1.
It doesn't work if I use ab, maybe because the ab field is a copyField 
for abfr, aben, abit and abpt.


Concerning point 2, yes, you are right: it's not "and" but "AND".

I have this result:

<lst name="DE102009043935B3">
  <arr name="tien">
    <str><em>Bicycle</em> frame comprises holder, particularly for 
water bottle, where holder is connected</str>
  </arr>
  <arr name="aben">
    <str>#CMT# #/CMT# The <em>bicycle</em> frame (7) comprises a holder 
(1), particularly for a water bottle</str>
    <str>. The holder is connected with the <em>bicycle</em> frame by a 
screw (5), where a mounting element has a compensation</str>
    <str> section which is made of an elastic material, particularly 
a <em>plastic</em> material. The compensation section</str>
  </arr>
</lst>


So my last question is: why do I get <em></em> tags instead of the colored ones?
How can I tell Solr to use the colored markup?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?

^^
2. Try removing the word and from the query.  There may be some interaction 
with a stop word filter.  If you want a phrase query, wrap it in quotes.

3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

Le 29/03/2015 21:15, Bruno Mannina a écrit :

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10
&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



  <lst name="DE202010012045U1">
    <arr name="aben">
      <str>(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal
body (10) made from <em>plastic</em> material</str>
      <str>, particularly for touring bike. #CMT#ADVANTAGE : #/CMT#
The bicycle pedal has a pedal body made
from <em>plastic</em></str>
    </arr>
  </lst>
  <lst name="JP2014091382A">
    <arr name="aben">
      <str> between <em>plastic</em> tapes 3 and 3 having
two heat fusion layers, and the two <em>plastic</em> tapes
3 and 3 are stuck</str>
    </arr>
  </lst>
  <lst name="DE10201740A1">
    <arr name="aben">
      <str> elements. A connecting element is formed as a hinge, a
flexible foil or a flexible <em>plastic</em> part.
#CMT#USE</str>
    </arr>
  </lst>
  <lst name="US2008276751A1">
    <arr name="aben">
      <str>A bicycle handlebar grip includes an inner fiber layer and
an outer <em>plastic</em> layer. Thus, the fiber</str>
      <str> handlebar grip, while the <em>plastic</em>
layer is soft and has an adjustable thickness to provide a
comfortable</str>
      <str> sensation to a user. In addition,
the <em>plastic</em> layer includes a holding portion
coated on the outer surface</str>
      <str> layer to enhance the combination strength between the
fiber layer and the <em>plastic</em> layer and to
enhance</str>
    </arr>
  </lst>






Re: Customzing Solr Dedupe

2015-04-01 Thread Dan Davis
But you can potentially still use Solr dedupe if you do the upfront work
(in RDBMS or NoSQL pre-index processing) to assign some sort of Group ID.
  See OCLC's FRBR Work-Set Algorithm,
http://www.oclc.org/content/dam/research/activities/frbralgorithm/2009-08.pdf?urlm=161376
, for some details on one such algorithm.

If the job is too big for RDBMS, and/or you don't want to use/have a
suitable NoSQL, you can have two Solr indexes (collection/core/whatever) -
one for classification with only id, field1, field2, field3, and another
for production query.   Then, you put stuff into the classification index,
use queries and your own algorithm to do classification, assigning a
groupId, and then put the document with groupId assigned into the
production database.

A key question is whether you want to preserve the groupId.   In some
cases, you do, and in some cases, it is just an internal signature.   In
both cases, a non-deterministic up-front algorithm can work, but if the
groupId needs to be preserved, you need to work harder to make sure it all
hangs together.

Hope this helps,

-Dan

On Wed, Apr 1, 2015 at 7:05 AM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 Solr dedupe is based on the concept of a signature - some fields and rules
 that reduce a document into a discrete signature, and then checking if that
 signature exists as a document key that can be looked up quickly in the
 index. That's the conceptual basis. It is not based on any kind of field by
 field comparison to all existing documents.

 -- Jack Krupansky

 On Wed, Apr 1, 2015 at 6:35 AM, thakkar.aayush thakkar.aay...@gmail.com
 wrote:

  I'm facing a challenge using de-duplication of Solr documents.
 
  De-duplication is done using TextProfileSignature with the following
 parameters:
  <str name="fields">field1, field2, field3</str>
  <str name="quantRate">0.5</str>
  <str name="minTokenLen">3</str>
 
  Here Field3 is normal text with a few lines of data.
  Field1 and Field2 can contain up to 5 or 6 words of data.
 
  I want to de-duplicate when data in field1 and field2 are exactly the
 same
  and 90% of the lines in field3 is matched to that in another document.
 
  Is there anyway to achieve this?
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
  Sent from the Solr - User mailing list archive at Nabble.com.
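
The signature-based dedupe Jack describes is wired up in solrconfig.xml as an
update processor chain. A minimal sketch using the parameters from the
question (chain name and signature field are illustrative; note that the
stock TextProfileSignature still computes one fuzzy signature over all listed
fields, not the exact-field1/field2 plus 90%-field3 split being asked for):

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">signatureField</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">field1,field2,field3</str>
      <str name="signatureClass">solr.processor.TextProfileSignature</str>
      <str name="quantRate">0.5</str>
      <str name="minTokenLen">3</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Getting the asked-for per-field behaviour would mean a custom Signature
implementation or the pre-processing/groupId approach described above.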
 



Re: solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Dominique Bejean
Hi Shawn,

Thank you for your response.

This is a SolrCloud installation on CentOS.

There are 5 servers with 128 GB of RAM each.
The collection contains 650 million small documents.
There are 3 shards with replicationFactor = 2 (so 9 cores).
The JVM Xmx parameter was set to 96 GB. We changed it yesterday to 32 GB in
order to be under the CompressedOops limit and free memory for MMapDirectory.

I will have access to both the full Solr and Tomcat logs tomorrow.

What I know is that there are some ZooKeeper timeouts in the Solr logs,
and the replications occur on some nodes after some commits (after DIH
imports) and when nodes restart.

So, I will have more precise log messages tomorrow.

Thank you for your response.

Dominique



2015-04-01 18:29 GMT+02:00 Shawn Heisey apa...@elyograg.org:

 On 4/1/2015 6:35 AM, Dominique Bejean wrote:
  Is it normal with Solr 4.10.3 that the data directory of replicas still
  contains directories like
 
  index.3636365667474747
  index.999080980976
 
  and files
 
  index.properties
  replica.properties
 
  If yes, why and in which circumstances ?

 The index. directories are created during master/slave
 index replication.  If you're running SolrCloud, then replication is
 only used for index recovery.  Index recovery is only required in
 situations where the replicas are so far behind that the transaction log
 cannot be used to synchronize them, and sometimes happens when a Solr
 node is restarted.  If SolrCloud index recovery is actually required
 when you are NOT restarting Solr instances, your index might be having
 problems.

 Regardless of whether you're running SolrCloud or not, normally when one
 of those directories with a numeric suffix is created, it will be
 changed to index with no suffix after the replication is complete, but
 if Solr is unable to change the directories for some reason, it will
 simply keep and use the new directory with the suffix.  Do you see any
 ERROR or WARN entries in your solr logfile that would indicate why Solr
 cannot change the directory name?  Are you on Windows?  Problems like
 this are more common on Windows, because Windows prevents a lot of file
 operations when files/directories are open.

 The long-term existence of directories with this naming convention
 indicates that *something* went wrong, but you would need to consult
 your logs to find out what happened.  There have been several bugs over
 Solr's history that cause this problem.

 Thanks,
 Shawn




Re: solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Erick Erickson
I _really_ suspect that with the huge JVM heaps you had, you were hitting long
GC pauses that exceeded the Zookeeper timeout, causing ZK to believe the
node had gone away thus throwing it into recovery mode.

You can enable GC logging to see whether you see such long pauses, but with 96G
it's almost certain that you did.

Reducing the JVM allocation should help, but if you continue to see
nodes go into recovery for no apparent reason, enabling GC logging is a
good idea so you have a record.

See Getting a view into garbage collection here:
https://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
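
As a concrete starting point, GC logging on the Java 7/8 JVMs of that era is
usually enabled with flags along these lines (the log path is a placeholder;
add them to whatever script starts Solr):

  -Xloggc:/var/log/solr/gc.log
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -XX:+PrintGCApplicationStoppedTime

Pauses in that log longer than the zkClientTimeout Solr is configured with
are the ones that can push a node into recovery.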

Best
Erick

On Wed, Apr 1, 2015 at 10:35 AM, Dominique Bejean
dominique.bej...@eolya.fr wrote:
 Hi Shawn,

 Thank you for your response.

 This is a Solrcloud installation on Centos.

 There are 5 servers with 128 Gb ram each.
 The collection contains 650 millions of small documents.
 There are 3 shards with replicationfactor = 2 (so 9 cores).
 The JVM Xmx parameter was set to 96 Gb. We changed it yesterday to 32 Gb in
 order to be under the CompressedOops limit and free the direct memory for
 MMapDirectory.

 I will have access to both full solr and tomcat logs tomorrow.

 What I know, is that there are some zookeeper time out in solr logs.
 And the replications occur on some nodes after some commits (after DIH
 import) and when nodes restart.

 So, I will have more precise log messages tomorrow.

 Thank you for your response.

 Dominique



 2015-04-01 18:29 GMT+02:00 Shawn Heisey apa...@elyograg.org:

 On 4/1/2015 6:35 AM, Dominique Bejean wrote:
  Is it normal with Solr 4.10.3 that the data directory of replicas still
  contains directories like
 
  index.3636365667474747
  index.999080980976
 
  and files
 
  index.properties
  replica.properties
 
  If yes, why and in which circumstances ?

 The index. directories are created during master/slave
 index replication.  If you're running SolrCloud, then replication is
 only used for index recovery.  Index recovery is only required in
 situations where the replicas are so far behind that the transaction log
 cannot be used to synchronize them, and sometimes happens when a Solr
 node is restarted.  If SolrCloud index recovery is actually required
 when you are NOT restarting Solr instances, your index might be having
 problems.

 Regardless of whether you're running SolrCloud or not, normally when one
 of those directories with a numeric suffix is created, it will be
 changed to index with no suffix after the replication is complete, but
 if Solr is unable to change the directories for some reason, it will
 simply keep and use the new directory with the suffix.  Do you see any
 ERROR or WARN entries in your solr logfile that would indicate why Solr
 cannot change the directory name?  Are you on Windows?  Problems like
 this are more common on Windows, because Windows prevents a lot of file
 operations when files/directories are open.

 The long-term existence of directories with this naming convention
 indicates that *something* went wrong, but you would need to consult
 your logs to find out what happened.  There have been several bugs over
 Solr's history that cause this problem.

 Thanks,
 Shawn




RE: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Reitzel, Charles
If you want to query on the field ab, you'll probably need to add it to the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.   

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter
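
For example (a sketch only: the tag values must be URL-encoded in a real
request, and the field names are simply the ones used earlier in this thread):

  ./select/?q=aben:(plastic AND bicycle)&hl=true&hl.fl=tien,aben
     &hl.simple.pre=<b style="background:yellow">&hl.simple.post=</b>

hl.simple.pre/hl.simple.post take a single pair of tags, so this gives one
colour for every highlighted term.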


-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr] 
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

ok it works if I use aben as field in my query as you say in Answer 1.
it doesn't work if I use ab may be because ab field is a copyField for 
abfr, aben, abit, abpt

Concerning the 2., yes you have right it's not and but AND

I have this result:

lst  name=DE102009043935B3
 arr  name=tien
   strlt;emgt;Bicyclelt;/emgt;  frame comprises holder, particularly 
for water bottle, where holder is connected/str
 /arr
 arr  name=aben
   str#CMT# #/CMT# Thelt;emgt;bicyclelt;/emgt;  frame (7) comprises a 
holder (1), particularly for a water bottle/str
   str. The holder is connected with thelt;emgt;bicyclelt;/emgt;  
frame by a screw (5), where a mounting element has a compensation/str
   str  section which is made of an elastic material, particularly 
alt;emgt;plasticlt;/emgt;  material. The compensation section/str
 /arr
   /lst


So my last question is why I haven't em/em instead having colored ?
How can I tell to solr to use the colored ?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :
 Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
 trouble with multiple terms.  I'd look at a few things.

 1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
 bicycle)?
   
   
 ^^ 2. Try removing the word and from the query.  There may be some 
 interaction with a stop word filter.  If you want a phrase query, wrap it in 
 quotes.

 3.  Also, be sure that the query and indexing analyzers for the aben field 
 are compatible with each other.

 -Original Message-
 From: Bruno Mannina [mailto:bmann...@free.fr]
 Sent: Wednesday, April 01, 2015 7:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 3.6, Highlight and multi words?

 Sorry to disturb you with the renew but nobody use or have problem with 
 multi-terms and highlight ?

 regards,

 Le 29/03/2015 21:15, Bruno Mannina a écrit :
 Dear Solr User,

 I try to work with highlight, it works well but only if I have only 
 one keyword in my query?!
 If my request is plastic AND bicycle then only plastic is highlight.

 my request is:

 ./select/?q=ab%3A%28plastic+and+bicycle%29version=2.2start=0ro
 w
 s=10indent=onhl=truehl.fl=tien,abenfl=pnf.aben.hl.snippets=5


 Could you help me please to understand ? I read doc, google, without 
 success...
 so I post here...

 my result is:

 

   lst  name=DE202010012045U1
  arr  name=aben
str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal 
 body (10) made fromlt;emgt;plasticlt;/emgt; material/str
str, particularly for touring bike. #CMT#ADVANTAGE : #/CMT# 
 The bicycle pedal has a pedal body made 
 fromlt;emgt;plasticlt;/emgt;/str
  /arr
/lst
lst  name=JP2014091382A
  arr  name=aben
str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3 
 having two heat fusion layers, and the 
 twolt;emgt;plasticlt;/emgt;  tapes
 3 and 3 are stuck/str
  /arr
/lst
lst  name=DE10201740A1
  arr  name=aben
str  elements. A connecting element is formed as a hinge, a 
 flexible foil or a flexiblelt;emgt;plasticlt;/emgt;  part.
 #CMT#USE/str
  /arr
/lst
lst  name=US2008276751A1
  arr  name=aben
strA bicycle handlebar grip includes an inner fiber layer 
 and an outerlt;emgt;plasticlt;/emgt; layer. Thus, the fiber/str
str  handlebar grip, while thelt;emgt;plasticlt;/emgt; 
 layer is soft and has an adjustable thickness to provide a 
 comfortable/str
str  sensation to a user. In addition, 
 thelt;emgt;plasticlt;/emgt;  layer includes a holding portion 
 coated on the outer surface/str
str  layer to enhance the combination strength between the 
 fiber layer and thelt;emgt;plasticlt;/emgt;  layer and to 
 enhance/str
  /arr
/lst



Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

OK for qf (I can't test it right now).

But concerning hl.simple.pre / hl.simple.post, I can only define one color, no?

In the sample solrconfig.xml there are several colors:

<!-- multi-colored tag FragmentsBuilder -->
  <fragmentsBuilder name="colored"
      class="solr.highlight.ScoreOrderFragmentsBuilder">
    <lst name="defaults">
      <str name="hl.tag.pre"><![CDATA[
           <b style="background:yellow">,<b style="background:lawgreen">,
           <b style="background:aquamarine">,<b style="background:magenta">,
           <b style="background:palegreen">,<b style="background:coral">,
           <b style="background:wheat">,<b style="background:khaki">,
           <b style="background:lime">,<b style="background:deepskyblue">]]></str>
      <str name="hl.tag.post"><![CDATA[</b>]]></str>
    </lst>
  </fragmentsBuilder>

How can I tell Solr to use these colors instead of hl.simple.pre/post?
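
For anyone hitting the same wall: the colored FragmentsBuilder belongs to the
FastVectorHighlighter, so, as far as I understand it (treat this as an
assumption to check against the 3.6 docs), selecting it looks roughly like
the parameters below, and it needs termVectors, termPositions and termOffsets
enabled on the highlighted fields.

  &hl=true&hl.fl=tien,aben
  &hl.useFastVectorHighlighter=true
  &hl.fragmentsBuilder=colored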



Le 01/04/2015 20:58, Reitzel, Charles a écrit :

If you want to query on the field ab, you'll probably need to add it the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter


-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

ok it works if I use aben as field in my query as you say in Answer 1.
it doesn't work if I use ab may be because ab field is a copyField for 
abfr, aben, abit, abpt

Concerning the 2., yes you have right it's not and but AND

I have this result:

lst  name=DE102009043935B3
  arr  name=tien
strlt;emgt;Bicyclelt;/emgt;  frame comprises holder, particularly for 
water bottle, where holder is connected/str
  /arr
  arr  name=aben
str#CMT# #/CMT# Thelt;emgt;bicyclelt;/emgt;  frame (7) comprises a 
holder (1), particularly for a water bottle/str
str. The holder is connected with thelt;emgt;bicyclelt;/emgt;  frame by 
a screw (5), where a mounting element has a compensation/str
str  section which is made of an elastic material, particularly 
alt;emgt;plasticlt;/emgt;  material. The compensation section/str
  /arr
/lst


So my last question is why I haven't em/em instead having colored ?
How can I tell to solr to use the colored ?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?
 
^^ 2. Try removing the word and from the query.  There may be some interaction with a stop word filter.  If you want a phrase query, wrap it in quotes.


3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

Le 29/03/2015 21:15, Bruno Mannina a écrit :

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29version=2.2start=0ro
w
s=10indent=onhl=truehl.fl=tien,abenfl=pnf.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



   lst  name=DE202010012045U1
  arr  name=aben
str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal
body (10) made fromlt;emgt;plasticlt;/emgt; material/str
str, particularly for touring bike. #CMT#ADVANTAGE : #/CMT#
The bicycle pedal has a pedal body made
fromlt;emgt;plasticlt;/emgt;/str
  /arr
/lst
lst  name=JP2014091382A
  arr  name=aben
str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3
having two heat fusion layers, and the
twolt;emgt;plasticlt;/emgt;  tapes
3 and 3 are stuck/str
  /arr
/lst
lst  name=DE10201740A1
  arr  name=aben
str  elements. A connecting element is formed as a hinge, a
flexible foil or a flexiblelt;emgt;plasticlt;/emgt;  part.
#CMT#USE/str
  /arr
/lst
lst  name=US2008276751A1
  arr  name=aben
strA bicycle handlebar grip includes an inner fiber layer
and an outerlt;emgt;plasticlt;/emgt; layer. Thus, the fiber/str
str  handlebar grip, while thelt;emgt;plasticlt;/emgt;
layer is soft and has an adjustable thickness to 

Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread Alexandre Rafalovitch
That's an interesting question. The reference shows you how to set a
separator, but ^A is a special case. You may need to pass it in as a
URL escape character or similar.

But I would first get a sample working with a more conventional
separator and then worry about ^A, just so you are not mixing up
several problems.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 2 April 2015 at 05:05, avinash09 avinash.i...@gmail.com wrote:
 thanks Erick and Alexandre Rafalovitch R

 one more doubt how to pass ctrl A(^A) seprator while csv upload




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196998.html
 Sent from the Solr - User mailing list archive at Nabble.com.
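
A minimal sketch of what such an upload could look like once the basics work,
passing Ctrl-A as a URL-encoded separator (collection name and file name are
placeholders, and the %01 encoding is an assumption worth testing on a small
file first):

  curl 'http://localhost:8983/solr/collection1/update?separator=%01&commit=true' \
       -H 'Content-Type: application/csv' --data-binary @data.csv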


Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

Of course, no problem Charles, you already helped me!

Le 01/04/2015 21:54, Reitzel, Charles a écrit :

Sorry, I've never tried highlighting in multiple colors...

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

ok for qf (i can't test now)

but concerning hl.simple.pre hl.simple.post I can define only one color no ?

in the sample solrconfig.xml there are several color,

!-- multi-colored tag FragmentsBuilder --
fragmentsBuilder  name=colored
  class=solr.highlight.ScoreOrderFragmentsBuilder
  lst  name=defaults
str  name=hl.tag.pre![CDATA[
 b style=background:yellow,b style=background:lawgreen,
 b style=background:aquamarine,b 
style=background:magenta,
 b style=background:palegreen,b style=background:coral,
 b style=background:wheat,b style=background:khaki,
 b style=background:lime,b 
style=background:deepskyblue]]/str
str  name=hl.tag.post![CDATA[/b]]/str
  /lst
/fragmentsBuilder

How can I tell to solr to use these color instead of hl.simple.pre/post ?



Le 01/04/2015 20:58, Reitzel, Charles a écrit :

If you want to query on the field ab, you'll probably need to add it the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter


-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

ok it works if I use aben as field in my query as you say in Answer 1.
it doesn't work if I use ab may be because ab field is a copyField
for abfr, aben, abit, abpt

Concerning the 2., yes you have right it's not and but AND

I have this result:

lst  name=DE102009043935B3
   arr  name=tien
 strlt;emgt;Bicyclelt;/emgt;  frame comprises holder, particularly for 
water bottle, where holder is connected/str
   /arr
   arr  name=aben
 str#CMT# #/CMT# Thelt;emgt;bicyclelt;/emgt;  frame (7) comprises a 
holder (1), particularly for a water bottle/str
 str. The holder is connected with thelt;emgt;bicyclelt;/emgt;  frame 
by a screw (5), where a mounting element has a compensation/str
 str  section which is made of an elastic material, particularly 
alt;emgt;plasticlt;/emgt;  material. The compensation section/str
   /arr
 /lst


So my last question is why I haven't em/em instead having colored ?
How can I tell to solr to use the colored ?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?
  
^^ 2. Try removing the word and from the query.  There may be some interaction with a stop word filter.  If you want a phrase query, wrap it in quotes.


3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

Le 29/03/2015 21:15, Bruno Mannina a écrit :

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29version=2.2start=0r
o
w
s=10indent=onhl=truehl.fl=tien,abenfl=pnf.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



lst  name=DE202010012045U1
   arr  name=aben
 str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a
pedal body (10) made fromlt;emgt;plasticlt;/emgt; material/str
 str, particularly for touring bike. #CMT#ADVANTAGE :
#/CMT# The bicycle pedal has a pedal body made
fromlt;emgt;plasticlt;/emgt;/str
   /arr
 /lst
 lst  name=JP2014091382A
   arr  name=aben
 str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3
having two heat fusion layers, and the
twolt;emgt;plasticlt;/emgt;  tapes
3 and 3 are stuck/str
   /arr
 /lst
 lst  name=DE10201740A1
   arr  name=aben
 str  elements. A connecting element is 

SolrCloud 5.0 cluster RAM requirements

2015-04-01 Thread Ryan Steele
Does a SolrCloud 5.0 cluster need enough RAM across the cluster to load 
all the collections into RAM at all times?


I'm building a SolrCloud cluster that may have approximately 1 TB of 
data spread across the collections.


Thanks,
Ryan



RE: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Reitzel, Charles
Sorry, I've never tried highlighting in multiple colors...

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr] 
Sent: Wednesday, April 01, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

ok for qf (i can't test now)

but concerning hl.simple.pre hl.simple.post I can define only one color no ?

in the sample solrconfig.xml there are several color,

!-- multi-colored tag FragmentsBuilder --
   fragmentsBuilder  name=colored
 class=solr.highlight.ScoreOrderFragmentsBuilder
 lst  name=defaults
   str  name=hl.tag.pre![CDATA[
b style=background:yellow,b style=background:lawgreen,
b style=background:aquamarine,b 
style=background:magenta,
b style=background:palegreen,b style=background:coral,
b style=background:wheat,b style=background:khaki,
b style=background:lime,b 
style=background:deepskyblue]]/str
   str  name=hl.tag.post![CDATA[/b]]/str
 /lst
   /fragmentsBuilder

How can I tell to solr to use these color instead of hl.simple.pre/post ?



Le 01/04/2015 20:58, Reitzel, Charles a écrit :
 If you want to query on the field ab, you'll probably need to add it the qf 
 parameter.

 To control the highlighting markup, with the standard highlighter, use 
 hl.simple.pre and hl.simple.post.

 https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter


 -Original Message-
 From: Bruno Mannina [mailto:bmann...@free.fr]
 Sent: Wednesday, April 01, 2015 2:24 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 3.6, Highlight and multi words?

 Dear Charles,

 Thanks for your answer, please find below my answers.

 ok it works if I use aben as field in my query as you say in Answer 1.
 it doesn't work if I use ab may be because ab field is a copyField 
 for abfr, aben, abit, abpt

 Concerning the 2., yes you have right it's not and but AND

 I have this result:

 lst  name=DE102009043935B3
   arr  name=tien
 strlt;emgt;Bicyclelt;/emgt;  frame comprises holder, 
 particularly for water bottle, where holder is connected/str
   /arr
   arr  name=aben
 str#CMT# #/CMT# Thelt;emgt;bicyclelt;/emgt;  frame (7) 
 comprises a holder (1), particularly for a water bottle/str
 str. The holder is connected with thelt;emgt;bicyclelt;/emgt;  
 frame by a screw (5), where a mounting element has a compensation/str
 str  section which is made of an elastic material, particularly 
 alt;emgt;plasticlt;/emgt;  material. The compensation section/str
   /arr
 /lst


 So my last question is why I haven't em/em instead having colored ?
 How can I tell to solr to use the colored ?

 Thanks a lot,
 Bruno


 Le 01/04/2015 17:15, Reitzel, Charles a écrit :
 Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
 trouble with multiple terms.  I'd look at a few things.

 1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
 bicycle)?
  
 
 ^^ 2. Try removing the word and from the query.  There may be some 
 interaction with a stop word filter.  If you want a phrase query, wrap it in 
 quotes.

 3.  Also, be sure that the query and indexing analyzers for the aben field 
 are compatible with each other.

 -Original Message-
 From: Bruno Mannina [mailto:bmann...@free.fr]
 Sent: Wednesday, April 01, 2015 7:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 3.6, Highlight and multi words?

 Sorry to disturb you with the renew but nobody use or have problem with 
 multi-terms and highlight ?

 regards,

 Le 29/03/2015 21:15, Bruno Mannina a écrit :
 Dear Solr User,

 I try to work with highlight, it works well but only if I have only 
 one keyword in my query?!
 If my request is plastic AND bicycle then only plastic is highlight.

 my request is:

 ./select/?q=ab%3A%28plastic+and+bicycle%29version=2.2start=0r
 o
 w
 s=10indent=onhl=truehl.fl=tien,abenfl=pnf.aben.hl.snippets=5


 Could you help me please to understand ? I read doc, google, without 
 success...
 so I post here...

 my result is:

 

lst  name=DE202010012045U1
   arr  name=aben
 str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a 
 pedal body (10) made fromlt;emgt;plasticlt;/emgt; material/str
 str, particularly for touring bike. #CMT#ADVANTAGE : 
 #/CMT# The bicycle pedal has a pedal body made 
 fromlt;emgt;plasticlt;/emgt;/str
   /arr
 /lst
 lst  name=JP2014091382A
   arr  name=aben
 str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3 
 having two heat fusion layers, and the 
 twolt;emgt;plasticlt;/emgt;  tapes
 3 and 3 are stuck/str
   /arr
 /lst
 lst  name=DE10201740A1
   arr  name=aben
 str  elements. A connecting element is formed as a hinge, 
 a flexible foil or a