RE: Unable to perform search query after changing uniqueKey

2015-04-01 Thread steve
Gently walking into rough waters here, but if you use any API with GET, you're
sending a URI which must be properly encoded. This has nothing to do with
the programming language that generates the key/value pairs on the browser or
the one(s) used on the server. Lots and lots of good folks have tripped over
this one: http://www.w3schools.com/tags/ref_urlencode.asp
Play hard, but play safe!
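
As a concrete illustration (a minimal Python 3 sketch; the field name "Item No" is
taken from this thread and is otherwise hypothetical), building the GET URL with a
proper encoder instead of string concatenation side-steps the encoding problem,
although, as Erick notes later in the thread, encoding alone still doesn't make a
field name with a space a good idea:

    # Percent-encode a Solr query whose field name contains a space.
    from urllib.parse import urlencode

    params = {"q": 'Item No:"ABC-123"', "wt": "json"}
    url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
    print(url)
    # -> .../select?q=Item+No%3A%22ABC-123%22&wt=json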

 Date: Wed, 1 Apr 2015 13:58:55 +0800
 Subject: Re: Unable to perform search query after changing uniqueKey
 From: edwinye...@gmail.com
 To: solr-user@lucene.apache.org
 
 Thanks Erick.
 
 Yes, it is able to work correct if I do not use spaces for the field names,
 especially for the uniqueKey.
 
 Regards,
 Edwin
 
 
 On 31 March 2015 at 13:58, Erick Erickson erickerick...@gmail.com wrote:
 
  I would never put spaces in my field names! Frankly I have no clue
  what Solr does with that, but it can't be good. Solr explicitly
  supports Java naming conventions, camel case, underscores and numbers.
  Special symbols are frowned upon, I never use anything but upper case,
  lower case and underscores. Actually, I don't use upper case either
  but that's a personal preference. Other things might work, but only by
  chance.
 
  Best,
  Erick
 
  On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Latest information that I've found for this is that the error only occurs
   for shard2.
  
   If I do a search for just shard1, those records that are assigned to
  shard1
   will be able to be displayed. Only when I search for shard2 will the
   NullPointerException error occurs. Previously I was doing a search for
  both
   shards.
  
   Is there any settings that I required to do for shard2 in order to solve
   this issue? Currently I have not made any changes to the shards since I
   created it using
  
   http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
  
  
   Regards,
   Edwin
  
   On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo edwinye...@gmail.com
  wrote:
  
   Hi Erick,
  
   I've changed the uniqueKey from id to Item No.
  
    <uniqueKey>Item No</uniqueKey>
  
  
   Below are my definitions for both the id and Item No.
  
    <field name="id" type="string" indexed="true" stored="true"
    required="false" multiValued="false" />
    <field name="Item No" type="text_general" indexed="true" stored="true"/>
  
   Regards,
   Edwin
  
  
   On 30 March 2015 at 23:05, Erick Erickson erickerick...@gmail.com
  wrote:
  
   Well, let's see the definition of your ID field, 'cause I'm puzzled.
  
   It's definitely A Bad Thing to have it be any kind of tokenized field
   though, but that's a shot in the dark.
  
   Best,
   Erick
  
   On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Hi Mostafa,
   
Yes, I've defined all the fields in schema.xml. It is able to work on
   the
version without SolrCloud, but it is not working for the one with
   SolrCloud.
Both of them are using the same schema.xml.
   
Regards,
Edwin
   
   
   
On 30 March 2015 at 14:34, Mostafa Gomaa mostafa.goma...@gmail.com
   wrote:
   
Hi Zheng,
   
It's possible that there's a problem with your schema.xml. Are all
   fields
defined and have appropriate options enabled?
   
Regards,
   
Mostafa.
   
On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo 
   edwinye...@gmail.com

wrote:
   
 Hi Erick,

 I've tried that, and removed the data directory from both the
   shards. But
 the same problem still occurs, so we probably can rule out the
   memory
 issue.

 Regards,
 Edwin

 On 30 March 2015 at 12:39, Erick Erickson 
  erickerick...@gmail.com
wrote:

  I meant shut down Solr and physically remove the entire data
  directory. Not saying this is the cure, but it can't hurt to
  rule
   out
  the index having memory...
 
  Best,
  Erick
 
  On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Hi Erick,
  
   I used the following query to delete all the index.
  
    http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
    http://localhost:8983/solr/update?stream.body=<commit/>
  
  
   Or is it better to physically delete the entire data
  directory?
  
  
   Regards,
   Edwin
  
  
   On 28 March 2015 at 02:27, Erick Erickson 
   erickerick...@gmail.com
  wrote:
  
   You say you re-indexed, did you _completely_ remove the data
directory
   first, i.e. the parent of the index and, maybe, tlog
directories?
   I've occasionally seen remnants of old definitions pollute
   the new
   one, and since the uniqueKey key is so fundamental I can
  see
   it
   being a problem.
  
   Best,
   Erick
  
   On Fri, Mar 27, 2015 at 1:42 AM, Andrea Gazzarini 

RE: Spark-Solr in python

2015-04-01 Thread Chaushu, Shani
There is a package of python with solr-cloud
https://pypi.python.org/pypi/solrcloudpy

but I don't know if there is possibility to connect it to spark


-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com] 
Sent: Tuesday, March 31, 2015 23:15
To: solr-user@lucene.apache.org
Subject: Re: Spark-Solr in python

You'll need a python lib that uses a python ZooKeeper client to be 
SolrCloud-aware so that you can do RDD like things, such as reading from all 
shards in a collection in parallel. I'm not aware of any Solr py libs that are 
cloud-aware yet, but it would be a good contribution to upgrade 
https://github.com/toastdriven/pysolr to be SolrCloud-aware
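
For anyone exploring this, a rough Python sketch of the ZooKeeper side (using the
kazoo client; the /live_nodes layout is the standard SolrCloud one, but treat the
paths and the URL reconstruction as assumptions to verify against your cluster):

    # Discover live Solr nodes from ZooKeeper so a driver could fan out
    # per-shard requests. Requires: pip install kazoo
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="localhost:2181")
    zk.start()
    # SolrCloud registers nodes under /live_nodes, e.g. "10.0.0.1:8983_solr"
    nodes = zk.get_children("/live_nodes")
    urls = ["http://" + n.replace("_", "/") for n in nodes]
    print(urls)
    zk.stop()

The collection-to-shard-to-replica mapping lives alongside it in clusterstate.json,
which is what a truly cloud-aware client would read before deciding which cores to
query in parallel.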

On Mon, Mar 30, 2015 at 11:31 PM, Chaushu, Shani shani.chau...@intel.com 
wrote:
 Hi,
 I saw there is a tool for reading solr into Spark RDD in JAVA I want 
 to do something like this in python, is there any package in python for 
 reading solr into spark RDD?

 Thanks ,
 Shani




Solr Cloud Security not working for internal authentication

2015-04-01 Thread Swaraj Kumar
I am trying to use Solr Security on Solr 5.0 Cloud. Following process I
have used :-

 1. Modifying web.xml:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>AdminAllowedQueries</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>

<security-role>
  <description>Admin</description>
  <role-name>admin</role-name>
</security-role>


 2. Changes in jetty.xml:

<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">Solr Realm</Set>
      <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      <Set name="refreshInterval">0</Set>
    </New>
  </Arg>
</Call>

 3. Creating realm.properties:

    solradmin: solradmin,admin

 4. Set SOLR_OPTS in solr.in.sh:

    SOLR_OPTS="$SOLR_OPTS -DinternalAuthCredentialsBasicAuthUsername=solradmin"
    SOLR_OPTS="$SOLR_OPTS -DinternalAuthCredentialsBasicAuthPassword=solradmin"

I am getting Unauthorized error while creating collection using following
command:-

curl -i -X GET \
   -H 'Authorization: Basic c29scmFkbWluOnNvbHJhZG1pbg==' \
   'http://localhost:8080/solr/admin/collections?action=CREATE&name=test&collection.configName=testconf&numShards=1'
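
For what it's worth, the Basic header is just base64("user:password"), so a quick
sketch to confirm the header value matches the realm.properties entry (Python 3
shown, but any language works):

    import base64

    # "solradmin:solradmin" are the credentials from realm.properties above.
    print(base64.b64encode(b"solradmin:solradmin").decode())
    # -> c29scmFkbWluOnNvbHJhZG1pbg==  (the value used in the curl call above)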

Kindly help or suggest the best to get this done.

Thanx in advance.


Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
✆ +91-9811774497


Re: Unable to perform search query after changing uniqueKey

2015-04-01 Thread Zheng Lin Edwin Yeo
Thanks Erick.

Yes, it is able to work correctly if I do not use spaces for the field names,
especially for the uniqueKey.

Regards,
Edwin


On 31 March 2015 at 13:58, Erick Erickson erickerick...@gmail.com wrote:

 I would never put spaces in my field names! Frankly I have no clue
 what Solr does with that, but it can't be good. Solr explicitly
 supports Java naming conventions, camel case, underscores and numbers.
 Special symbols are frowned upon, I never use anything but upper case,
 lower case and underscores. Actually, I don't use upper case either
 but that's a personal preference. Other things might work, but only by
 chance.

 Best,
 Erick

 On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  Latest information that I've found for this is that the error only occurs
  for shard2.
 
  If I do a search for just shard1, those records that are assigned to
 shard1
  will be able to be displayed. Only when I search for shard2 will the
  NullPointerException error occurs. Previously I was doing a search for
 both
  shards.
 
  Is there any settings that I required to do for shard2 in order to solve
  this issue? Currently I have not made any changes to the shards since I
  created it using
 
  http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
 
 
  Regards,
  Edwin
 
  On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo edwinye...@gmail.com
 wrote:
 
  Hi Erick,
 
  I've changed the uniqueKey from id to Item No.
 
   <uniqueKey>Item No</uniqueKey>
 
 
  Below are my definitions for both the id and Item No.
 
   <field name="id" type="string" indexed="true" stored="true"
   required="false" multiValued="false" />
   <field name="Item No" type="text_general" indexed="true" stored="true"/>
 
  Regards,
  Edwin
 
 
  On 30 March 2015 at 23:05, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Well, let's see the definition of your ID field, 'cause I'm puzzled.
 
  It's definitely A Bad Thing to have it be any kind of tokenized field
  though, but that's a shot in the dark.
 
  Best,
  Erick
 
  On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Hi Mostafa,
  
   Yes, I've defined all the fields in schema.xml. It is able to work on
  the
   version without SolrCloud, but it is not working for the one with
  SolrCloud.
   Both of them are using the same schema.xml.
  
   Regards,
   Edwin
  
  
  
   On 30 March 2015 at 14:34, Mostafa Gomaa mostafa.goma...@gmail.com
  wrote:
  
   Hi Zheng,
  
   It's possible that there's a problem with your schema.xml. Are all
  fields
   defined and have appropriate options enabled?
  
   Regards,
  
   Mostafa.
  
   On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo 
  edwinye...@gmail.com
   
   wrote:
  
Hi Erick,
   
I've tried that, and removed the data directory from both the
  shards. But
the same problem still occurs, so we probably can rule out the
  memory
issue.
   
Regards,
Edwin
   
On 30 March 2015 at 12:39, Erick Erickson 
 erickerick...@gmail.com
   wrote:
   
 I meant shut down Solr and physically remove the entire data
 directory. Not saying this is the cure, but it can't hurt to
 rule
  out
 the index having memory...

 Best,
 Erick

 On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  Hi Erick,
 
  I used the following query to delete all the index.
 
   http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
   http://localhost:8983/solr/update?stream.body=<commit/>
 
 
  Or is it better to physically delete the entire data
 directory?
 
 
  Regards,
  Edwin
 
 
  On 28 March 2015 at 02:27, Erick Erickson 
  erickerick...@gmail.com
 wrote:
 
  You say you re-indexed, did you _completely_ remove the data
   directory
  first, i.e. the parent of the index and, maybe, tlog
   directories?
  I've occasionally seen remnants of old definitions pollute
  the new
  one, and since the uniqueKey key is so fundamental I can
 see
  it
  being a problem.
 
  Best,
  Erick
 
  On Fri, Mar 27, 2015 at 1:42 AM, Andrea Gazzarini 
 a.gazzar...@gmail.com
  wrote:
   Hi Edwin,
   please provide some other detail about your context, (e.g.
   complete
   stacktrace, query you're issuing)
  
   Best,
   Andrea
  
  
   On 03/27/2015 09:38 AM, Zheng Lin Edwin Yeo wrote:
  
   Hi everyone,
  
   I've changed my uniqueKey to another name, instead of
 using
  id,
   on
 the
   schema.xml.
  
   However, after I have done the indexing (the indexing is
successful),
  I'm
   not able to perform a search query on it. I gives the
 error
   java.lang.NullPointerException.
  
   Is there other place which I need to configure, besides
  changing
the
   uniqueKey field 

Re: Collapse and Expand behaviour on result with 1 document.

2015-04-01 Thread Derek Poh

Hi Joel

Correct me if my understanding is wrong.
Using supplier id as the field to collapse on.

- If the collapse group heads in the main result set each have only 1 document
in their group, the expanded section will be empty since there are no
documents to expand for each collapse group.
- To render the page, I need to iterate the main result set. For each
document I have to check if there is an expanded group with the same
supplier id.
- The facet counts are based on the number of collapse groups in the main
result set (<result maxScore="6.470696" name="response" numFound="27"
start="0">).


-Derek

On 3/31/2015 7:43 PM, Joel Bernstein wrote:

The way that collapse/expand is designed to be used is as follows:

The main result set will contain the collapsed group heads.

The expanded section will contain the expanded groups for the page of
results.

To render the page you iterate the main result set. For each document check
to see if there is an expanded group.
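
A hedged Python sketch of that rendering loop (the field and parameter names are
the ones from this thread; the collection URL and the assumption that P_SupplierId
is stored on each document are mine):

    # Collapse on P_SupplierId, then stitch the expanded groups back in while
    # rendering. Assumes the JSON response writer.
    import requests

    params = {
        "q": "*:*",
        "fq": "{!collapse field=P_SupplierId}",
        "expand": "true",
        "expand.rows": 5,
        "wt": "json",
    }
    rsp = requests.get("http://localhost:8983/solr/collection1/select",
                       params=params).json()

    expanded = rsp.get("expanded", {})        # keyed by the collapsed field value
    for head in rsp["response"]["docs"]:      # the collapsed group heads
        group = expanded.get(head.get("P_SupplierId"), {}).get("docs", [])
        # render head, then render `group` (empty when the head is alone)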




Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com wrote:


You should be able to use collapse/expand with one result.

Does the document in the main result set have group members that aren't
being expanded?



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com wrote:


If I want to group the results (by a certain field) even if there is only
1 document, I should use the group parameter instead?
The requirement is to group the result of product documents by their
supplier id.
group=true&group.field=P_SupplierId&group.limit=5

Is it true that the performance of collapse is better than group
parameter on large data set, say 10-20 million documents?

-Derek


On 3/31/2015 10:03 AM, Joel Bernstein wrote:


The expanded section will only include groups that have expanded
documents.

So, if the document that in the main result set has no documents to
expand,
then this is working as expected.



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com
wrote:

  Hi

I have a query which return 1 document.
When I add the collapse and expand parameters to it,
expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}, the
expanded section is empty (<lst name="expanded"/>).

Is this the behaviour of collapse and expand parameters on result which
contain only 1 document?

-Derek








solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Dominique Bejean
Hi,

Is it normal with Solr 4.10.3 that the data directory of replicas still
contains directories like

index.3636365667474747
index.999080980976

and files

index.properties
replica.properties

If yes, why and in which circumstances ?

Regards

Dominique


Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread avinash09

<entity name="test1"
        processor="LineEntityProcessor"
        dataSource="fds"
        url="test.csv"
        rootEntity="true"
        transformer="RegexTransformer,TemplateTransformer">
  <field column="rawLine"
         regex="^(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)$"
         groupNames="test,,,,,is_frequency_cap_enabled,,,daily_spend_limit,,," />
  <field column="table_name" name="table_name" template="test1" />
</entity>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collapse and Expand behaviour on result with 1 document.

2015-04-01 Thread Joel Bernstein
Exactly correct.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 1, 2015 at 5:44 AM, Derek Poh d...@globalsources.com wrote:

 Hi Joel

 Correct me if my understanding is wrong.
 Using supplier id as the field to collapse on.

 - If the collapse group heads in the main result set each have only 1 document in
 their group, the expanded section will be empty since there are no documents
 to expand for each collapse group.
 - To render the page, I need to iterate the main result set. For each
 document I have to check if there is an expanded group with the same
 supplier id.
 - The facet counts are based on the number of collapse groups in the main
 result set (<result maxScore="6.470696" name="response" numFound="27"
 start="0">).

 -Derek


 On 3/31/2015 7:43 PM, Joel Bernstein wrote:

 The way that collapse/expand is designed to be used is as follows:

 The main result set will contain the collapsed group heads.

 The expanded section will contain the expanded groups for the page of
 results.

 To render the page you iterate the main result set. For each document
 check
 to see if there is an expanded group.




 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com
 wrote:

  You should be able to use collapse/expand with one result.

 Does the document in the main result set have group members that aren't
 being expanded?



 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com
 wrote:

  If I want to group the results (by a certain field) even if there is
 only
 1 document, I should use the group parameter instead?
 The requirement is to group the result of product documents by their
 supplier id.
 group=true&group.field=P_SupplierId&group.limit=5

 Is it true that the performance of collapse is better than group
 parameter on large data set, say 10-20 million documents?

 -Derek


 On 3/31/2015 10:03 AM, Joel Bernstein wrote:

  The expanded section will only include groups that have expanded
 documents.

 So, if the document that in the main result set has no documents to
 expand,
 then this is working as expected.



 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com
 wrote:

   Hi

 I have a query which return 1 document.
 When I add the collapse and expand parameters to it,
 expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}, the
 expanded section is empty (<lst name="expanded"/>).

 Is this the behaviour of collapse and expand parameters on result
 which
 contain only 1 document?

 -Derek








Customzing Solr Dedupe

2015-04-01 Thread thakkar.aayush
I'm facing a challenge using de-duplication of Solr documents.

De-duplication is done using TextProfileSignature with the following parameters:
<str name="fields">field1, field2, field3</str>
<str name="quantRate">0.5</str>
<str name="minTokenLen">3</str>

Here field3 is normal text with a few lines of data.
field1 and field2 can contain up to 5 or 6 words of data.

I want to de-duplicate when the data in field1 and field2 are exactly the same
and 90% of the lines in field3 match those in another document.

Is there anyway to achieve this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina
Sorry to disturb you with this follow-up, but does nobody else use, or have
problems with, multi-term highlighting?


regards,

On 29/03/2015 21:15, Bruno Mannina wrote:

Dear Solr User,

I am trying to work with highlighting. It works well, but only if I have
a single keyword in my query.

If my request is "plastic AND bicycle" then only "plastic" is highlighted.

My request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5



Could you help me please to understand ? I read doc, google, without 
success...

so I post here...

my result is:



<lst name="DE202010012045U1">
  <arr name="aben">
    <str>(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal body (10) made from <em>plastic</em> material</str>
    <str>, particularly for touring bike. #CMT#ADVANTAGE : #/CMT# The bicycle pedal has a pedal body made from <em>plastic</em></str>
  </arr>
</lst>
<lst name="JP2014091382A">
  <arr name="aben">
    <str> between <em>plastic</em> tapes 3 and 3 having two heat fusion layers, and the two <em>plastic</em> tapes 3 and 3 are stuck</str>
  </arr>
</lst>
<lst name="DE10201740A1">
  <arr name="aben">
    <str> elements. A connecting element is formed as a hinge, a flexible foil or a flexible <em>plastic</em> part. #CMT#USE</str>
  </arr>
</lst>
<lst name="US2008276751A1">
  <arr name="aben">
    <str>A bicycle handlebar grip includes an inner fiber layer and an outer <em>plastic</em> layer. Thus, the fiber</str>
    <str> handlebar grip, while the <em>plastic</em> layer is soft and has an adjustable thickness to provide a comfortable</str>
    <str> sensation to a user. In addition, the <em>plastic</em> layer includes a holding portion coated on the outer surface</str>
    <str> layer to enhance the combination strength between the fiber layer and the <em>plastic</em> layer and to enhance</str>
  </arr>
</lst>











shard splitting (solr 4.4.0)

2015-04-01 Thread Ashwin Kumar
 Hello Solr Community,
 
Greetings ! This is my first post to this group.
 
I am very new to solr, so please do not mind if some of my questions below 
sound dumb :)
 
Let me explain my present setup:
 
Solr version : Solr_4.4.0 
Zookeeper version: zookeeper-3.4.5
-
 
Present Setup
Unix_box_1
One Solr instance (Collection 1 : contains around 24 million indexed documents) 
running on port 8983
 

 
Target setup
 
Now, as the number of users is going to increase and we are also looking for
high availability, I am thinking of setting up SolrCloud with the following
setup:
 
Unix box 1
zookeeper 1(master)
Solr instance 1(Shard 1 - leader node)

 
Unix_box_2
zookeeper 2
Solr instance 2  (Shard 2)

 
Unix_box_3
zookeeper 3
Solr instance 3  (Replica for Shard 1)

 
Unix_box_4
Solr instance 4 (Replica for Shard 2)

 

 
Now following are my queries:
 
1) Is it possible for me to split the present solr running on one node with 24 
million docs under Collection1 into 2 shards as shown above ?
2) If yes how can I achieve this, and approximately how long does it take ?
3) For my application to fetch the result from solr, I need to give one solr 
url meaning http://Unix_box_1:8983/solr   . In this case if I have some docs on 
shard2 (which is on Unix_box_2) and some on shard1 (Unix_box_1), will my search 
result in the application fetch docs from both the shards and combine the 
result ? 
 
=
 
 
Thank you for your patience and time.
 
Regards,
Ashwin
  

Re: Customzing Solr Dedupe

2015-04-01 Thread Jack Krupansky
Solr dedupe is based on the concept of a signature - some fields and rules
that reduce a document into a discrete signature, and then checking if that
signature exists as a document key that can be looked up quickly in the
index. That's the conceptual basis. It is not based on any kind of field by
field comparison to all existing documents.
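
A tiny Python illustration of that signature idea (conceptual only; this is not
Solr's TextProfileSignature, and the field names are the ones from the question):

    # Reduce a document to a signature key; dedupe is then an exact lookup on
    # that key, not a fuzzy field-by-field comparison.
    import hashlib

    def signature(doc, fields=("field1", "field2", "field3")):
        joined = "|".join((doc.get(f) or "").strip().lower() for f in fields)
        return hashlib.md5(joined.encode("utf-8")).hexdigest()

    seen = set()
    for doc in ({"field1": "A", "field2": "B", "field3": "some text"},
                {"field1": "a", "field2": "B", "field3": "Some text"}):
        sig = signature(doc)
        print(sig, "duplicate" if sig in seen else "new")
        seen.add(sig)

Anything that hashes to a different key is simply a different document, which is
why a rule like "90% of the lines in field3 match" is hard to express with this
mechanism.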

-- Jack Krupansky

On Wed, Apr 1, 2015 at 6:35 AM, thakkar.aayush thakkar.aay...@gmail.com
wrote:

 I'm facing a challenge using de-duplication of Solr documents.

 De-duplication is done using TextProfileSignature with the following parameters:
 <str name="fields">field1, field2, field3</str>
 <str name="quantRate">0.5</str>
 <str name="minTokenLen">3</str>

 Here Field3 is normal text with few lines of data.
 Field1 and Field2 can contain upto 5 or 6 words of data.

 I want to de-duplicate when data in field1 and field2 are exactly the same
 and 90% of the lines in field3 is matched to that in another document.

 Is there anyway to achieve this?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread Alexandre Rafalovitch
Solr actually has CSV update handler. You could send file to that directly.

Have you tried that?

Regards,
Alex
On 1 Apr 2015 11:56 pm, avinash09 avinash.i...@gmail.com wrote:


   entity name=test1
 processor=LineEntityProcessor
 dataSource=fds
 url=test.csv
 rootEntity=true
 transformer=RegexTransformer,TemplateTransformer 
   field column=rawLine


 regex=^(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)$
  groupNames=test,,

 ,,,is_frequency_cap_enabled,,,daily_spend_limit,,, /
  field column=table_name name=table_name template=test1 /
 /entity



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Suspicious message with attachment

2015-04-01 Thread help
The following message addressed to you was quarantined because it likely 
contains a virus:

Subject: Error while reading index
From: Moshe Recanati mos...@kmslh.com

However, if you know the sender and are expecting an attachment, please reply 
to this message, and we will forward the quarantined message to you.


RE: Error while reading index

2015-04-01 Thread Moshe Recanati
Hi,
I uploaded the log to drive.
https://drive.google.com/file/d/0B0GR0M-lL5QHX1B2a2NZZXh3a1E/view?usp=sharing



Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile  + 972-52-6194481
Skype:  recanati
More at:  www.kmslh.comhttp://www.kmslh.com/ | 
LinkedInhttp://www.linkedin.com/company/kms-lighthouse | 
FBhttps://www.facebook.com/pages/KMS-lighthouse/123774257810917


From: Moshe Recanati [mailto:mos...@kmslh.com]
Sent: Wednesday, April 01, 2015 5:22 PM
To: solr-user@lucene.apache.org
Subject: Error while reading index

Hi,
We're running on production environment with Solr 4.7.1 master and slave with 
replication every 1 minute.
During regular activity and index delta build we got the following error:
ERROR - 2015-03-30 04:06:12.318; java.lang.RuntimeException: [was class 
java.net.SocketException] Connection reset
at 
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at 
com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)

After additional 2 minutes we got the following error:
ERROR - 2015-03-30 04:07:39.875; Unable to get file names for indexCommit 
generation: 638
java.io.FileNotFoundException: _tu.fdt
at 
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:261)
at 
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:178)

And since then Solr did not recover until we did a full rebuild of all documents.
Detailed log attached.

Let me know if you are familiar with such an issue,
and what can create an issue that prevents recovery and requires rebuilding the
index. This is a major issue for us.

Thank you in advance,


Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile  + 972-52-6194481
Skype:  recanati
More at:  www.kmslh.comhttp://www.kmslh.com/ | 
LinkedInhttp://www.linkedin.com/company/kms-lighthouse | 
FBhttps://www.facebook.com/pages/KMS-lighthouse/123774257810917




Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread avinash09
no could you please share an example



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196928.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Reitzel, Charles
Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1. Do you have a typo in your query? Shouldn't it be q=aben:(plastic and
bicycle)? Note the field name: aben, not ab.

2. Try removing the word "and" from the query. There may be some interaction
with a stop word filter. If you want a phrase query, wrap it in quotes.

3. Also, be sure that the query and indexing analyzers for the aben field are
compatible with each other.
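
For comparison, a hedged sketch of issuing the same request with every parameter
passed separately, so nothing is lost to a missing "&" in a hand-built URL (the
field names are copied from the original mail; the base URL is an assumption):

    import requests

    params = {
        "q": "aben:(plastic AND bicycle)",
        "fl": "pn",
        "rows": 10,
        "hl": "true",
        "hl.fl": "tien,aben",
        "f.aben.hl.snippets": 5,
        "wt": "json",
    }
    rsp = requests.get("http://localhost:8983/solr/select", params=params).json()
    print(rsp.get("highlighting", {}))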

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr] 
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

 On 29/03/2015 21:15, Bruno Mannina wrote:
 Dear Solr User,

 I try to work with highlight, it works well but only if I have only 
 one keyword in my query?!
 If my request is plastic AND bicycle then only plastic is highlight.

 my request is:

 ./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


 Could you help me please to understand ? I read doc, google, without 
 success...
 so I post here...

 my result is:

 

  lst  name=DE202010012045U1
 arr  name=aben
   str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal 
 body (10) made fromlt;emgt;plasticlt;/emgt; material/str
   str, particularly for touring bike. #CMT#ADVANTAGE : #/CMT# 
 The bicycle pedal has a pedal body made 
 fromlt;emgt;plasticlt;/emgt;/str
 /arr
   /lst
   lst  name=JP2014091382A
 arr  name=aben
   str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3 having 
 two heat fusion layers, and the twolt;emgt;plasticlt;/emgt;  tapes
 3 and 3 are stuck/str
 /arr
   /lst
   lst  name=DE10201740A1
 arr  name=aben
   str  elements. A connecting element is formed as a hinge, a 
 flexible foil or a flexiblelt;emgt;plasticlt;/emgt;  part.
 #CMT#USE/str
 /arr
   /lst
   lst  name=US2008276751A1
 arr  name=aben
   strA bicycle handlebar grip includes an inner fiber layer and 
 an outerlt;emgt;plasticlt;/emgt; layer. Thus, the fiber/str
   str  handlebar grip, while thelt;emgt;plasticlt;/emgt; 
 layer is soft and has an adjustable thickness to provide a 
 comfortable/str
   str  sensation to a user. In addition, 
 thelt;emgt;plasticlt;/emgt;  layer includes a holding portion 
 coated on the outer surface/str
   str  layer to enhance the combination strength between the 
 fiber layer and thelt;emgt;plasticlt;/emgt;  layer and to 
 enhance/str
 /arr
   /lst


*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*


Re: shard splitting (solr 4.4.0)

2015-04-01 Thread Erick Erickson
Ashwin:

First, if at all possible I would simply set up my new SolrCloud
structure (2 shards, a leader and follower each) and re-index the
entire corpus. 24M docs isn't really very many, and you'll have to
have this capability sometime since somone, somewhere will want to
change the schema in ways that require it.

But to answer your questions:
1: Certainly. There's the SPLITSHARD command, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API (a minimal
request is sketched after this list). That said, Solr 4.4 used a relatively
early version of SPLITSHARD and there have been many improvements since, so
make sure to back up first.

2: Not quite sure how long it takes, but I wouldn't expect it to take
hours. A lot depends on what the docs are like.

3: Yes, sending a query (or update for that matter) to any node in the
cluster will do the right thing. In a production environment, and
assuming you're not using SolrJ, I'd put a load balancer in front of
the cluster for queries. If you _are_ querying through SolrJ from the
application, you only need to use the CloudSolrServer class as it
includes a software load balancer by default. Otherwise, if you
hard-code a single machine that machine becomes a single point of
failure.
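
A minimal sketch of that SPLITSHARD call (the collection and shard names here are
placeholders; on 4.4 the call runs synchronously, so give the HTTP client a long
timeout):

    import requests

    rsp = requests.get(
        "http://localhost:8983/solr/admin/collections",
        params={"action": "SPLITSHARD", "collection": "collection1",
                "shard": "shard1", "wt": "json"},
        timeout=3600,   # shard splitting can take a while on 24M docs
    )
    print(rsp.json())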

Best,
Erick

On Wed, Apr 1, 2015 at 4:55 AM, Ashwin Kumar ashwins...@outlook.de wrote:
  Hello Solr Community,

 Greetings ! This is my first post to this group.

 I am very new to solr, so please do not mind if some of my questions below 
 sound dumb :)

 Let me explain my present setup:

 Solr version : Solr_4.4.0
 Zookeeper version: zookeeper-3.4.5
 -

 Present Setup
 Unix_box_1
 One Solr instance (Collection 1 : contains around 24 million indexed 
 documents) running on port 8983

 

 Target setup

 Now as the number of users are going to increase and also we are looking for 
 high availability, I am thinking of setting up solr cloud with the following 
 setup:

 Unix box 1
 zookeeper 1(master)
 Solr instance 1(Shard 1 - leader node)
 

 Unix_box_2
 zookeeper 2
 Solr instance 2  (Shard 2)
 

 Unix_box_3
 zookeeper 3
 Solr instance 3  (Replica for Shard 1)
 

 Unix_box_4
 Solr instance 4 (Replica for Shard 2)
 

 

 Now following are my queries:

 1) Is it possible for me to split the present solr running on one node with 
 24 million docs under Collection1 into 2 shards as shown above ?
 2) If yes how can I achieve this, and approximately how long does it take ?
 3) For my application to fetch the result from solr, I need to give one solr 
 url meaning http://Unix_box_1:8983/solr   . In this case if I have some docs 
 on shard2 (which is on Unix_box_2) and some on shard1 (Unix_box_1), will my 
 search result in the application fetch docs from both the shards and combine 
 the result ?

 =


 Thank you for your patience and time.

 Regards,
 Ashwin



Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread avinash09
sir , a silly  question m confuse here what is difference between data import
handler and update csv



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196940.html
Sent from the Solr - User mailing list archive at Nabble.com.


Information regarding This conf directory is not valid SolrException.

2015-04-01 Thread Bar Weiner
Hi,

I'm working on upgrading a project from solr-4.10.3 to solr-5.0.0.
As part of our JUnit tests we have a few tests for deleting/creating
collections. Each test createdelete a collection with a different name,
but they all share the same config in ZK.
When running these tests in Eclipse everything works fine, but when running
the same tests through Maven we get the following error so I suspect this
is a timing related issue :

INFO  org.apache.solr.rest.ManagedResourceStorage  – Setting up
ZooKeeper-based storage for the RestManager with znodeBase:
/configs/SIMPLE_CONFIG
INFO  org.apache.solr.rest.ManagedResourceStorage  – Configured
ZooKeeperStorageIO with znodeBase: /configs/SIMPLE_CONFIG
INFO  org.apache.solr.rest.RestManager  – Initializing RestManager with
initArgs: {}
INFO  org.apache.solr.rest.ManagedResourceStorage  – Reading
_rest_managed.json using ZooKeeperStorageIO:path=/configs/SIMPLE_CONFIG
INFO  org.apache.solr.rest.ManagedResourceStorage  – No data found for
znode /configs/SIMPLE_CONFIG/_rest_managed.json
INFO  org.apache.solr.rest.ManagedResourceStorage  – Loaded null at path
_rest_managed.json using ZooKeeperStorageIO:path=/configs/SIMPLE_CONFIG
INFO  org.apache.solr.rest.RestManager  – Initializing 0 registered
ManagedResources
INFO  org.apache.solr.handler.ReplicationHandler  – Commits will be
reserved for  1
INFO  org.apache.solr.core.SolrCore  – [mycollection1] Registered new
searcher Searcher@3208a6c4[mycollection1]
main{ExitableDirectoryReader(UninvertingDirectoryReader())}
ERROR org.apache.solr.core.CoreContainer  – Error creating core
[mycollection1]: This conf directory is not valid
org.apache.solr.common.SolrException: This conf directory is not valid
at
org.apache.solr.cloud.ZkController.registerConfListenerForCore(ZkController.java:2229)
at
org.apache.solr.core.SolrCore.registerConfListener(SolrCore.java:2633)
at org.apache.solr.core.SolrCore.init(SolrCore.java:936)
at org.apache.solr.core.SolrCore.init(SolrCore.java:662)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at

Re: Unable to perform search query after changing uniqueKey

2015-04-01 Thread Erick Erickson
Steve:

Totally agree. Even if you _do_ correctly escape the URL though,
there's no guarantee that Solr will do the right thing with field
names with spaces. Plus endless chances for you to get it wrong when
constructing the URL

Best,
Erick

On Wed, Apr 1, 2015 at 1:01 AM, steve sc_shep...@hotmail.com wrote:
 Gently walking into rough waters here, but if you use any API with GET, 
 you're sending a URI which must be properly encoded. This has nothing to do 
 with with the programming language that generates key and store pairs on the 
 browser or the one(s) used on the server. Lots and lots of good folks have 
 tripped over this one.http://www.w3schools.com/tags/ref_urlencode.asp
 Play hard, but play safe!

 Date: Wed, 1 Apr 2015 13:58:55 +0800
 Subject: Re: Unable to perform search query after changing uniqueKey
 From: edwinye...@gmail.com
 To: solr-user@lucene.apache.org

 Thanks Erick.

 Yes, it is able to work correct if I do not use spaces for the field names,
 especially for the uniqueKey.

 Regards,
 Edwin


 On 31 March 2015 at 13:58, Erick Erickson erickerick...@gmail.com wrote:

  I would never put spaces in my field names! Frankly I have no clue
  what Solr does with that, but it can't be good. Solr explicitly
  supports Java naming conventions, camel case, underscores and numbers.
  Special symbols are frowned upon, I never use anything but upper case,
  lower case and underscores. Actually, I don't use upper case either
  but that's a personal preference. Other things might work, but only by
  chance.
 
  Best,
  Erick
 
  On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Latest information that I've found for this is that the error only occurs
   for shard2.
  
   If I do a search for just shard1, those records that are assigned to
  shard1
   will be able to be displayed. Only when I search for shard2 will the
   NullPointerException error occurs. Previously I was doing a search for
  both
   shards.
  
   Is there any settings that I required to do for shard2 in order to solve
   this issue? Currently I have not made any changes to the shards since I
   created it using
  
  http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
  
  
   Regards,
   Edwin
  
   On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo edwinye...@gmail.com
  wrote:
  
   Hi Erick,
  
   I've changed the uniqueKey from id to Item No.
  
    <uniqueKey>Item No</uniqueKey>
  
  
   Below are my definitions for both the id and Item No.
  
    <field name="id" type="string" indexed="true" stored="true"
    required="false" multiValued="false" />
    <field name="Item No" type="text_general" indexed="true" stored="true"/>
  
   Regards,
   Edwin
  
  
   On 30 March 2015 at 23:05, Erick Erickson erickerick...@gmail.com
  wrote:
  
   Well, let's see the definition of your ID field, 'cause I'm puzzled.
  
   It's definitely A Bad Thing to have it be any kind of tokenized field
   though, but that's a shot in the dark.
  
   Best,
   Erick
  
   On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Hi Mostafa,
   
Yes, I've defined all the fields in schema.xml. It is able to work on
   the
version without SolrCloud, but it is not working for the one with
   SolrCloud.
Both of them are using the same schema.xml.
   
Regards,
Edwin
   
   
   
On 30 March 2015 at 14:34, Mostafa Gomaa mostafa.goma...@gmail.com
   wrote:
   
Hi Zheng,
   
It's possible that there's a problem with your schema.xml. Are all
   fields
defined and have appropriate options enabled?
   
Regards,
   
Mostafa.
   
On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo 
   edwinye...@gmail.com

wrote:
   
 Hi Erick,

 I've tried that, and removed the data directory from both the
   shards. But
 the same problem still occurs, so we probably can rule out the
   memory
 issue.

 Regards,
 Edwin

 On 30 March 2015 at 12:39, Erick Erickson 
  erickerick...@gmail.com
wrote:

  I meant shut down Solr and physically remove the entire data
  directory. Not saying this is the cure, but it can't hurt to
  rule
   out
  the index having memory...
 
  Best,
  Erick
 
  On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Hi Erick,
  
   I used the following query to delete all the index.
  
    http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
    http://localhost:8983/solr/update?stream.body=<commit/>
  
  
   Or is it better to physically delete the entire data
  directory?
  
  
   Regards,
   Edwin
  
  
   On 28 March 2015 at 02:27, Erick Erickson 
   erickerick...@gmail.com
  wrote:
  
   You say you re-indexed, did you _completely_ remove the data
directory
   first, i.e. the parent of 

Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread Erick Erickson
Data Import Handler is a process in Solr that reaches out, grabs
something external and indexes it. Something external can be a
database, files on the server etc. Along the way, you can do many
transformations of the data. The point is that the source can be
anything.

The update handler is an end-point in Solr that expects certain
specific formats and puts them in the index. For instance, if you
index XML, it _must_ be in a very specific form to throw at the update
handler, something like
<add>
   <doc>
     <field ...>...</field>
     <field ...>...</field>
   </doc>
   <doc>
     <field ...>...</field>
     <field ...>...</field>
   </doc>
</add>

The csv update handler is just an update handler that expects CSV
files. The headers are usually the field names although you can map
them from the column header in your csv file to your Solr schema.

Importing csv files should be very fast. I suspect your regex is costly.

As Alexandre says, though, it would be a good idea to go through the
CSV import tutorial. The Solr reference guide has the details:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates
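
A hedged sketch of pushing the file straight at the CSV handler from Python (the
collection URL and column names are placeholders; separator/fieldnames/header are
standard CSV-handler parameters, and the separator is also how a control character
such as Ctrl-A can be passed):

    import requests

    params = {
        "commit": "true",
        "header": "false",                 # the file has no header row
        "separator": "\x01",               # Ctrl-A delimiter, sent as %01
        "fieldnames": "id,table_name,daily_spend_limit",   # placeholder columns
    }
    with open("test.csv", "rb") as f:
        rsp = requests.post(
            "http://localhost:8983/solr/collection1/update",
            params=params,
            data=f,
            headers={"Content-Type": "text/csv; charset=utf-8"},
        )
    print(rsp.status_code, rsp.text)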

Best,
Erick

On Wed, Apr 1, 2015 at 8:04 AM, avinash09 avinash.i...@gmail.com wrote:
 sir , a silly  question m confuse here what is difference between data import
 handler and update csv



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196940.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread Alexandre Rafalovitch
Well, I believe the tutorial has an example. Always a good thing -
going through the tutorial.

And the reference guide has the details:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates
.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 2 April 2015 at 01:37, avinash09 avinash.i...@gmail.com wrote:
 no could you please share an example



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196928.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Shawn Heisey
On 4/1/2015 6:35 AM, Dominique Bejean wrote:
 Is it normal with Solr 4.10.3 that the data directory of replicas still
 contains directories like

 index.3636365667474747
 index.999080980976

 and files

 index.properties
 replica.properties

 If yes, why and in which circumstances ?

The index.xxxxxxxxxxx directories are created during master/slave
index replication.  If you're running SolrCloud, then replication is
only used for index recovery.  Index recovery is only required in
situations where the replicas are so far behind that the transaction log
cannot be used to synchronize them, and sometimes happens when a Solr
node is restarted.  If SolrCloud index recovery is actually required
when you are NOT restarting Solr instances, your index might be having
problems.

Regardless of whether you're running SolrCloud or not, normally when one
of those directories with a numeric suffix is created, it will be
changed to index with no suffix after the replication is complete, but
if Solr is unable to change the directories for some reason, it will
simply keep and use the new directory with the suffix.  Do you see any
ERROR or WARN entries in your solr logfile that would indicate why Solr
cannot change the directory name?  Are you on Windows?  Problems like
this are more common on Windows, because Windows prevents a lot of file
operations when files/directories are open.

The long-term existence of directories with this naming convention
indicates that *something* went wrong, but you would need to consult
your logs to find out what happened.  There have been several bugs over
Solr's history that cause this problem.

Thanks,
Shawn



How to recover a Shard

2015-04-01 Thread Matt Kuiper
Hello,

I have a SolrCloud (4.10.1) where for one of the shards, both replicas are in a 
Recovery Failed state per the Solr Admin Cloud page.  The logs contains the 
following type of entries for the two Solr nodes involved, including statements 
that it will retry.

Is there a way to recover from this state?

Maybe bring down one replica, and then somehow declare that the remaining 
replica is to be the leader?  Understand this would not be ideal as the new 
leader may be missing documents that were sent its way to be indexed while it 
was down, but would be better than having to rebuild the whole cloud.

Any tips or suggestions would be appreciated.

Thanks,
Matt

Solr node .65
Error while trying to recover. 
core=kla_collection_shard6_replica5:org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: 
kla_collection slice: shard6
 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
 at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
 at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Solr node .64

Error while trying to recover. 
core=kla_collection_shard6_replica2:org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: 
kla_collection slice: shard6

 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)

 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)

 at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)

 at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)



RE: How to recover a Shard

2015-04-01 Thread Matt Kuiper
Maybe I have been working too many long hours as I missed the obvious solution 
of bringing down/up one of the Solr nodes backing one of the replicas, and then 
the same for the second node.  This did the trick.

Since I brought this topic up, I will narrow the question a bit:  Would there 
be a way to recover without restarting the Solr node?  Basically to delete one 
replica and then somehow declare the other replica the leader and break it out 
of its recovery process?

Thanks,
Matt


From: Matt Kuiper
Sent: Wednesday, April 01, 2015 8:43 PM
To: solr-user@lucene.apache.org
Subject: How to recover a Shard

Hello,

I have a SolrCloud (4.10.1) where for one of the shards, both replicas are in a 
Recovery Failed state per the Solr Admin Cloud page.  The logs contains the 
following type of entries for the two Solr nodes involved, including statements 
that it will retry.

Is there a way to recover from this state?

Maybe bring down one replica, and then somehow declare that the remaining 
replica is to be the leader?  Understand this would not be ideal as the new 
leader may be missing documents that were sent its way to be indexed while it 
was down, but would be better than having to rebuild the whole cloud.

Any tips or suggestions would be appreciated.

Thanks,
Matt

Solr node .65
Error while trying to recover. 
core=kla_collection_shard6_replica5:org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: 
kla_collection slice: shard6
 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
 at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
 at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Solr node .64

Error while trying to recover. 
core=kla_collection_shard6_replica2:org.apache.solr.common.SolrException: No 
registered leader was found after waiting for 4000ms , collection: 
kla_collection slice: shard6

 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)

 at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)

 at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)

 at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)



Re: Solr went on recovery multiple time.

2015-04-01 Thread William Bell
I would give it 32GB of RAM. And try to use SSD.

On Tue, Mar 31, 2015 at 12:50 AM, sthita sthit...@gmail.com wrote:

 Hi Bill, My index size is around 48GB and contains around 8 million
 documents.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-went-on-recovery-multiple-time-tp4196249p4196504.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SolrCloud 5.0 cluster RAM requirements

2015-04-01 Thread Shawn Heisey

On 4/1/2015 3:22 PM, Ryan Steele wrote:
Does a SolrCloud 5.0 cluster need enough RAM across the cluster to 
load all the collections into RAM at all times?


Need is too strong a word.  If you want the best possible performance, 
then you would have enough RAM across the cluster to cache the entire 
index.  That's not required for a *functional* system, ignoring 
performance.  For an index on that scale, caching the entire index is 
usually an unrealistically expensive goal.


Are you the person who mentioned a terabyte scale SolrCloud index on the 
#solr IRC channel that's hosted on Amazon?


Here's a general wiki page on performance problems with Solr that has a 
large amount of focus on RAM:


http://wiki.apache.org/solr/SolrPerformanceProblems

The unfortunate fact about this is that the only way you'll figure out 
what you actually need is to prototype, and prototyping on the scale of 
your index is difficult and expensive.


https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn



Re: Unable to perform search query after changing uniqueKey

2015-04-01 Thread Zheng Lin Edwin Yeo
Hi Steve,

Thanks for the link and the information.

Regards,
Edwin


On 1 April 2015 at 23:17, Erick Erickson erickerick...@gmail.com wrote:

 Steve:

 Totally agree. Even if you _do_ correctly escape the URL though,
 there's no guarantee that Solr will do the right thing with field
 names with spaces. Plus endless chances for you to get it wrong when
 constructing the URL

 Best,
 Erick

 On Wed, Apr 1, 2015 at 1:01 AM, steve sc_shep...@hotmail.com wrote:
  Gently walking into rough waters here, but if you use any API with GET,
 you're sending a URI which must be properly encoded. This has nothing to do
 with with the programming language that generates key and store pairs on
 the browser or the one(s) used on the server. Lots and lots of good folks
 have tripped over this one.http://www.w3schools.com/tags/ref_urlencode.asp
  Play hard, but play safe!
 
  Date: Wed, 1 Apr 2015 13:58:55 +0800
  Subject: Re: Unable to perform search query after changing uniqueKey
  From: edwinye...@gmail.com
  To: solr-user@lucene.apache.org
 
  Thanks Erick.
 
  Yes, it is able to work correct if I do not use spaces for the field
 names,
  especially for the uniqueKey.
 
  Regards,
  Edwin
 
 
  On 31 March 2015 at 13:58, Erick Erickson erickerick...@gmail.com
 wrote:
 
   I would never put spaces in my field names! Frankly I have no clue
   what Solr does with that, but it can't be good. Solr explicitly
   supports Java naming conventions, camel case, underscores and numbers.
   Special symbols are frowned upon, I never use anything but upper case,
   lower case and underscores. Actually, I don't use upper case either
   but that's a personal preference. Other things might work, but only by
   chance.
  
   Best,
   Erick
  
   On Mon, Mar 30, 2015 at 8:59 PM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Latest information that I've found for this is that the error only
 occurs
for shard2.
   
If I do a search for just shard1, those records that are assigned to
   shard1
will be able to be displayed. Only when I search for shard2 will the
NullPointerException error occurs. Previously I was doing a search
 for
   both
shards.
   
Is there any settings that I required to do for shard2 in order to
 solve
this issue? Currently I have not made any changes to the shards
 since I
created it using
   
  
  http://localhost:8983/solr/admin/collections?action=CREATE&name=nps1&numShards=2&collection.configName=collection1
   
   
Regards,
Edwin
   
On 31 March 2015 at 09:42, Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
   wrote:
   
Hi Erick,
   
I've changed the uniqueKey from id to Item No.
   
 <uniqueKey>Item No</uniqueKey>
   
   
Below are my definitions for both the id and Item No.
   
 <field name="id" type="string" indexed="true" stored="true"
 required="false" multiValued="false" />
 <field name="Item No" type="text_general" indexed="true"
  stored="true"/>
   
Regards,
Edwin
   
   
On 30 March 2015 at 23:05, Erick Erickson erickerick...@gmail.com
 
   wrote:
   
Well, let's see the definition of your ID field, 'cause I'm
 puzzled.
   
It's definitely A Bad Thing to have it be any kind of tokenized
 field
though, but that's a shot in the dark.
   
Best,
Erick
   
On Mon, Mar 30, 2015 at 2:17 AM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 Hi Mostafa,

 Yes, I've defined all the fields in schema.xml. It is able to
 work on
the
 version without SolrCloud, but it is not working for the one
 with
SolrCloud.
 Both of them are using the same schema.xml.

 Regards,
 Edwin



 On 30 March 2015 at 14:34, Mostafa Gomaa 
 mostafa.goma...@gmail.com
wrote:

 Hi Zheng,

 It's possible that there's a problem with your schema.xml. Are
 all
fields
 defined and have appropriate options enabled?

 Regards,

 Mostafa.

 On Mon, Mar 30, 2015 at 7:49 AM, Zheng Lin Edwin Yeo 
edwinye...@gmail.com
 
 wrote:

  Hi Erick,
 
  I've tried that, and removed the data directory from both the
shards. But
  the same problem still occurs, so we probably can rule out
 the
memory
  issue.
 
  Regards,
  Edwin
 
  On 30 March 2015 at 12:39, Erick Erickson 
   erickerick...@gmail.com
 wrote:
 
   I meant shut down Solr and physically remove the entire
 data
   directory. Not saying this is the cure, but it can't hurt
 to
   rule
out
   the index having memory...
  
   Best,
   Erick
  
   On Sun, Mar 29, 2015 at 6:35 PM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Hi Erick,
   
I used the following query to delete all the index.
   
 http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
    http://localhost:8983/solr/update?stream.body=<commit/>
   
   
Or is it better 

Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread avinash09
Thanks Erick and Alexandre Rafalovitch.

One more doubt: how do I pass the Ctrl-A (^A) separator while uploading the CSV?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196998.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

Dear Charles,

Thanks for your answer, please find below my answers.

OK, it works if I use aben as the field in my query, as you say in answer 1.
It doesn't work if I use ab, maybe because the ab field is a copyField 
for abfr, aben, abit and abpt.


Concerning point 2, yes, you are right: it's not "and" but "AND".

I have this result:

<lst name="DE102009043935B3">
  <arr name="tien">
    <str><em>Bicycle</em> frame comprises holder, particularly for 
water bottle, where holder is connected</str>
  </arr>
  <arr name="aben">
    <str>#CMT# #/CMT# The <em>bicycle</em> frame (7) comprises a holder 
(1), particularly for a water bottle</str>
    <str>. The holder is connected with the <em>bicycle</em> frame by a 
screw (5), where a mounting element has a compensation</str>
    <str> section which is made of an elastic material, particularly 
a <em>plastic</em> material. The compensation section</str>
  </arr>
</lst>


So my last question is: why do I get <em></em> tags instead of the colored ones?
How can I tell Solr to use the colored markup?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?

^^
2. Try removing the word and from the query.  There may be some interaction 
with a stop word filter.  If you want a phrase query, wrap it in quotes.

3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

Le 29/03/2015 21:15, Bruno Mannina a écrit :

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10
&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



  <lst name="DE202010012045U1">
    <arr name="aben">
      <str>(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal
body (10) made from <em>plastic</em> material</str>
      <str>, particularly for touring bike. #CMT#ADVANTAGE : #/CMT#
The bicycle pedal has a pedal body made
from <em>plastic</em></str>
    </arr>
  </lst>
  <lst name="JP2014091382A">
    <arr name="aben">
      <str> between <em>plastic</em> tapes 3 and 3 having
two heat fusion layers, and the two <em>plastic</em> tapes
3 and 3 are stuck</str>
    </arr>
  </lst>
  <lst name="DE10201740A1">
    <arr name="aben">
      <str> elements. A connecting element is formed as a hinge, a
flexible foil or a flexible <em>plastic</em> part.
#CMT#USE</str>
    </arr>
  </lst>
  <lst name="US2008276751A1">
    <arr name="aben">
      <str>A bicycle handlebar grip includes an inner fiber layer and
an outer <em>plastic</em> layer. Thus, the fiber</str>
      <str> handlebar grip, while the <em>plastic</em>
layer is soft and has an adjustable thickness to provide a
comfortable</str>
      <str> sensation to a user. In addition,
the <em>plastic</em> layer includes a holding portion
coated on the outer surface</str>
      <str> layer to enhance the combination strength between the
fiber layer and the <em>plastic</em> layer and to
enhance</str>
    </arr>
  </lst>






Re: Customzing Solr Dedupe

2015-04-01 Thread Dan Davis
But you can potentially still use Solr dedupe if you do the upfront work
(in RDBMS or NoSQL pre-index processing) to assign some sort of Group ID.
  See OCLC's FRBR Work-Set Algorithm,
http://www.oclc.org/content/dam/research/activities/frbralgorithm/2009-08.pdf?urlm=161376
, for some details on one such algorithm.

If the job is too big for RDBMS, and/or you don't want to use/have a
suitable NoSQL, you can have two Solr indexes (collection/core/whatever) -
one for classification with only id, field1, field2, field3, and another
for production query.   Then, you put stuff into the classification index,
use queries and your own algorithm to do classification, assigning a
groupId, and then put the document with groupId assigned into the
production database.

A key question is whether you want to preserve the groupId.   In some
cases, you do, and in some cases, it is just an internal signature.   In
both cases, a non-deterministic up-front algorithm can work, but if the
groupId needs to be preserved, you need to work harder to make sure it all
hangs together.

Hope this helps,

-Dan

On Wed, Apr 1, 2015 at 7:05 AM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 Solr dedupe is based on the concept of a signature - some fields and rules
 that reduce a document into a discrete signature, and then checking if that
 signature exists as a document key that can be looked up quickly in the
 index. That's the conceptual basis. It is not based on any kind of field by
 field comparison to all existing documents.

 -- Jack Krupansky

 On Wed, Apr 1, 2015 at 6:35 AM, thakkar.aayush thakkar.aay...@gmail.com
 wrote:

  I'm facing a challenge using de-duplication of Solr documents.
 
  De-duplication is done using TextProfileSignature with the following
 parameters:
  <str name="fields">field1, field2, field3</str>
  <str name="quantRate">0.5</str>
  <str name="minTokenLen">3</str>
 
  Here Field3 is normal text with a few lines of data.
  Field1 and Field2 can contain up to 5 or 6 words of data.
 
  I want to de-duplicate when data in field1 and field2 are exactly the
 same
  and 90% of the lines in field3 is matched to that in another document.
 
  Is there anyway to achieve this?
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
  Sent from the Solr - User mailing list archive at Nabble.com.
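
The signature-based dedupe Jack describes is wired up in solrconfig.xml as an
update processor chain. A minimal sketch using the parameters from the
question (chain name and signature field are illustrative; note that the
stock TextProfileSignature still computes one fuzzy signature over all listed
fields, not the exact-field1/field2 plus 90%-field3 split being asked for):

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">signatureField</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">field1,field2,field3</str>
      <str name="signatureClass">solr.processor.TextProfileSignature</str>
      <str name="quantRate">0.5</str>
      <str name="minTokenLen">3</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Getting the asked-for per-field behaviour would mean a custom Signature
implementation or the pre-processing/groupId approach described above.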
 



Re: solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Dominique Bejean
Hi Shawn,

Thank you for your response.

This is a SolrCloud installation on CentOS.

There are 5 servers with 128 GB of RAM each.
The collection contains 650 million small documents.
There are 3 shards with replicationFactor = 2 (so 9 cores).
The JVM Xmx parameter was set to 96 GB. We changed it yesterday to 32 GB in
order to be under the CompressedOops limit and free memory for MMapDirectory.

I will have access to both the full Solr and Tomcat logs tomorrow.

What I know is that there are some ZooKeeper timeouts in the Solr logs,
and the replications occur on some nodes after some commits (after DIH
imports) and when nodes restart.

So, I will have more precise log messages tomorrow.

Thank you for your response.

Dominique



2015-04-01 18:29 GMT+02:00 Shawn Heisey apa...@elyograg.org:

 On 4/1/2015 6:35 AM, Dominique Bejean wrote:
  Is it normal with Solr 4.10.3 that the data directory of replicas still
  contains directories like
 
  index.3636365667474747
  index.999080980976
 
  and files
 
  index.properties
  replica.properties
 
  If yes, why and in which circumstances ?

 The index. directories are created during master/slave
 index replication.  If you're running SolrCloud, then replication is
 only used for index recovery.  Index recovery is only required in
 situations where the replicas are so far behind that the transaction log
 cannot be used to synchronize them, and sometimes happens when a Solr
 node is restarted.  If SolrCloud index recovery is actually required
 when you are NOT restarting Solr instances, your index might be having
 problems.

 Regardless of whether you're running SolrCloud or not, normally when one
 of those directories with a numeric suffix is created, it will be
 changed to index with no suffix after the replication is complete, but
 if Solr is unable to change the directories for some reason, it will
 simply keep and use the new directory with the suffix.  Do you see any
 ERROR or WARN entries in your solr logfile that would indicate why Solr
 cannot change the directory name?  Are you on Windows?  Problems like
 this are more common on Windows, because Windows prevents a lot of file
 operations when files/directories are open.

 The long-term existence of directories with this naming convention
 indicates that *something* went wrong, but you would need to consult
 your logs to find out what happened.  There have been several bugs over
 Solr's history that cause this problem.

 Thanks,
 Shawn




Re: solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Erick Erickson
I _really_ suspect that with the huge JVM heaps you had, you were hitting long
GC pauses that exceeded the Zookeeper timeout, causing ZK to believe the
node had gone away thus throwing it into recovery mode.

You can enable GC logging to see whether you see such long pauses, but with 96G
it's almost certain that you did.

Reducing the JVM allocation should help, but if you continue to see
nodes go into recovery for no apparent reason, enabling GC logging is a
good idea so you have a record.

See Getting a view into garbage collection here:
https://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
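
As a concrete starting point, GC logging on the Java 7/8 JVMs of that era is
usually enabled with flags along these lines (the log path is a placeholder;
add them to whatever script starts Solr):

  -Xloggc:/var/log/solr/gc.log
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -XX:+PrintGCApplicationStoppedTime

Pauses in that log longer than the zkClientTimeout Solr is configured with
are the ones that can push a node into recovery.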

Best
Erick

On Wed, Apr 1, 2015 at 10:35 AM, Dominique Bejean
dominique.bej...@eolya.fr wrote:
 Hi Shawn,

 Thank you for your response.

 This is a Solrcloud installation on Centos.

 There are 5 servers with 128 Gb ram each.
 The collection contains 650 millions of small documents.
 There are 3 shards with replicationfactor = 2 (so 9 cores).
 The JVM Xmx parameter was set to 96 Gb. We changed it yesterday to 32 Gb in
 order to be under the CompressedOops limit and free the direct memory for
 MMapDirectory.

 I will have access to both full solr and tomcat logs tomorrow.

 What I know, is that there are some zookeeper time out in solr logs.
 And the replications occur on some nodes after some commits (after DIH
 import) and when nodes restart.

 So, I will have more precise log messages tomorrow.

 Thank you for your response.

 Dominique



 2015-04-01 18:29 GMT+02:00 Shawn Heisey apa...@elyograg.org:

 On 4/1/2015 6:35 AM, Dominique Bejean wrote:
  Is it normal with Solr 4.10.3 that the data directory of replicas still
  contains directories like
 
  index.3636365667474747
  index.999080980976
 
  and files
 
  index.properties
  replica.properties
 
  If yes, why and in which circumstances ?

 The index. directories are created during master/slave
 index replication.  If you're running SolrCloud, then replication is
 only used for index recovery.  Index recovery is only required in
 situations where the replicas are so far behind that the transaction log
 cannot be used to synchronize them, and sometimes happens when a Solr
 node is restarted.  If SolrCloud index recovery is actually required
 when you are NOT restarting Solr instances, your index might be having
 problems.

 Regardless of whether you're running SolrCloud or not, normally when one
 of those directories with a numeric suffix is created, it will be
 changed to index with no suffix after the replication is complete, but
 if Solr is unable to change the directories for some reason, it will
 simply keep and use the new directory with the suffix.  Do you see any
 ERROR or WARN entries in your solr logfile that would indicate why Solr
 cannot change the directory name?  Are you on Windows?  Problems like
 this are more common on Windows, because Windows prevents a lot of file
 operations when files/directories are open.

 The long-term existence of directories with this naming convention
 indicates that *something* went wrong, but you would need to consult
 your logs to find out what happened.  There have been several bugs over
 Solr's history that cause this problem.

 Thanks,
 Shawn




RE: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Reitzel, Charles
If you want to query on the field ab, you'll probably need to add it to the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.   

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter
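
For example (a sketch only: the tag values must be URL-encoded in a real
request, and the field names are simply the ones used earlier in this thread):

  ./select/?q=aben:(plastic AND bicycle)&hl=true&hl.fl=tien,aben
     &hl.simple.pre=<b style="background:yellow">&hl.simple.post=</b>

hl.simple.pre/hl.simple.post take a single pair of tags, so this gives one
colour for every highlighted term.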


-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr] 
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

ok it works if I use aben as field in my query as you say in Answer 1.
it doesn't work if I use ab may be because ab field is a copyField for 
abfr, aben, abit, abpt

Concerning the 2., yes you have right it's not and but AND

I have this result:

lst  name=DE102009043935B3
 arr  name=tien
   strlt;emgt;Bicyclelt;/emgt;  frame comprises holder, particularly 
for water bottle, where holder is connected/str
 /arr
 arr  name=aben
   str#CMT# #/CMT# Thelt;emgt;bicyclelt;/emgt;  frame (7) comprises a 
holder (1), particularly for a water bottle/str
   str. The holder is connected with thelt;emgt;bicyclelt;/emgt;  
frame by a screw (5), where a mounting element has a compensation/str
   str  section which is made of an elastic material, particularly 
alt;emgt;plasticlt;/emgt;  material. The compensation section/str
 /arr
   /lst


So my last question is why I haven't em/em instead having colored ?
How can I tell to solr to use the colored ?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :
 Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
 trouble with multiple terms.  I'd look at a few things.

 1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
 bicycle)?
   
   
 ^^ 2. Try removing the word and from the query.  There may be some 
 interaction with a stop word filter.  If you want a phrase query, wrap it in 
 quotes.

 3.  Also, be sure that the query and indexing analyzers for the aben field 
 are compatible with each other.

 -Original Message-
 From: Bruno Mannina [mailto:bmann...@free.fr]
 Sent: Wednesday, April 01, 2015 7:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 3.6, Highlight and multi words?

 Sorry to disturb you with the renew but nobody use or have problem with 
 multi-terms and highlight ?

 regards,

 Le 29/03/2015 21:15, Bruno Mannina a écrit :
 Dear Solr User,

 I try to work with highlight, it works well but only if I have only 
 one keyword in my query?!
 If my request is plastic AND bicycle then only plastic is highlight.

 my request is:

 ./select/?q=ab%3A%28plastic+and+bicycle%29version=2.2start=0ro
 w
 s=10indent=onhl=truehl.fl=tien,abenfl=pnf.aben.hl.snippets=5


 Could you help me please to understand ? I read doc, google, without 
 success...
 so I post here...

 my result is:

 

   lst  name=DE202010012045U1
  arr  name=aben
str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal 
 body (10) made fromlt;emgt;plasticlt;/emgt; material/str
str, particularly for touring bike. #CMT#ADVANTAGE : #/CMT# 
 The bicycle pedal has a pedal body made 
 fromlt;emgt;plasticlt;/emgt;/str
  /arr
/lst
lst  name=JP2014091382A
  arr  name=aben
str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3 
 having two heat fusion layers, and the 
 twolt;emgt;plasticlt;/emgt;  tapes
 3 and 3 are stuck/str
  /arr
/lst
lst  name=DE10201740A1
  arr  name=aben
str  elements. A connecting element is formed as a hinge, a 
 flexible foil or a flexiblelt;emgt;plasticlt;/emgt;  part.
 #CMT#USE/str
  /arr
/lst
lst  name=US2008276751A1
  arr  name=aben
strA bicycle handlebar grip includes an inner fiber layer 
 and an outerlt;emgt;plasticlt;/emgt; layer. Thus, the fiber/str
str  handlebar grip, while thelt;emgt;plasticlt;/emgt; 
 layer is soft and has an adjustable thickness to provide a 
 comfortable/str
str  sensation to a user. In addition, 
 thelt;emgt;plasticlt;/emgt;  layer includes a holding portion 
 coated on the outer surface/str
str  layer to enhance the combination strength between the 
 fiber layer and thelt;emgt;plasticlt;/emgt;  layer and to 
 enhance/str
  /arr
/lst



Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

OK for qf (I can't test it right now).

But concerning hl.simple.pre / hl.simple.post, I can only define one color, no?

In the sample solrconfig.xml there are several colors:

<!-- multi-colored tag FragmentsBuilder -->
  <fragmentsBuilder name="colored"
      class="solr.highlight.ScoreOrderFragmentsBuilder">
    <lst name="defaults">
      <str name="hl.tag.pre"><![CDATA[
           <b style="background:yellow">,<b style="background:lawgreen">,
           <b style="background:aquamarine">,<b style="background:magenta">,
           <b style="background:palegreen">,<b style="background:coral">,
           <b style="background:wheat">,<b style="background:khaki">,
           <b style="background:lime">,<b style="background:deepskyblue">]]></str>
      <str name="hl.tag.post"><![CDATA[</b>]]></str>
    </lst>
  </fragmentsBuilder>

How can I tell Solr to use these colors instead of hl.simple.pre/post?
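
For anyone hitting the same wall: the colored FragmentsBuilder belongs to the
FastVectorHighlighter, so, as far as I understand it (treat this as an
assumption to check against the 3.6 docs), selecting it looks roughly like
the parameters below, and it needs termVectors, termPositions and termOffsets
enabled on the highlighted fields.

  &hl=true&hl.fl=tien,aben
  &hl.useFastVectorHighlighter=true
  &hl.fragmentsBuilder=colored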



Le 01/04/2015 20:58, Reitzel, Charles a écrit :

If you want to query on the field ab, you'll probably need to add it the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter


-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

ok it works if I use aben as field in my query as you say in Answer 1.
it doesn't work if I use ab may be because ab field is a copyField for 
abfr, aben, abit, abpt

Concerning the 2., yes you have right it's not and but AND

I have this result:

lst  name=DE102009043935B3
  arr  name=tien
strlt;emgt;Bicyclelt;/emgt;  frame comprises holder, particularly for 
water bottle, where holder is connected/str
  /arr
  arr  name=aben
str#CMT# #/CMT# Thelt;emgt;bicyclelt;/emgt;  frame (7) comprises a 
holder (1), particularly for a water bottle/str
str. The holder is connected with thelt;emgt;bicyclelt;/emgt;  frame by 
a screw (5), where a mounting element has a compensation/str
str  section which is made of an elastic material, particularly 
alt;emgt;plasticlt;/emgt;  material. The compensation section/str
  /arr
/lst


So my last question is why I haven't em/em instead having colored ?
How can I tell to solr to use the colored ?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?
 
^^ 2. Try removing the word and from the query.  There may be some interaction with a stop word filter.  If you want a phrase query, wrap it in quotes.


3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

Le 29/03/2015 21:15, Bruno Mannina a écrit :

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29version=2.2start=0ro
w
s=10indent=onhl=truehl.fl=tien,abenfl=pnf.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



   lst  name=DE202010012045U1
  arr  name=aben
str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal
body (10) made fromlt;emgt;plasticlt;/emgt; material/str
str, particularly for touring bike. #CMT#ADVANTAGE : #/CMT#
The bicycle pedal has a pedal body made
fromlt;emgt;plasticlt;/emgt;/str
  /arr
/lst
lst  name=JP2014091382A
  arr  name=aben
str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3
having two heat fusion layers, and the
twolt;emgt;plasticlt;/emgt;  tapes
3 and 3 are stuck/str
  /arr
/lst
lst  name=DE10201740A1
  arr  name=aben
str  elements. A connecting element is formed as a hinge, a
flexible foil or a flexiblelt;emgt;plasticlt;/emgt;  part.
#CMT#USE/str
  /arr
/lst
lst  name=US2008276751A1
  arr  name=aben
strA bicycle handlebar grip includes an inner fiber layer
and an outerlt;emgt;plasticlt;/emgt; layer. Thus, the fiber/str
str  handlebar grip, while thelt;emgt;plasticlt;/emgt;
layer is soft and has an adjustable thickness to 

Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-01 Thread Alexandre Rafalovitch
That's an interesting question. The reference shows you how to set a
separator, but ^A is a special case. You may need to pass it in as a
URL escape character or similar.

But I would first get a sample working with a more conventional
separator and then worry about ^A, just so you are not mixing up
several problems.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 2 April 2015 at 05:05, avinash09 avinash.i...@gmail.com wrote:
 thanks Erick and Alexandre Rafalovitch R

 one more doubt how to pass ctrl A(^A) seprator while csv upload




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-indexing-from-csv-file-having-28-cols-taking-lot-of-time-plz-help-i-m-new-to-solr-tp4196904p4196998.html
 Sent from the Solr - User mailing list archive at Nabble.com.
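
A minimal sketch of what such an upload could look like once the basics work,
passing Ctrl-A as a URL-encoded separator (collection name and file name are
placeholders, and the %01 encoding is an assumption worth testing on a small
file first):

  curl 'http://localhost:8983/solr/collection1/update?separator=%01&commit=true' \
       -H 'Content-Type: application/csv' --data-binary @data.csv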


Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

Of course, no problem Charles, you already helped me!

Le 01/04/2015 21:54, Reitzel, Charles a écrit :

Sorry, I've never tried highlighting in multiple colors...

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

ok for qf (i can't test now)

but concerning hl.simple.pre hl.simple.post I can define only one color no ?

in the sample solrconfig.xml there are several color,

!-- multi-colored tag FragmentsBuilder --
fragmentsBuilder  name=colored
  class=solr.highlight.ScoreOrderFragmentsBuilder
  lst  name=defaults
str  name=hl.tag.pre![CDATA[
 b style=background:yellow,b style=background:lawgreen,
 b style=background:aquamarine,b 
style=background:magenta,
 b style=background:palegreen,b style=background:coral,
 b style=background:wheat,b style=background:khaki,
 b style=background:lime,b 
style=background:deepskyblue]]/str
str  name=hl.tag.post![CDATA[/b]]/str
  /lst
/fragmentsBuilder

How can I tell to solr to use these color instead of hl.simple.pre/post ?



Le 01/04/2015 20:58, Reitzel, Charles a écrit :

If you want to query on the field ab, you'll probably need to add it the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter


-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

ok it works if I use aben as field in my query as you say in Answer 1.
it doesn't work if I use ab may be because ab field is a copyField
for abfr, aben, abit, abpt

Concerning the 2., yes you have right it's not and but AND

I have this result:

lst  name=DE102009043935B3
   arr  name=tien
 strlt;emgt;Bicyclelt;/emgt;  frame comprises holder, particularly for 
water bottle, where holder is connected/str
   /arr
   arr  name=aben
 str#CMT# #/CMT# Thelt;emgt;bicyclelt;/emgt;  frame (7) comprises a 
holder (1), particularly for a water bottle/str
 str. The holder is connected with thelt;emgt;bicyclelt;/emgt;  frame 
by a screw (5), where a mounting element has a compensation/str
 str  section which is made of an elastic material, particularly 
alt;emgt;plasticlt;/emgt;  material. The compensation section/str
   /arr
 /lst


So my last question is why I haven't em/em instead having colored ?
How can I tell to solr to use the colored ?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?
  
^^ 2. Try removing the word and from the query.  There may be some interaction with a stop word filter.  If you want a phrase query, wrap it in quotes.


3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

Le 29/03/2015 21:15, Bruno Mannina a écrit :

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29version=2.2start=0r
o
w
s=10indent=onhl=truehl.fl=tien,abenfl=pnf.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



lst  name=DE202010012045U1
   arr  name=aben
 str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a
pedal body (10) made fromlt;emgt;plasticlt;/emgt; material/str
 str, particularly for touring bike. #CMT#ADVANTAGE :
#/CMT# The bicycle pedal has a pedal body made
fromlt;emgt;plasticlt;/emgt;/str
   /arr
 /lst
 lst  name=JP2014091382A
   arr  name=aben
 str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3
having two heat fusion layers, and the
twolt;emgt;plasticlt;/emgt;  tapes
3 and 3 are stuck/str
   /arr
 /lst
 lst  name=DE10201740A1
   arr  name=aben
 str  elements. A connecting element is 

SolrCloud 5.0 cluster RAM requirements

2015-04-01 Thread Ryan Steele
Does a SolrCloud 5.0 cluster need enough RAM across the cluster to load 
all the collections into RAM at all times?


I'm building a SolrCloud cluster that may have approximately 1 TB of 
data spread across the collections.


Thanks,
Ryan



RE: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Reitzel, Charles
Sorry, I've never tried highlighting in multiple colors...

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr] 
Sent: Wednesday, April 01, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

ok for qf (i can't test now)

but concerning hl.simple.pre hl.simple.post I can define only one color no ?

in the sample solrconfig.xml there are several color,

!-- multi-colored tag FragmentsBuilder --
   fragmentsBuilder  name=colored
 class=solr.highlight.ScoreOrderFragmentsBuilder
 lst  name=defaults
   str  name=hl.tag.pre![CDATA[
b style=background:yellow,b style=background:lawgreen,
b style=background:aquamarine,b 
style=background:magenta,
b style=background:palegreen,b style=background:coral,
b style=background:wheat,b style=background:khaki,
b style=background:lime,b 
style=background:deepskyblue]]/str
   str  name=hl.tag.post![CDATA[/b]]/str
 /lst
   /fragmentsBuilder

How can I tell to solr to use these color instead of hl.simple.pre/post ?



Le 01/04/2015 20:58, Reitzel, Charles a écrit :
 If you want to query on the field ab, you'll probably need to add it the qf 
 parameter.

 To control the highlighting markup, with the standard highlighter, use 
 hl.simple.pre and hl.simple.post.

 https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter


 -Original Message-
 From: Bruno Mannina [mailto:bmann...@free.fr]
 Sent: Wednesday, April 01, 2015 2:24 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 3.6, Highlight and multi words?

 Dear Charles,

 Thanks for your answer, please find below my answers.

 ok it works if I use aben as field in my query as you say in Answer 1.
 it doesn't work if I use ab may be because ab field is a copyField 
 for abfr, aben, abit, abpt

 Concerning the 2., yes you have right it's not and but AND

 I have this result:

 lst  name=DE102009043935B3
   arr  name=tien
 strlt;emgt;Bicyclelt;/emgt;  frame comprises holder, 
 particularly for water bottle, where holder is connected/str
   /arr
   arr  name=aben
 str#CMT# #/CMT# Thelt;emgt;bicyclelt;/emgt;  frame (7) 
 comprises a holder (1), particularly for a water bottle/str
 str. The holder is connected with thelt;emgt;bicyclelt;/emgt;  
 frame by a screw (5), where a mounting element has a compensation/str
 str  section which is made of an elastic material, particularly 
 alt;emgt;plasticlt;/emgt;  material. The compensation section/str
   /arr
 /lst


 So my last question is why I haven't em/em instead having colored ?
 How can I tell to solr to use the colored ?

 Thanks a lot,
 Bruno


 Le 01/04/2015 17:15, Reitzel, Charles a écrit :
 Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
 trouble with multiple terms.  I'd look at a few things.

 1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
 bicycle)?
  
 
 ^^ 2. Try removing the word and from the query.  There may be some 
 interaction with a stop word filter.  If you want a phrase query, wrap it in 
 quotes.

 3.  Also, be sure that the query and indexing analyzers for the aben field 
 are compatible with each other.

 -Original Message-
 From: Bruno Mannina [mailto:bmann...@free.fr]
 Sent: Wednesday, April 01, 2015 7:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 3.6, Highlight and multi words?

 Sorry to disturb you with the renew but nobody use or have problem with 
 multi-terms and highlight ?

 regards,

 Le 29/03/2015 21:15, Bruno Mannina a écrit :
 Dear Solr User,

 I try to work with highlight, it works well but only if I have only 
 one keyword in my query?!
 If my request is plastic AND bicycle then only plastic is highlight.

 my request is:

 ./select/?q=ab%3A%28plastic+and+bicycle%29version=2.2start=0r
 o
 w
 s=10indent=onhl=truehl.fl=tien,abenfl=pnf.aben.hl.snippets=5


 Could you help me please to understand ? I read doc, google, without 
 success...
 so I post here...

 my result is:

 

lst  name=DE202010012045U1
   arr  name=aben
 str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a 
 pedal body (10) made fromlt;emgt;plasticlt;/emgt; material/str
 str, particularly for touring bike. #CMT#ADVANTAGE : 
 #/CMT# The bicycle pedal has a pedal body made 
 fromlt;emgt;plasticlt;/emgt;/str
   /arr
 /lst
 lst  name=JP2014091382A
   arr  name=aben
 str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3 
 having two heat fusion layers, and the 
 twolt;emgt;plasticlt;/emgt;  tapes
 3 and 3 are stuck/str
   /arr
 /lst
 lst  name=DE10201740A1
   arr  name=aben
 str  elements. A connecting element is formed as a hinge, 
 a flexible foil or a