Re: Setting up SolrCloud 5.0.0 and ZooKeeper 3.4.6

2015-04-07 Thread Zheng Lin Edwin Yeo
Thanks Swaraj.

It is working now, after I ran it without "start" and changed the ZooKeeper
port to 2888 instead.

Regards,
Edwin


On 7 April 2015 at 14:59, Swaraj Kumar swaraj2...@gmail.com wrote:

 As per http://stackoverflow.com/questions/11765015/zookeeper-not-starting
 http://stackoverflow.com/questions/11765015/zookeeper-not-starting
 Running it without "start" will fix this.

 One more change you need to make: Solr runs on port 8983 by default, and you have
 used 8983 for ZooKeeper, so start Solr on a different port.

 Regards,


 Swaraj Kumar
 Senior Software Engineer I
 MakeMyTrip.com
 Mob No- 9811774497

 On Tue, Apr 7, 2015 at 9:42 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com
 wrote:

  Hi Erick,
 
  I think I'll just setup the ZooKeeper server in standalone mode first,
  before I get more confused as I'm quite new to both Solr and ZooKeeper
 too.
  Better not to jump the gun.
 
  However, I face this error when I try to start it in standalone mode.
 
  2015-04-07 11:59:51,789 [myid:] - ERROR [main:ZooKeeperServerMain@54] -
  Invalid arguments, exiting abnormally
  java.lang.NumberFormatException: For input string:
  C:\Users\edwin\zookeeper-3.4.6\bin\..\conf\zoo.cfg
  at java.lang.NumberFormatException.forInputString(Unknown Source)
  at java.lang.Integer.parseInt(Unknown Source)
  at java.lang.Integer.parseInt(Unknown Source)
  at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:60)
  at
 
 
 org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:83)
  at
 
 
 org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
  at
 
 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
  at
 
 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
  2015-04-07 11:59:51,796 [myid:] - INFO  [main:ZooKeeperServerMain@55] -
  Usage: ZooKeeperServerMain configfile | port datadir [ticktime]
 [maxcnxns]
 
 
  I have the following information in my zoo.cfg:
 
  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\singleserver
  clientPort=8983
 
 
  I got the same error even if I set the clientPort=2888.
 
 
  Regards,
  Edwin
 
 
 
  On 7 April 2015 at 11:26, Erick Erickson erickerick...@gmail.com
 wrote:
 
   Believe me, I'm no Zookeeper expert, but it looks to me like you're
   mixing Solr ports and Zookeeper ports. AFAIK, the two ports in
   the zoo.cfg file are exclusively for the Zookeeper instances to talk
   to each other. Zookeeper isn't aware that the listening nodes are
   Solr nodes, so putting Solr ports in there is confusing Zookeeper,
   I'd guess.
  
   Assuming you're starting your three ZK instances on ports 2888, 2889
 and
   2890,
   I'd expect the proper ports are
   2888:3888
   2889:3889
   2890:3890
  
   But as I said I'm not a Zookeeper expert so beware..
  
  
   Best,
   Erick
  
   On Mon, Apr 6, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Hi,
   
I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I'm trying to set up a
   ZooKeeper
with simulation of 3 servers, but they are all located on the same
   machine
 for testing purposes.
   
In my zoo.cfg file, I have listed down the 3 servers to be as
 follows:
server.1=localhost:8983:3888
server.2=localhost:8984:3889
server.3=localhost:8985:3890
   
Then I try to start Solr using the following command:
 bin/solr start -e cloud -z localhost:8983 -noprompt
   
However, I'm unable to establish a connection from my Solr to the
ZooKeeper. Is this configuration possible, or is there anything
 which I
missed out?
   
Thank you in advance for your help.
   
Regards,
Edwin
  
 



Re: Solr 4.2.0 index corruption issue

2015-04-07 Thread Puneet Jain
HI Guys,

Please can someone help out here to pin-point the issue..?

Thanks & Regards,
Puneet

On Mon, Apr 6, 2015 at 1:27 PM, Puneet Jain ja.pun...@gmail.com wrote:

 Hi Guys,

 I have been using 4.2.0 for more than a year, and since last October 2014 I have
 been facing an index corruption issue. However, now it is happening every day and
 I have to build a fresh index as a temporary fix. Please find the logs below, where
 I can see an error while replicating data from master to slave and then notice
 the index corruption issue at the slave nodes:

 2015-04-05 00:00:37,671 ERROR snapPuller-15-thread-1 [handler.SnapPuller]
 - Error closing the file stream: _1re_Lucene41_0.tim
 java.io.IOException: Input/output error
 at java.io.RandomAccessFile.close0(Native Method)
 at java.io.RandomAccessFile.close(RandomAccessFile.java:543)
 at
 org.apache.lucene.store.FSDirectory$FSIndexOutput.close(FSDirectory.java:494)
 at
 org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1223)
 at
 org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1117)
 at
 org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:744)
 at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:398)
 at
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:281)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
 at java.lang.Thread.run(Thread.java:619)

 Not finding an exact solution for this, I was thinking of upgrading to Solr
 4.7.0, as it uses newer versions of httpcomponents and I thought the older
 versions might have some issues. Can someone please recommend what can be done to
 avoid the index corruption issue in Solr 4.2.0?

 Thanks in advance..!

 Thanks & Regards,
 Puneet



Re: Collapse and Expand behaviour on result with 1 document.

2015-04-07 Thread Derek Poh

Hi Joel

Is the number of documents info available when using collapse and expand 
parameters?


I can't seem to find it in the return xml.
I know the numFound in the main result set (<result maxScore="6.470696" 
name="response" numFound="27" start="0">) refers to 
the number of collapse groups.


Do I need to issue another query without the collapse and expand parameters 
to get the total number of documents?
Or is there any field or parameter that indicates the number of documents 
and that can be returned through the 'fl' parameter?


I am trying to display such info on the front-end,

571 led results from 240 suppliers.


On 4/1/2015 7:05 PM, Joel Bernstein wrote:

Exactly correct.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 1, 2015 at 5:44 AM, Derek Poh d...@globalsources.com wrote:


Hi Joel

Correct me if my understanding is wrong.
Using supplier id as the field to collapse on.

- If the collapse group heads in the main result set have only 1 document in
each group, the expanded section will be empty since there are no documents
to expand for each collapse group.
- To render the page, I need to iterate the main result set. For each
document I have to check if there is an expanded group with the same
supplier id.
- The facet counts are based on the number of collapse groups in the main
result set (<result maxScore="6.470696" name="response" numFound="27"
start="0">)

-Derek


On 3/31/2015 7:43 PM, Joel Bernstein wrote:


The way that collapse/expand is designed to be used is as follows:

The main result set will contain the collapsed group heads.

The expanded section will contain the expanded groups for the page of
results.

To render the page you iterate the main result set. For each document
check
to see if there is an expanded group.
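
(A rough SolrJ-side sketch of that rendering loop, assuming a client and query named
client and q, the P_SupplierId collapse field from this thread, and a SolrJ release
whose QueryResponse exposes getExpandedResults(); on older versions the same data can
be read from the "expanded" section of rsp.getResponse(). The rendering helpers are
hypothetical.)

import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

void renderPage(SolrClient client, SolrQuery q) throws Exception {
  QueryResponse rsp = client.query(q);
  // Keyed by the collapse field value; a group is missing when its head had nothing to expand.
  Map<String, SolrDocumentList> expanded = rsp.getExpandedResults();
  for (SolrDocument head : rsp.getResults()) {
    String supplierId = String.valueOf(head.getFieldValue("P_SupplierId"));
    SolrDocumentList members = (expanded == null) ? null : expanded.get(supplierId);
    renderHead(head);                // hypothetical rendering helpers
    if (members != null) {
      renderMembers(members);
    }
  }
}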




Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com
wrote:

  You should be able to use collapse/expand with one result.

Does the document in the main result set have group members that aren't
being expanded?



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com
wrote:

  If I want to group the results (by a certain field) even if there is

only
1 document, I should use the group parameter instead?
The requirement is to group the result of product documents by their
supplier id.
group=true&group.field=P_SupplierId&group.limit=5

Is it true that the performance of collapse is better than group
parameter on large data set, say 10-20 million documents?

-Derek


On 3/31/2015 10:03 AM, Joel Bernstein wrote:

  The expanded section will only include groups that have expanded

documents.

So, if the document that is in the main result set has no documents to
expand,
then this is working as expected.



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com
wrote:

   Hi


I have a query which return 1 document.
When I add the collapse and expand parameters to it,
expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId}, the
expanded section is empty (<lst name="expanded"/>).

Is this the behaviour of collapse and expand parameters on result
which
contain only 1 document?

-Derek









RE: How do I use CachedSqlEntityProcessor?

2015-04-07 Thread chuotlac
The conversation helps me understand the cached processor a lot. I'm working on a
DIH cache using MapDB as the backing engine instead of the default
CachedSqlEntityProcessor.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4198037.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up SolrCloud 5.0.0 and ZooKeeper 3.4.6

2015-04-07 Thread Swaraj Kumar
As per http://stackoverflow.com/questions/11765015/zookeeper-not-starting
http://stackoverflow.com/questions/11765015/zookeeper-not-starting
Running it without "start" will fix this.

One more change you need to make: Solr runs on port 8983 by default, and you have
used 8983 for ZooKeeper, so start Solr on a different port.

Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Tue, Apr 7, 2015 at 9:42 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com
wrote:

 Hi Erick,

 I think I'll just setup the ZooKeeper server in standalone mode first,
 before I get more confused as I'm quite new to both Solr and ZooKeeper too.
 Better not to jump the gun.

 However, I face this error when I try to start it in standalone mode.

 2015-04-07 11:59:51,789 [myid:] - ERROR [main:ZooKeeperServerMain@54] -
 Invalid arguments, exiting abnormally
 java.lang.NumberFormatException: For input string:
 C:\Users\edwin\zookeeper-3.4.6\bin\..\conf\zoo.cfg
 at java.lang.NumberFormatException.forInputString(Unknown Source)
 at java.lang.Integer.parseInt(Unknown Source)
 at java.lang.Integer.parseInt(Unknown Source)
 at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:60)
 at

 org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:83)
 at

 org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
 at

 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
 at

 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
 2015-04-07 11:59:51,796 [myid:] - INFO  [main:ZooKeeperServerMain@55] -
 Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]


 I have the following information in my zoo.cfg:

 tickTime=2000
 initLimit=10
 syncLimit=5
 dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\singleserver
 clientPort=8983


 I got the same error even if I set the clientPort=2888.


 Regards,
 Edwin



 On 7 April 2015 at 11:26, Erick Erickson erickerick...@gmail.com wrote:

  Believe me, I'm no Zookeeper expert, but it looks to me like you're
  mixing Solr ports and Zookeeper ports. AFAIK, the two ports in
  the zoo.cfg file are exclusively for the Zookeeper instances to talk
  to each other. Zookeeper isn't aware that the listening nodes are
  Solr nodes, so putting Solr ports in there is confusing Zookeeper,
  I'd guess.
 
  Assuming you're starting your three ZK instances on ports 2888, 2889 and
  2890,
  I'd expect the proper ports are
  2888:3888
  2889:3889
  2890:3890
 
  But as I said I'm not a Zookeeper expert so beware..
 
 
  Best,
  Erick
 
  On Mon, Apr 6, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Hi,
  
   I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I'm trying to set up a
  ZooKeeper
   with simulation of 3 servers, but they are all located on the same
  machine
   for testing purposes.
  
   In my zoo.cfg file, I have listed down the 3 servers to be as follows:
   server.1=localhost:8983:3888
   server.2=localhost:8984:3889
   server.3=localhost:8985:3890
  
   Then I try to start Solr using the following command:
   bin/solr start -e cloud -z localhost:8983 -noprompt
  
   However, I'm unable to establish a connection from my Solr to the
   ZooKeeper. Is this configuration possible, or is there anything which I
   missed out?
  
   Thank you in advance for your help.
  
   Regards,
   Edwin
 



What is the best way of Indexing different formats of documents?

2015-04-07 Thread sangeetha.subraman...@gtnexus.com
Hi,

I am a newbie to SOLR and basically from a database background. We have a 
requirement of indexing files of different formats (x12, edifact, csv, xml).
The files which are inputted can be of any format and we need to do a 
content-based search on them.

From the web I understand we can use the TIKA processor to extract the content and 
store it in SOLR. What I want to know is, is there any better approach for 
indexing files in SOLR? Can we index the documents through streaming directly 
from the application? If so, what is the disadvantage of using it (against DIH, 
which fetches from the database)? Could someone share some insight on this? 
Are there any web links which I can refer to to get some idea on it? Please do 
help.

Thanks
Sangeetha



Re: Collapse and Expand behaviour on result with 1 document.

2015-04-07 Thread Joel Bernstein
I believe currently issuing another query will be necessary to get the
count of the expanded result set.

I think it does make sense to include this information as part of the
ExpandComponent output. So feel free to create a jira ticket for this and
we should be able to get this into a future release.

Joel Bernstein
http://joelsolr.blogspot.com/
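
A rough SolrJ sketch of that two-query approach, assuming SolrJ 5.x (HttpSolrClient;
use HttpSolrServer on 4.x); the core URL is a placeholder, and the query term "led"
and the P_SupplierId field are taken from this thread:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

void printCounts() throws Exception {
  SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");

  // Query 1: collapsed on supplier - numFound is the number of collapse groups.
  SolrQuery collapsed = new SolrQuery("led");
  collapsed.addFilterQuery("{!collapse field=P_SupplierId}");
  collapsed.set("expand", true);
  collapsed.set("expand.rows", 5);
  long supplierCount = client.query(collapsed).getResults().getNumFound();

  // Query 2: the same query without the collapsing fq - numFound is the total
  // number of matching documents.
  SolrQuery flat = new SolrQuery("led");
  flat.setRows(0);
  long totalDocs = client.query(flat).getResults().getNumFound();

  System.out.println(totalDocs + " led results from " + supplierCount + " suppliers.");
  client.close();
}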

On Tue, Apr 7, 2015 at 3:27 AM, Derek Poh d...@globalsources.com wrote:

 Hi Joel

 Is the number of documents info available when using collapse and expand
 parameters?

 I can't seem to find it in the return xml.
 I know the numFound in the main result set (<result maxScore="6.470696"
 name="response" numFound="27" start="0">) refers to the
 number of collapse groups.

 Do I need to issue another query without the collapse and expand parameters
 to get the total number of documents?
 Or is there any field or parameter that indicates the number of documents
 and that can be returned through the 'fl' parameter?

 I am trying to display such info on the front-end,

 571 led results from 240 suppliers.



 On 4/1/2015 7:05 PM, Joel Bernstein wrote:

 Exactly correct.

 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Wed, Apr 1, 2015 at 5:44 AM, Derek Poh d...@globalsources.com wrote:

  Hi Joel

 Correct me if my understanding is wrong.
 Using supplier id as the field to collapse on.

 - If the collapse group heads in the main result set have only 1 document in
 each group, the expanded section will be empty since there are no
 documents
 to expand for each collapse group.
 - To render the page, I need to iterate the main result set. For each
 document I have to check if there is an expanded group with the same
 supplier id.
 - The facet counts are based on the number of collapse groups in the main
 result set (<result maxScore="6.470696" name="response" numFound="27"
 start="0">)

 -Derek


 On 3/31/2015 7:43 PM, Joel Bernstein wrote:

  The way that collapse/expand is designed to be used is as follows:

 The main result set will contain the collapsed group heads.

 The expanded section will contain the expanded groups for the page of
 results.

 To render the page you iterate the main result set. For each document
 check
 to see if there is an expanded group.




 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Tue, Mar 31, 2015 at 7:37 AM, Joel Bernstein joels...@gmail.com
 wrote:

   You should be able to use collapse/expand with one result.

 Does the document in the main result set have group members that aren't
 being expanded?



 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Tue, Mar 31, 2015 at 2:00 AM, Derek Poh d...@globalsources.com
 wrote:

   If I want to group the results (by a certain field) even if there is

 only
 1 document, I should use the group parameter instead?
 The requirement is to group the result of product documents by their
 supplier id.
 group=true&group.field=P_SupplierId&group.limit=5

 Is it true that the performance of collapse is better than group
 parameter on large data set, say 10-20 million documents?

 -Derek


 On 3/31/2015 10:03 AM, Joel Bernstein wrote:

   The expanded section will only include groups that have expanded

 documents.

 So, if the document that is in the main result set has no documents to
 expand,
 then this is working as expected.



 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Mon, Mar 30, 2015 at 8:43 PM, Derek Poh d...@globalsources.com
 wrote:

Hi

  I have a query which return 1 document.
 When I add the collapse and expand parameters to it,
 expand=true&expand.rows=5&fq={!collapse%20field=P_SupplierId},
 the
 expanded section is empty (<lst name="expanded"/>).

 Is this the behaviour of collapse and expand parameters on result
 which
 contain only 1 document?

 -Derek









Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Swaraj Kumar
You can always choose either DIH or /update/extract to index docs in Solr.
There are multiple benefits of DIH, which I am listing below:

1. Clean and update using a single command.
2. DIH also optimizes indexing using optimize=true.
3. You can do a delta-import based on the last index time, whereas with
/update/extract you need to do a manual operation for a delta import.
4. You can use multiple entity processors and transformers with DIH,
which is very useful to index exactly the data you want.
5. The query parameter rows limits the number of records.

Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com 
sangeetha.subraman...@gtnexus.com wrote:

 Hi,

 I am a newbie to SOLR and basically from database background. We have a
 requirement of indexing files of different formats (x12,edifact, csv,xml).
 The files which are inputted can be of any format and we need to do a
 content based search on it.

 From the web I understand we can use TIKA processor to extract the content
 and store it in SOLR. What I want to know is, is there any better approach
 for indexing files in SOLR ? Can we index the document through streaming
 directly from the Application ? If so what is the disadvantage of using it
 (against DIH which fetches from the database)? Could someone share me some
 insight on this ? ls there any web links which I can refer to get some idea
 on it ? Please do help.

 Thanks
 Sangeetha




Lucene indexWriter update does not affect Solr search

2015-04-07 Thread Ali Nazemian
I implemented a small piece of code for the purpose of extracting some keywords out of
a Lucene index. I implemented it using a search component. My problem is that
when I try to update the Lucene IndexWriter, the Solr index which is placed on
top of it is not affected. As you can see, I did the commit part.

BooleanQuery query = new BooleanQuery();
for (String fieldName : keywordSourceFields) {
  TermQuery termQuery = new TermQuery(new Term(fieldName, "N/A"));
  query.add(termQuery, Occur.MUST_NOT);
}
TermQuery termQuery = new TermQuery(new Term(keywordField, "N/A"));
query.add(termQuery, Occur.MUST);
try {
  //Query q = new QueryParser(keywordField, new
  //    StandardAnalyzer()).parse(query.toString());
  TopDocs results = searcher.search(query, maxNumDocs);
  ScoreDoc[] hits = results.scoreDocs;
  IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
  for (int i = 0; i < hits.length; i++) {
    Document document = searcher.doc(hits[i].doc);
    List<String> keywords = keyword.getKeywords(hits[i].doc);
    if (keywords.size() > 0) document.removeFields(keywordField);
    for (String word : keywords) {
      document.add(new StringField(keywordField, word, Field.Store.YES));
    }
    String uniqueKey = searcher.getSchema().getUniqueKeyField().getName();
    writer.updateDocument(new Term(uniqueKey, document.get(uniqueKey)), document);
  }
  writer.commit();
  writer.forceMerge(1);
  writer.close();
} catch (IOException | SyntaxError e) {
  throw new RuntimeException();
}

Please help me through solving this problem.

-- 
A.Nazemian


Re: Lucene indexWriter update does not affect Solr search

2015-04-07 Thread Upayavira
What are you trying to do? A search component is not intended for
updating the index, so it really doesn’t surprise me that you aren’t
seeing updates.

I’d suggest you describe the problem you are trying to solve before
proposing solutions.

Upayavira


On Tue, Apr 7, 2015, at 01:32 PM, Ali Nazemian wrote:
 I implement a small code for the purpose of extracting some keywords out
 of
 Lucene index. I did implement that using search component. My problem is
 when I tried to update Lucene IndexWriter, Solr index which is placed on
 top of that, does not affect. As you can see I did the commit part.
 
 BooleanQuery query = new BooleanQuery();
 for (String fieldName : keywordSourceFields) {
    TermQuery termQuery = new TermQuery(new Term(fieldName, "N/A"));
    query.add(termQuery, Occur.MUST_NOT);
  }
  TermQuery termQuery = new TermQuery(new Term(keywordField, "N/A"));
 query.add(termQuery, Occur.MUST);
 try {
   //Query q= new QueryParser(keywordField, new
 StandardAnalyzer()).parse(query.toString());
   TopDocs results = searcher.search(query,
   maxNumDocs);
   ScoreDoc[] hits = results.scoreDocs;
   IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
    for (int i = 0; i < hits.length; i++) {
  Document document = searcher.doc(hits[i].doc);
  List<String> keywords = keyword.getKeywords(hits[i].doc);
  if (keywords.size() > 0) document.removeFields(keywordField);
 for (String word : keywords) {
   document.add(new StringField(keywordField, word,
 Field.Store.YES));
 }
 String uniqueKey =
 searcher.getSchema().getUniqueKeyField().getName();
 writer.updateDocument(new Term(uniqueKey,
 document.get(uniqueKey)),
 document);
   }
   writer.commit();
   writer.forceMerge(1);
   writer.close();
 } catch (IOException | SyntaxError e) {
   throw new RuntimeException();
 }
 
 Please help me through solving this problem.
 
 -- 
 A.Nazemian


Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Upayavira


On Tue, Apr 7, 2015, at 11:48 AM, sangeetha.subraman...@gtnexus.com
wrote:
 Hi,
 
 I am a newbie to SOLR and basically from database background. We have a
 requirement of indexing files of different formats (x12,edifact,
 csv,xml).
 The files which are inputted can be of any format and we need to do a
 content based search on it.
 
 From the web I understand we can use TIKA processor to extract the
 content and store it in SOLR. What I want to know is, is there any better
 approach for indexing files in SOLR ? Can we index the document through
 streaming directly from the Application ? If so what is the disadvantage
 of using it (against DIH which fetches from the database)? Could someone
 share me some insight on this ? ls there any web links which I can refer
 to get some idea on it ? Please do help.

You can have Solr do the TIKA work for you, by posting to
update/extract. See here:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

You can only post one document at a time, and you will have to provide
extra metadata fields in the URL you post to (e.g. the document ID).

If the extracting update handler can handle what you need, then you are
good. Otherwise, you will want to write your own code to call Tika, then
push the extracted content as a plain document.

Solr is just an HTTP server, so your application can post binary files
for Solr to ingest with Tika, or otherwise.

Upayavira
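
As a rough illustration of that posting step, a small SolrJ sketch of sending one
binary file to /update/extract. It assumes SolrJ 5.x (HttpSolrClient; use
HttpSolrServer on 4.x), and the core URL, file path, id value and field mapping are
placeholders rather than anything Solr prescribes:

import java.io.File;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPostExample {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");

    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("/path/to/invoice.pdf"), "application/pdf");
    req.setParam("literal.id", "invoice-001");   // supply the unique key yourself
    req.setParam("fmap.content", "text");        // map Tika's extracted body into a "text" field
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

    client.request(req);
    client.close();
  }
}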


Re: DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words File

2015-04-07 Thread Mike L.

Typo:   *even when the user delimits with a space. (e.g. base ball should find 
baseball). 

Thanks,
  From: Mike L. javaone...@yahoo.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org 
 Sent: Tuesday, April 7, 2015 9:05 AM
 Subject: DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words 
File
   

Solr User Group -

   I have a case where I need to be able to search against compound words, even 
when the user delimits with a space. (e.g. baseball = base ball).  I think 
I've solved this by creating a compound-words dictionary file containing the 
split words that I would want DictionaryCompoundWordTokenFilterFactory to split.
 base \n  
ball
I also applied in the synonym file the following rule: baseball => base ball
(to allow baseball to also get a hit)
   <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
dictionary="compound-words.txt" minWordSize="5" minSubwordSize="2"
maxSubwordSize="15" onlyLongestMatch="true"/>
  
Two questions - If I could figure out in advance all the compound words I would 
want to split, would it be better (more reliable results) for me to maintain 
this compound-words file, or would it be better to throw one of those open 
office dictionaries at the filter?
Also - Any better suggestions for dealing with this problem vs the one I 
described using both the dictionary filter and the synonym rule?
Thanks in advance!
Mike



  

Re: Lucene indexWriter update does not affect Solr search

2015-04-07 Thread Ali Nazemian
I did some investigation and found out that retrieving documents
works fine as long as Solr is not restarted, but searching for
documents does not work. After I restarted Solr it seems that the core is
corrupted and fails to start! Here is the corresponding log:

org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.init(SolrCore.java:896)
at org.apache.solr.core.SolrCore.init(SolrCore.java:662)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:278)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:272)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1604)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1716)
at org.apache.solr.core.SolrCore.init(SolrCore.java:868)
... 9 more
Caused by: org.apache.lucene.index.IndexNotFoundException: no
segments* file found in
NRTCachingDirectory(MMapDirectory@C:\Users\Ali\workspace\lucene_solr_5_0_0\solr\server\solr\document\data\index
lockFactory=org.apache.lucene.store.SimpleFSLockFactory@3bf76891;
maxCacheMB=48.0 maxMergeSizeMB=4.0): files: [_2_Lucene50_0.doc,
write.lock, _2_Lucene50_0.pos, _2.nvd, _2.fdt, _2_Lucene50_0.tim]
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:821)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:78)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:65)
at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:272)
at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:115)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1573)
... 11 more

4/7/2015, 6:53:26 PM
ERROR
SolrIndexWriter
SolrIndexWriter was not closed prior to finalize(),​ indicates a bug
-- POSSIBLE RESOURCE LEAK!!!
4/7/2015, 6:53:26 PM
ERROR
SolrIndexWriter
Error closing IndexWriter
java.lang.NullPointerException
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2959)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2927)
at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:965)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1010)
at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:130)
at org.apache.solr.update.SolrIndexWriter.finalize(SolrIndexWriter.java:183)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:101)
at java.lang.ref.Finalizer.access$100(Finalizer.java:32)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:190)

Therefore my guess would be a problem with indexing the keywordField and also a
problem related to closing the IndexWriter.
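
One way around this, sketched very roughly below, is to avoid opening a second
IndexWriter at all and instead hand the changed document back to Solr's own update
handler, so the core's writer and searcher stay consistent. This assumes the code runs
inside a Solr plugin with access to the SolrQueryRequest; the field names and helper
signature are illustrative only:

import java.io.IOException;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.CommitUpdateCommand;

void storeKeywords(SolrQueryRequest req, String id, List<String> keywords) throws IOException {
  // Note: this re-adds the whole document, so any other stored fields would
  // have to be copied into it as well (or an atomic "set" update used instead).
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", id);                  // the schema's unique key
  for (String word : keywords) {
    doc.addField("keywordField", word);    // the keyword field being populated
  }

  AddUpdateCommand add = new AddUpdateCommand(req);
  add.solrDoc = doc;
  req.getCore().getUpdateHandler().addDoc(add);

  // Commit through Solr so it reopens its own searcher and sees the change.
  CommitUpdateCommand commit = new CommitUpdateCommand(req, false);
  req.getCore().getUpdateHandler().commit(commit);
}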

On Tue, Apr 7, 2015 at 6:13 PM, Ali Nazemian alinazem...@gmail.com wrote:

 Dear Upayavira,
 Hi,
 It is just the part of my code which caused the problem. I know a
 searchComponent is not for changing the index, but for the purpose of
 extracting document keywords I was forced to hack a searchComponent for
 extracting keywords and putting them into the index.
 For more information about why I chose searchComponent at the first place
 please follow this link:

 https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201503.mbox/browser

 Best regards.


 On Tue, Apr 7, 2015 at 5:30 PM, Upayavira u...@odoko.co.uk wrote:

 What are you trying to do? A search component is not intended for
 updating the index, so it really doesn’t surprise me that you aren’t
 seeing updates.

 I’d suggest you describe the problem you are trying to solve before
 proposing solutions.

 Upayavira


 On Tue, Apr 7, 2015, at 01:32 PM, Ali Nazemian wrote:
  I implement a small code for the purpose of extracting some keywords out
  of
  Lucene index. I did implement that using search component. My problem is
  when I tried to update Lucene IndexWriter, Solr index which is placed on
  top of that, does not affect. As you can see I did the commit part.
 
  BooleanQuery query = new BooleanQuery();
  for (String fieldName : keywordSourceFields) {
TermQuery termQuery = new TermQuery(new
 Term(fieldName,N/A));
query.add(termQuery, Occur.MUST_NOT);
  }
  TermQuery termQuery=new TermQuery(new Term(keywordField,
 N/A));
  query.add(termQuery, Occur.MUST);
  try {
//Query q= new QueryParser(keywordField, new

Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Yavar Husain
Well, I have indexed heterogeneous sources including a variety of NoSQL stores,
RDBMSs and rich documents (PDF, Word, etc.) using SolrJ. The only prerequisite
of using SolrJ is that you should have an API to fetch data from your data
source (say JDBC for an RDBMS, Tika for extracting text content from rich
documents, etc.); then SolrJ is so damn great and simple. It's as simple as
downloading the jar and a few lines of code to send data to your Solr server
after pre-processing your data. More details here:

http://lucidworks.com/blog/indexing-with-solrj/

https://wiki.apache.org/solr/Solrj

http://www.solrtutorial.com/solrj-tutorial.html

Cheers,
Yavar
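
A bare-bones sketch of that pattern - extract locally, then push a plain document
with SolrJ. It assumes Tika for the extraction and SolrJ 5.x; the core URL, file path
and field names are only placeholders:

import java.io.File;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.Tika;

public class SolrJTikaExample {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
    Tika tika = new Tika();

    File file = new File("/path/to/order.edi");
    String extractedText = tika.parseToString(file);   // pre-process / extract content here

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", file.getName());
    doc.addField("content", extractedText);
    client.add(doc);
    client.commit();
    client.close();
  }
}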



On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com 
sangeetha.subraman...@gtnexus.com wrote:

 Hi,

 I am a newbie to SOLR and basically from database background. We have a
 requirement of indexing files of different formats (x12,edifact, csv,xml).
 The files which are inputted can be of any format and we need to do a
 content based search on it.

 From the web I understand we can use TIKA processor to extract the content
 and store it in SOLR. What I want to know is, is there any better approach
 for indexing files in SOLR ? Can we index the document through streaming
 directly from the Application ? If so what is the disadvantage of using it
 (against DIH which fetches from the database)? Could someone share me some
 insight on this ? ls there any web links which I can refer to get some idea
 on it ? Please do help.

 Thanks
 Sangeetha




Re: Lucene indexWriter update does not affect Solr search

2015-04-07 Thread Ali Nazemian
Dear Upayavira,
Hi,
It is just the part of my code which caused the problem. I know a
searchComponent is not for changing the index, but for the purpose of
extracting document keywords I was forced to hack a searchComponent for
extracting keywords and putting them into the index.
For more information about why I chose searchComponent at the first place
please follow this link:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201503.mbox/browser

Best regards.


On Tue, Apr 7, 2015 at 5:30 PM, Upayavira u...@odoko.co.uk wrote:

 What are you trying to do? A search component is not intended for
 updating the index, so it really doesn’t surprise me that you aren’t
 seeing updates.

 I’d suggest you describe the problem you are trying to solve before
 proposing solutions.

 Upayavira


 On Tue, Apr 7, 2015, at 01:32 PM, Ali Nazemian wrote:
  I implement a small code for the purpose of extracting some keywords out
  of
  Lucene index. I did implement that using search component. My problem is
  when I tried to update Lucene IndexWriter, Solr index which is placed on
  top of that, does not affect. As you can see I did the commit part.
 
  BooleanQuery query = new BooleanQuery();
  for (String fieldName : keywordSourceFields) {
    TermQuery termQuery = new TermQuery(new Term(fieldName, "N/A"));
    query.add(termQuery, Occur.MUST_NOT);
  }
  TermQuery termQuery = new TermQuery(new Term(keywordField, "N/A"));
  query.add(termQuery, Occur.MUST);
  try {
//Query q= new QueryParser(keywordField, new
  StandardAnalyzer()).parse(query.toString());
TopDocs results = searcher.search(query,
maxNumDocs);
ScoreDoc[] hits = results.scoreDocs;
IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
    for (int i = 0; i < hits.length; i++) {
      Document document = searcher.doc(hits[i].doc);
      List<String> keywords = keyword.getKeywords(hits[i].doc);
      if (keywords.size() > 0) document.removeFields(keywordField);
  for (String word : keywords) {
document.add(new StringField(keywordField, word,
  Field.Store.YES));
  }
  String uniqueKey =
  searcher.getSchema().getUniqueKeyField().getName();
  writer.updateDocument(new Term(uniqueKey,
  document.get(uniqueKey)),
  document);
}
writer.commit();
writer.forceMerge(1);
writer.close();
  } catch (IOException | SyntaxError e) {
throw new RuntimeException();
  }
 
  Please help me through solving this problem.
 
  --
  A.Nazemian




-- 
A.Nazemian


DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words File

2015-04-07 Thread Mike L.

Solr User Group -

   I have a case where I need to be able to search against compound words, even 
when the user delimits with a space. (e.g. baseball = base ball).  I think 
I've solved this by creating a compound-words dictionary file containing the 
split words that I would want DictionaryCompoundWordTokenFilterFactory to split.
 base \n  
ball
I also applied in the synonym file the following rule: baseball => base ball
(to allow baseball to also get a hit)
   <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
dictionary="compound-words.txt" minWordSize="5" minSubwordSize="2"
maxSubwordSize="15" onlyLongestMatch="true"/>
  
Two questions - If I could figure out in advance all the compound words I would 
want to split, would it be better (more reliable results) for me to maintain 
this compound-words file, or would it be better to throw one of those open 
office dictionaries at the filter?
Also - Any better suggestions for dealing with this problem vs the one I 
described using both the dictionary filter and the synonym rule?
Thanks in advance!
Mike



Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Dan Davis
Sangeetha,

You can also run Tika directly from data import handler, and Data Import
Handler can be made to run several threads if you can partition the input
documents by directory or database id.   I've done 4 threads by having a
base configuration that does an Oracle query like this:

  SELECT * FROM (SELECT id, url, ..., Modulo(rowNum, 4) as threadid FROM ...
WHERE ...) WHERE threadid = %d

A bash/sed script writes several data import handler XML files.
I can then index several threads at a time.

Each of these threads can then use all the transformers, e.g.
templateTransformer, etc.
XML can be transformed via XSLT.

The Data Import Handler has other entities that go out to the web and then
index the document via Tika.

If you are indexing generic HTML, you may want to figure out an approach to
SOLR-3808 and SOLR-2250 - this can be resolved by recompiling Solr and Tika
locally, because Boilerpipe has a bug that has been fixed, but not pushed
to Maven Central.   Without that, the ASF cannot include the fix, but
distributions such as LucidWorks Solr Enterprise can.

I can drop some configs into github.com if I clean them up to obfuscate
host names, passwords, and such.


On Tue, Apr 7, 2015 at 9:14 AM, Yavar Husain yavarhus...@gmail.com wrote:

 Well have indexed heterogeneous sources including a variety of NoSQL's,
 RDBMs and Rich Documents (PDF Word etc.) using SolrJ. The only prerequisite
 of using SolrJ is that you should have an API to fetch data from your data
 source (Say JDBC for RDBMS, Tika for extracting text content from rich
 documents etc.) than SolrJ is so damn great and simple. Its as simple as
 downloading the jar and few lines of code to send data to your solr server
 after pre-processing your data. More details here:

 http://lucidworks.com/blog/indexing-with-solrj/

 https://wiki.apache.org/solr/Solrj

 http://www.solrtutorial.com/solrj-tutorial.html

 Cheers,
 Yavar



 On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com 
 sangeetha.subraman...@gtnexus.com wrote:

  Hi,
 
  I am a newbie to SOLR and basically from database background. We have a
  requirement of indexing files of different formats (x12,edifact,
 csv,xml).
  The files which are inputted can be of any format and we need to do a
  content based search on it.
 
  From the web I understand we can use TIKA processor to extract the
 content
  and store it in SOLR. What I want to know is, is there any better
 approach
  for indexing files in SOLR ? Can we index the document through streaming
  directly from the Application ? If so what is the disadvantage of using
 it
  (against DIH which fetches from the database)? Could someone share me
 some
  insight on this ? ls there any web links which I can refer to get some
 idea
  on it ? Please do help.
 
  Thanks
  Sangeetha
 
 



Re: What is the best way of Indexing different formats of documents?

2015-04-07 Thread Erick Erickson
The disadvantages of DIH are
1 it's a black box, debugging it isn't easy
2 it puts all the work on the Solr node. Parsing documents in various
forms can be pretty heavy-weight and steal cycles from indexing and
searching.
2a the extracting request handler also puts all the load on Solr FWIW.


Personally I prefer an external program (and I was gratified to see
Yavar's reference to the indexing with SolrJ article...). But then I'm
a Java programmer by training, so that seems easy...

Best,
Erick

On Tue, Apr 7, 2015 at 7:41 AM, Dan Davis dansm...@gmail.com wrote:
 Sangeetha,

 You can also run Tika directly from data import handler, and Data Import
 Handler can be made to run several threads if you can partition the input
 documents by directory or database id.   I've done 4 threads by having a
 base configuration that does an Oracle query like this:

    SELECT * FROM (SELECT id, url, ..., Modulo(rowNum, 4) as threadid FROM ...
 WHERE ...) WHERE threadid = %d

 A bash/sed script writes several data import handler XML files.
 I can then index several threads at a time.

 Each of these threads can then use all the transformers, e.g.
 templateTransformer, etc.
 XML can be transformed via XSLT.

 The Data Import Handler has other entities that go out to the web and then
 index the document via Tika.

 If you are indexing generic HTML, you may want to figure out an approach to
 SOLR-3808 and SOLR-2250 - this can be resolved by recompiling Solr and Tika
 locally, because Boilerpipe has a bug that has been fixed, but not pushed
 to Maven Central.   Without that, the ASF cannot include the fix, but
 distributions such as LucidWorks Solr Enterprise can.

 I can drop some configs into github.com if I clean them up to obfuscate
 host names, passwords, and such.


 On Tue, Apr 7, 2015 at 9:14 AM, Yavar Husain yavarhus...@gmail.com wrote:

 Well have indexed heterogeneous sources including a variety of NoSQL's,
 RDBMs and Rich Documents (PDF Word etc.) using SolrJ. The only prerequisite
 of using SolrJ is that you should have an API to fetch data from your data
 source (Say JDBC for RDBMS, Tika for extracting text content from rich
 documents etc.) than SolrJ is so damn great and simple. Its as simple as
 downloading the jar and few lines of code to send data to your solr server
 after pre-processing your data. More details here:

 http://lucidworks.com/blog/indexing-with-solrj/

 https://wiki.apache.org/solr/Solrj

 http://www.solrtutorial.com/solrj-tutorial.html

 Cheers,
 Yavar



 On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com 
 sangeetha.subraman...@gtnexus.com wrote:

  Hi,
 
  I am a newbie to SOLR and basically from database background. We have a
  requirement of indexing files of different formats (x12,edifact,
 csv,xml).
  The files which are inputted can be of any format and we need to do a
  content based search on it.
 
  From the web I understand we can use TIKA processor to extract the
 content
  and store it in SOLR. What I want to know is, is there any better
 approach
  for indexing files in SOLR ? Can we index the document through streaming
  directly from the Application ? If so what is the disadvantage of using
 it
  (against DIH which fetches from the database)? Could someone share me
 some
  insight on this ? ls there any web links which I can refer to get some
 idea
  on it ? Please do help.
 
  Thanks
  Sangeetha
 
 



Merge Two Fields in SOLR

2015-04-07 Thread EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS)
Hi Group,

I am not sure if we have any easy way to merge two fields' data into one field; 
the copyField doesn't work for us as it stores the values as multiValued.

Can someone suggest any workaround to achieve this Use Case?

FirstName:ABC
SurName:XYZ

I need another field with Name:ABCXYZ, which I have to do at the Solr end, as 
the source data is read-only and I have no control to combine them at the source.


Thanks

Ravi


Re: Problem with new solr.xml format and core swaps

2015-04-07 Thread Erick Erickson
Shawn:

I'm pretty clueless why you would be seeing this, and slammed with
other stuff so I can't dig into this right now.

What do the core.properties files look like when you see this? They
should be re-written when you swap cores. Hmmm, I wonder if there's
some condition where the files are already open and the persistence
fails? If so we should be logging that error, I have no proof either
way whether we are or not though.

Guessing that your log files in the problem case weren't all that
helpful, but let's have a look at them if this occurs again?

Sorry I can't be more help
Erick

On Mon, Apr 6, 2015 at 8:38 PM, Shawn Heisey apa...@elyograg.org wrote:
 On 4/6/2015 6:40 PM, Erick Erickson wrote:
 What version are you migrating _from_? 4.9.0? There were some
 persistence issues at one point, but AFAIK they were fixed by 4.9, I
 can check if you're on an earlier version...

 Effectively there is no previous version.  Whenever I upgrade, I delete
 all the data directories and completely reindex.  When I converted from
 the old solr.xml to core discovery, the server was already on 4.9.1.

 Thanks,
 Shawn



Re: Trouble GetSpans lucene 4

2015-04-07 Thread Compte Poubelle
Up.
Anyone?

Best regards.

 On 6 avr. 2015, at 21:32, Test Test andymish...@yahoo.fr wrote:
 
 Hi, 
 I'm working on TamingText's book. I am trying to upgrade the code from Solr 3.6 to
 Solr 4.10.2. At the moment, I have a problem with the getSpans method:
 spans.next() always returns false. Can anyone help?
 
 SpanNearQuery sQuery = (SpanNearQuery) origQuery;
 SolrIndexSearcher searcher = rb.req.getSearcher();
 IndexReader reader = searcher.getIndexReader();
 AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
 Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
 Spans spans = sQuery.getSpans(wrapper.getContext(),
     new Bits.MatchAllBits(reader.numDocs()), termContexts);
 while (spans.next() == true) {
   // ...
 }
 
 Thanks. Regards.
 


RE: Trouble GetSpans lucene 4

2015-04-07 Thread Allison, Timothy B.
What class is origQuery?

You will have to do more rewriting/calculation if you're trying to convert a 
PhraseQuery to a SpanNearQuery.

If you dig around in 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor in the Lucene 
highlighter package, you might get some inspiration.

I have a hack for converting regular queries to SpanQueries here (this is 
largely based on WeightedSpanTermExtractor):

https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/spans/SimpleSpanQueryConverter.java
 

-Original Message-
From: Compte Poubelle [mailto:andymish...@yahoo.fr] 
Sent: Tuesday, April 07, 2015 1:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Up.
Anyone?

Best regards.

 On 6 avr. 2015, at 21:32, Test Test andymish...@yahoo.fr wrote:
 
 Hi, 
 I'm working on TamingText's book. I am trying to upgrade the code from Solr 3.6 to
 Solr 4.10.2. At the moment, I have a problem with the getSpans method:
 spans.next() always returns false. Can anyone help?
 
 SpanNearQuery sQuery = (SpanNearQuery) origQuery;
 SolrIndexSearcher searcher = rb.req.getSearcher();
 IndexReader reader = searcher.getIndexReader();
 AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
 Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
 Spans spans = sQuery.getSpans(wrapper.getContext(),
     new Bits.MatchAllBits(reader.numDocs()), termContexts);
 while (spans.next() == true) {
   // ...
 }
 
 Thanks. Regards.
 


Re: Merge Two Fields in SOLR

2015-04-07 Thread Damien Dykman
Ravi, what about using field aliasing at search time? Would that do the
trick for your use case?

http://localhost:8983/solr/mycollection/select?defType=edismax&q=name:john
doe&f.name.qf=firstname surname

For more details:
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

Damien

On 04/07/2015 10:21 AM, Erick Erickson wrote:
 I don't understand why copyField doesn't work. Admittedly the
 firstName and SurName would be separate tokens, but isn't that what
 you want? The fact that it's multiValued isn't really a problem,
 multiValued fields are really functionally identical to single valued
 fields if you set positionIncrementGap to... hmmm.. 1 or 0 I'm not
 quite sure which.

 Of course if your'e sorting by the field, that's a different story.

 Here's a discussion with several options, but I really wonder what
 your specific objection to copyField is, it's the simplest and on the
 surface it seems like it would work.

 http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-td4086786.html

 Best,
 Erick

 On Tue, Apr 7, 2015 at 10:08 AM, EXTERNAL Taminidi Ravi (ETI,
 AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:
 Hi Group,

 I am not sure if we have any easy way to merge two  fields data in One 
 Field, the Copy field doesn’t works as it stores as Multivalued.

 Can someone suggest any workaround to achieve this Use Case?

 FirstName:ABC
 SurName:XYZ

 I need an Another Field with Name:ABCXYZ where I have to do at SOLR END.. as 
 the Source Data is read only and no control to comibine.


 Thanks

 Ravi



Re: Merge Two Fields in SOLR

2015-04-07 Thread Erick Erickson
I don't understand why copyField doesn't work. Admittedly the
firstName and SurName would be separate tokens, but isn't that what
you want? The fact that it's multiValued isn't really a problem,
multiValued fields are really functionally identical to single valued
fields if you set positionIncrementGap to... hmmm.. 1 or 0 I'm not
quite sure which.

Of course if your'e sorting by the field, that's a different story.

Here's a discussion with several options, but I really wonder what
your specific objection to copyField is, it's the simplest and on the
surface it seems like it would work.

http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-td4086786.html

Best,
Erick

On Tue, Apr 7, 2015 at 10:08 AM, EXTERNAL Taminidi Ravi (ETI,
AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:
 Hi Group,

 I am not sure if we have any easy way to merge two  fields data in One Field, 
 the Copy field doesn’t works as it stores as Multivalued.

 Can someone suggest any workaround to achieve this Use Case?

 FirstName:ABC
 SurName:XYZ

 I need an Another Field with Name:ABCXYZ where I have to do at SOLR END.. as 
 the Source Data is read only and no control to comibine.


 Thanks

 Ravi


Re: Problem with new solr.xml format and core swaps

2015-04-07 Thread Shawn Heisey
On 4/7/2015 10:54 AM, Erick Erickson wrote:
 I'm pretty clueless why you would be seeing this, and slammed with
 other stuff so I can't dig into this right now.

 What do the core.properties files look like when you see this? They
 should be re-written when you swap cores. Hmmm, I wonder if there's
 some condition where the files are already open and the persistence
 fails? If so we should be logging that error, I have no proof either
 way whether we are or not though.

 Guessing that your log files in the problem case weren't all that
 helpful, but let's have a look at them if this occurs again?

I hadn't had a chance to review the logs, but when I did just now, I
found this:

ERROR - 2015-04-07 11:56:15.568;
org.apache.solr.core.CorePropertiesLocator; Couldn't persist core
properties to /index/solr4/cores/sparkinc_0/core.properties:
java.io.FileNotFoundException:
/index/solr4/cores/sparkinc_0/core.properties (Permission denied)

That's fairly clear.  I guess my permissions were wrong.  My best guess
as to why -- things owned by root from when I created the
core.properties files.  Solr does not run as root.  I didn't think to
actually look at the permissions before I ran a script that I maintain
which fixes all the ownership on my various directories involved in my
full search installation.

I don't think this explains the not-deleted segment files problem. 
Those segment files were written by solr running as the regular user, so
there couldn't have been a permission problem.

Thanks,
Shawn



Re: Trouble GetSpans lucene 4

2015-04-07 Thread Test Test
Re,
origQuery is a Query object; I got it from a ResponseBuilder object, passed by 
the method getQuery:
ResponseBuilder rb // it's a method parameter
Query origQuery = rb.getQuery();
Thanks for the link, I'll keep you informed.
Regards,
Andy


 On Tuesday, 7 April 2015 at 20:26, Allison, Timothy B. talli...@mitre.org 
wrote:
   

 What class is origQuery?

You will have to do more rewriting/calculation if you're trying to convert a 
PhraseQuery to a SpanNearQuery.

If you dig around in 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor in the Lucene 
highlighter package, you might get some inspiration.

I have a hack for converting regular queries to SpanQueries here (this is 
largely based on WeightedSpanTermExtractor):

https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/spans/SimpleSpanQueryConverter.java
 

-Original Message-
From: Compte Poubelle [mailto:andymish...@yahoo.fr] 
Sent: Tuesday, April 07, 2015 1:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Up.
Anyone?

Best regards.

 On 6 avr. 2015, at 21:32, Test Test andymish...@yahoo.fr wrote:
 
 Hi, 
 I'm working on TamingText's book. I am trying to upgrade the code from Solr 3.6 to
 Solr 4.10.2. At the moment, I have a problem with the getSpans method:
 spans.next() always returns false. Can anyone help?
 
 SpanNearQuery sQuery = (SpanNearQuery) origQuery;
 SolrIndexSearcher searcher = rb.req.getSearcher();
 IndexReader reader = searcher.getIndexReader();
 AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
 Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
 Spans spans = sQuery.getSpans(wrapper.getContext(),
     new Bits.MatchAllBits(reader.numDocs()), termContexts);
 while (spans.next() == true) {
   // ...
 }
 
 Thanks. Regards.
 

  

Re: Config join parse in solrconfig.xml

2015-04-07 Thread Frank li
Cool. It actually works after I removed those extra columns. Thanks for
your help.

On Mon, Apr 6, 2015 at 8:19 PM, Erick Erickson erickerick...@gmail.com
wrote:

 df does not allow multiple fields, it stands for default field, not
 default fields. To get what you're looking for, you need to use
 edismax or explicitly create the multiple clauses.

 I'm not quite sure what the join parser is doing with the df
 parameter. So my first question is what happens if you just use a
 single field for df?.

 Best,
 Erick

 On Mon, Apr 6, 2015 at 11:51 AM, Frank li fudon...@gmail.com wrote:
  The error message was from the query with debug=query.
 
  On Mon, Apr 6, 2015 at 11:49 AM, Frank li fudon...@gmail.com wrote:
 
  Hi Erick,
 
 
  Thanks for your response.
 
  Here is the query I am sending:
 
 
  http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0
   
  http://dev-solr:8080/solr/collection1/select?q=%7B!join+from=litigation_id_ls+to=lit_id_lms%7Dall_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0
 
 
  You can see it has all_text:apple. I added field name all_text,
  because it gives error without it.
 
  Errors:
 
 <lst name="error"><str name="msg">undefined field all_text number party
 name all_code ent_name</str><int name="code">400</int></lst>
 
 
  These fields are defined as the default search fields in our
  solr_config.xml file:
 
  <str name="df">all_text number party name all_code ent_name</str>
 
 
  Thanks,
 
  Fudong
 
  On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  You have to show us several more things:
 
  1 what exactly does the query look like?
  2 what do you expect?
  3 output when you specify debug=query
  4 anything else that would help. You might review:
 
  http://wiki.apache.org/solr/UsingMailingLists
 
  Best,
  Erick
 
  On Fri, Apr 3, 2015 at 10:58 AM, Frank li fudon...@gmail.com wrote:
   Hi,
  
   I am starting using join parser with our solr. We have some default
  fields.
   They are defined in solrconfig.xml:
  
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">all_text number party name all_code ent_name</str>
    <str name="qf">all_text number^3 name^5 party^3 all_code^2 ent_name^7</str>
    <str name="fl">id description market_sector_type parent ult_parent
      ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s *_ss *_ds
      *_sms *_ss *_bs</str>
    <str name="q.op">AND</str>
  </lst>
  
  
   I found out once I use join parser, it does not recognize the default
   fields any more. How do I modify the configuration for this?
  
   Thanks,
  
   Fred
 
 
 



RE: Trouble GetSpans lucene 4

2015-04-07 Thread Allison, Timothy B.
Oh, ok, if that's just a regular query, you will need to convert it to a 
SpanQuery, and you may need to rewrite the SpanQuery after conversion.

If you're trying to do a concordance or trying to retrieve windows around the 
hits, take a look at ConcordanceSearcher within: 
https://github.com/tballison/lucene-addons/tree/master/lucene-5317 .

With any luck, I should find the time to get back to the Solr wrapper under 
solr-5411 that Jason Robinson initially developed.  

-Original Message-
From: Test Test [mailto:andymish...@yahoo.fr] 
Sent: Tuesday, April 07, 2015 3:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Re,
origQuery is a Query object; I got it from a ResponseBuilder object via its
getQuery method:

ResponseBuilder rb  // method parameter
Query origQuery = rb.getQuery();

Thanks for the link, I'll keep you informed.

Regards,
Andy


 On Tuesday, 7 April 2015 at 20:26, Allison, Timothy B. talli...@mitre.org
wrote:

 What class is origQuery?

You will have to do more rewriting/calculation if you're trying to convert a 
PhraseQuery to a SpanNearQuery.

If you dig around in 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor in the Lucene 
highlighter package, you might get some inspiration.

I have a hack for converting regular queries to SpanQueries here (this is 
largely based on WeightedSpanTermExtractor):

https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/spans/SimpleSpanQueryConverter.java
 

-Original Message-
From: Compte Poubelle [mailto:andymish...@yahoo.fr] 
Sent: Tuesday, April 07, 2015 1:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Trouble GetSpans lucene 4

Up.
Anyone?

Best regards.

 On 6 Apr 2015, at 21:32, Test Test andymish...@yahoo.fr wrote:
 
 Hi,
 I'm working through the Taming Text book and trying to upgrade its code from
 Solr 3.6 to Solr 4.10.2. At the moment I have a problem with the getSpans
 method: spans.next() always returns false. Can anyone help?
 
 SpanNearQuery sQuery = (SpanNearQuery) origQuery;
 SolrIndexSearcher searcher = rb.req.getSearcher();
 IndexReader reader = searcher.getIndexReader();
 //AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
 Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
 //Spans spans = sQuery.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
 while (spans.next() == true) {
 //}
 
 Thanks.
 Regards.
 

  


How to trace error records during POST?

2015-04-07 Thread Simon Cheng
Good morning,

I used Solr 4.7 to post 186,745 XML files and 186,622 files have been
indexed. That means there are 123 XML files with errors. How can I trace
which files these are?

Thank you in advance,
Simon Cheng.
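
(Not an answer from this thread, but one possible way to narrow the failures
down: check the Solr server log for the 123 update errors, or re-send the
files one at a time and record which requests throw. A rough SolrJ sketch,
with the core URL and directory as placeholders to adjust:)

import java.io.File;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class FindBadFiles {
  public static void main(String[] args) throws Exception {
    // Placeholder core URL and directory -- adjust to the real setup.
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    File dir = new File(args[0]);
    for (File f : dir.listFiles()) {
      if (!f.getName().endsWith(".xml")) {
        continue;
      }
      try {
        // Post one file per request so a failure points at a single file.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
        req.addFile(f, "application/xml");
        server.request(req);
      } catch (Exception e) {
        System.out.println(f.getName() + " failed: " + e.getMessage());
      }
    }
    server.commit();
    server.shutdown();
  }
}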


Re: Deploying multiple ZooKeeper ensemble on a single machine

2015-04-07 Thread nutchsolruser
 I have to choose unique client port #’s for each.
Here I can see that you have the same client port for all 3 servers.

You can refer to this link: http://myjeeva.com/zookeeper-cluster-setup.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deploying-multiple-ZooKeeper-ensemble-on-a-single-machine-tp4198272p4198279.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Deploying multiple ZooKeeper ensemble on a single machine

2015-04-07 Thread Shawn Heisey
On 4/7/2015 9:16 PM, Zheng Lin Edwin Yeo wrote:
 I'm using SolrCloud 5.0.0 and ZooKeeper 3.4.6 running on Windows, and now
 I'm trying to deploy a multiple ZooKeeper ensemble (3 servers) on a single
 machine. These are the settings which I have configured, according to the
 Solr Reference Guide.
 
 These files are under ZOOKEEPER_HOME\conf\ directory
 (C:\Users\edwin\zookeeper-3.4.6\conf)
 
 *zoo.cfg*
 tickTime=2000
 initLimit=10
 syncLimit=5
 dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
 clientPort=2181
 server.1=localhost:2888:3888
 server.2=localhost:2889:3889
 server.3=localhost:2890:3890

snip

  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot
 open channel to 2 at election address localhost/127.0.0.1:3889
 java.net.ConnectException: Connection refused: connect

The first thing I would suspect when running any network program on a
Windows machine that won't communicate is the Windows firewall, unless
you have either turned off the firewall or you have explicitly
configured an exception in the firewall for the relevant ports.

The other reply you got, from nutchsolruser, does point out that all
three zookeeper configs are using 2181 as the clientPort.  Because these
are all running on the same machine, you must use a different port for
each one.  I'm not sure what happens to subsequent processes after the
first one starts, but they won't work even if they do manage to start.

Thanks,
Shawn
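
As a concrete sketch of the fix (ports 2182 and 2183 are chosen here only as
an example of free ports; the tick/limit and server.N lines stay exactly as
posted):

# zoo.cfg   ->  clientPort=2181  dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
# zoo2.cfg  ->  clientPort=2182  dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\2
# zoo3.cfg  ->  clientPort=2183  dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\3
# unchanged in all three files:
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

The ZooKeeper connect string handed to Solr would then list all three client
ports, e.g. -z localhost:2181,localhost:2182,localhost:2183.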



Deploying multiple ZooKeeper ensemble on a single machine

2015-04-07 Thread Zheng Lin Edwin Yeo
Hi,

I'm using SolrCloud 5.0.0 and ZooKeeper 3.4.6 running on Windows, and now
I'm trying to deploy a multiple ZooKeeper ensemble (3 servers) on a single
machine. These are the settings which I have configured, according to the
Solr Reference Guide.

These files are under ZOOKEEPER_HOME\conf\ directory
(C:\Users\edwin\zookeeper-3.4.6\conf)

*zoo.cfg*
tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890


*zoo2.cfg*
tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\2
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890


*zoo3.cfg*
tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\3
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890


I have also created the myid file at the respective dataDir location for
each of the 3 servers.

- At C:\Users\edwin\zookeeper-3.4.6\1, the myid file contains just the
number 1
- At C:\Users\edwin\zookeeper-3.4.6\2, the myid file contains just the
number 2
- At C:\Users\edwin\zookeeper-3.4.6\3, the myid file contains just the
number 3


However, I'm getting the following error when I run zkServer.cmd

2015-04-08 10:54:17,097 [myid:1] - DEBUG
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@412] - Queue
size: 1
2015-04-08 10:54:17,097 [myid:1] - DEBUG
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@412] - Queue
size: 1
2015-04-08 10:54:17,097 [myid:1] - DEBUG
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@364] - Opening
channel to server 2
2015-04-08 10:54:18,097 [myid:1] - WARN
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot
open channel to 2 at election address localhost/127.0.0.1:3889
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-04-08 10:54:18,099 [myid:1] - DEBUG
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@364] - Opening
channel to server 3
2015-04-08 10:54:19,099 [myid:1] - WARN
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot
open channel to 3 at election address localhost/127.0.0.1:3890
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-04-08 10:54:19,101 [myid:1] - INFO
 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] -
Notification time out: 3200



Is there anything which I could have set wrongly?


Regards,
Edwin