Re: Injecting synonymns into Solr

2015-05-04 Thread Zheng Lin Edwin Yeo
I would like to check: will this method of splitting the synonyms into
multiple files use up a lot of memory?

I'm trying it with about 10 files, and the collection cannot be loaded
due to insufficient memory.

My machine currently has only 4GB of memory, but I also have only
500,000 records indexed, so I'm not sure whether there will be a significant
impact in the future (even with more memory) when my index grows and other
things like faceting, highlighting, and carrot tools are implemented.

Regards,
Edwin







Re: Injecting synonymns into Solr

2015-05-04 Thread Roman Chyla
It shouldn't matter. By the way, try a URL instead of a file path. I think the
underlying loading mechanism uses Java File, so it could work.



Re: Injecting synonymns into Solr

2015-05-04 Thread Shawn Heisey
On 5/4/2015 12:07 AM, Zheng Lin Edwin Yeo wrote:
 Would like to check, will this method of splitting the synonyms into
 multiple files use up a lot of memory?
 
 I'm trying it with about 10 files and that collection is not able to be
 loaded due to insufficient memory.
 
 Although currently my machine only have 4GB of memory, but I only have
 500,000 records indexed, so not sure if there's a significant impact in the
 future (even with larger memory) when my index grows and other things like
 faceting, highlighting, and carrot tools are implemented.

For Solr, depending on exactly how you use it, the number of docs, and
the nature of those docs, a 4GB machine will usually be considered quite
small.  Solr requires a big chunk of RAM for its heap, but usually
requires an even larger chunk of RAM for the OS disk cache.

My Solr machines have 64GB (as much as the servers can hold) and I wish
they had two or four times as much, so I could get better performance.
My larger indexes (155 million docs, 103 million docs, and 18 million
docs, not using SolrCloud) are NOT considered very large by this
community -- we have users wrangling billions of docs with SolrCloud,
using hundreds of servers.

On this Wiki page, I have tried to outline how various aspects of memory
can affect Solr performance:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Injecting synonymns into Solr

2015-05-04 Thread Zheng Lin Edwin Yeo
Yes, the underlying mechanism uses Java. But the collection isn't able to
load when Solr starts up, so it didn't return anything even when I used a
URL.
Is it just due to my machine not having enough memory?

Regards,
Edwin



Re: Injecting synonymns into Solr

2015-04-30 Thread Vincenzo D'Amore
Which version of solr?

On Thu, Apr 30, 2015 at 9:58 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com
wrote:

 Hi,

 Does anyone knows any faster method of populating the synonyms.txt file
 instead of manually typing in the words into the file, which there could be
 thousands of synonyms around?

 Regards,
 Edwin




-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Injecting synonymns into Solr

2015-04-30 Thread Kaushik
I am facing the same problem; currently I am resorting to a custom program
to create this file. Hopefully there is a better solution out there.

Thanks,
Kaushik

On Thu, Apr 30, 2015 at 3:58 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com
wrote:

 Hi,

 Does anyone knows any faster method of populating the synonyms.txt file
 instead of manually typing in the words into the file, which there could be
 thousands of synonyms around?

 Regards,
 Edwin



Re: Injecting synonymns into Solr

2015-04-30 Thread Scott Dawson
There is a possible solution here:
https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR
Synonym format).

I don't have personal experience with it. I only know about it because it's
mentioned on page 184 of the 'Solr in Action' book by Trey Grainger and
Timothy Potter.

Maybe someone out there knows more about it and can provide more
information.

Regards,
Scott

 



Re: Injecting synonymns into Solr

2015-04-30 Thread Zheng Lin Edwin Yeo
I'm using Solr-5.0.0 and ZooKeeper-3.4.6.

I've gotten some samples from the Moby Treasure List
http://www.gutenberg.org/catalog/world/results?title=moby+list to try it
out.

However, currently I can only have up to around 2100 lines in my
synonyms.txt when I load the configuration into ZooKeeper. Beyond that,
I get the following errors.


2015-05-01 00:21:55,146 [myid:1] - DEBUG [CommitProcessor:1:FinalRequestProcessor@160] - sessionid:0x14d0b1ffa860002 type:exists cxid:0x148 zxid:0xfffe txntype:unknown reqpath:/configs/collection1/synonyms.txt
2015-05-01 00:21:55,287 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x14d0b1ffa860002 due to java.io.IOException: Len error 24896062
2015-05-01 00:21:55,287 [myid:1] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace
java.io.IOException: Len error 24896062
    at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:928)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Unknown Source)
2015-05-01 00:21:55,288 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:56818 which had sessionid 0x14d0b1ffa860002
2015-05-01 00:21:55,578 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:56819
2015-05-01 00:21:55,579 [myid:1] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810] - Session establishment request from client /0:0:0:0:0:0:0:1:56819 client's lastZxid is 0x1800a7
2015-05-01 00:21:55,579 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x14d0b1ffa860002 at /0:0:0:0:0:0:0:1:56819
2015-05-01 00:21:55,580 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x14d0b1ffa860002
2015-05-01 00:21:55,582 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@617] - Established session 0x14d0b1ffa860002 with negotiated timeout 3 for client /0:0:0:0:0:0:0:1:56819


Is there a limit to how much data can be contained in the synonyms.txt
file?
Searching on the synonyms using solr.SynonymFilterFactory works
fine if I have fewer than 2100 records in the synonyms.txt file, but beyond
that it cannot be loaded into ZooKeeper.


Regards,
Edwin





Re: Injecting synonymns into Solr

2015-04-30 Thread Zheng Lin Edwin Yeo
Just to populate it with general synonym words. I've managed to
populate it from some online sources, but is there a limit to what it can
contain?

I can't load the configuration into ZooKeeper if the synonyms.txt file
contains more than 2100 lines.

Regards,
Edwin



Re: Injecting synonymns into Solr

2015-04-30 Thread Zheng Lin Edwin Yeo
Thank you for the info. Yup, this works. I found out that we can't load
files larger than 1MB into ZooKeeper; this happens to any file
larger than 1MB, not just the synonyms files.
But I'm not sure whether there will be an impact on the system, as the number of
synonym text files can potentially grow to more than 20, since my sample
synonym file is more than 20MB.

Currently I have fewer than 500,000 records indexed in Solr, so I'm not
sure whether there will be a significant impact compared to an index with
millions of records.
I will try to get more records indexed and will update here again.

Regards,
Edwin
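
As a rough way to spot which configuration files would hit this limit
before uploading, something like the following could be used. It is only a
sketch: the limit constant assumes ZooKeeper's default jute.maxbuffer
setting (0xfffff bytes, just under 1MB), and the function name and
directory layout are hypothetical, not part of Solr or ZooKeeper.

```python
import os

# Assumption: ZooKeeper's default jute.maxbuffer value (0xfffff bytes,
# just under 1 MB). A server started with a different setting will have
# a different limit.
ZK_DEFAULT_MAX_BYTES = 0xFFFFF


def oversized_files(conf_dir, limit=ZK_DEFAULT_MAX_BYTES):
    """Return (name, size) pairs for files in conf_dir larger than limit bytes.

    Hypothetical helper for illustration; it only looks at the top level
    of conf_dir and does not talk to ZooKeeper at all.
    """
    found = []
    for name in sorted(os.listdir(conf_dir)):
        path = os.path.join(conf_dir, name)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > limit:
                found.append((name, size))
    return found
```

Calling oversized_files("conf") on a local copy of the config directory
would list the files to split. The upload request carries some overhead of
its own, so a file sitting just under the limit may still be rejected.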





Re: Injecting synonymns into Solr

2015-04-30 Thread Philippe Soares
Split your synonyms into multiple files and set the SynonymFilterFactory
with a comma-separated list of files, e.g.:
synonyms=syn1.txt,syn2.txt,syn3.txt
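
The splitting itself can be scripted. Below is a minimal sketch, assuming
the source file has one synonym rule per line; the function names, the
900,000-byte default chunk size, and the "syn" file prefix are illustrative
choices, not anything Solr requires.

```python
import os


def split_synonyms(src_path, out_dir, max_bytes=900_000, prefix="syn"):
    """Split src_path into numbered files without ever breaking a line.

    Each output file holds at most max_bytes bytes, except when a single
    line is itself longer than max_bytes (that line gets its own file).
    Returns the list of file names written, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    names = []
    chunk = []
    size = 0

    def flush():
        nonlocal chunk, size
        if not chunk:
            return
        name = f"{prefix}{len(names) + 1}.txt"
        with open(os.path.join(out_dir, name), "wb") as out:
            out.writelines(chunk)
        names.append(name)
        chunk, size = [], 0

    # Binary mode so that sizes are measured in bytes, not characters.
    with open(src_path, "rb") as src:
        for line in src:
            if chunk and size + len(line) > max_bytes:
                flush()
            chunk.append(line)
            size += len(line)
    flush()
    return names


def synonyms_attribute(names):
    """Build the comma-separated attribute value for the filter factory."""
    return "synonyms=" + ",".join(names)
```

The string returned by synonyms_attribute() has the same shape as the
example above and can be pasted into the filter definition.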




Re: Injecting synonymns into Solr

2015-04-30 Thread Chris Hostetter

: There is a possible solution here:
: https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR
: Synonym format).

If you have WordNet synonyms you don't need any special code/tools to 
convert them -- the current solr.SynonymFilterFactory supports wordnet 
files (just specify format=wordnet)


:   Does anyone knows any faster method of populating the synonyms.txt file
:   instead of manually typing in the words into the file, which there could
:  be
:   thousands of synonyms around?

populate from what?  what is the source of your data?

the default solr synonym file format is about as simple as it could 
possibly be -- pretty trivial to generate it from scripts -- the hard part 
is usually selecting the synonym data you want to use and parsing whatever 
format it is already in.



-Hoss
http://www.lucidworks.com/
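
To illustrate the point about generating the file from a script, here is a
minimal sketch. It assumes the synonym data has already been parsed into
groups of equivalent terms and that no term contains a comma or "=>"; the
function name is made up for this example.

```python
def to_solr_synonyms(groups):
    """Render groups of equivalent terms in the default Solr synonym format.

    Each group becomes one comma-separated line. Terms are stripped of
    surrounding whitespace, duplicates within a group are dropped, and
    groups left with fewer than two terms are skipped.
    Assumption: terms contain no commas or "=>" (no escaping is done).
    """
    lines = []
    for group in groups:
        seen = []
        for term in group:
            term = term.strip()
            if term and term not in seen:
                seen.append(term)
        if len(seen) >= 2:
            lines.append(",".join(seen))
    return "\n".join(lines) + "\n" if lines else ""
```

Writing the returned string to synonyms.txt gives one rule per line. The
hard part remains the parsing step that produces the groups in the first
place.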