Re: Injecting synonyms into Solr
Would like to check: will this method of splitting the synonyms into multiple files use up a lot of memory? I'm trying it with about 10 files, and that collection cannot be loaded due to insufficient memory. Currently my machine has only 4GB of memory and I only have 500,000 records indexed, so I'm not sure if there will be a significant impact in the future (even with more memory) when my index grows and other things like faceting, highlighting, and clustering tools like Carrot2 are implemented.

Regards,
Edwin

On 1 May 2015 at 11:08, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: [quoted text trimmed]
Re: Injecting synonyms into Solr
It shouldn't matter. Btw, try a URL instead of a file path. I think the underlying loading mechanism uses java.io.File, but it could work.

On May 4, 2015 2:07 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: [quoted text trimmed]
Re: Injecting synonyms into Solr
On 5/4/2015 12:07 AM, Zheng Lin Edwin Yeo wrote: [quoted text trimmed]

For Solr, depending on exactly how you use it, the number of docs, and the nature of those docs, a 4GB machine will usually be considered quite small. Solr requires a big chunk of RAM for its heap, but usually requires an even larger chunk of RAM for the OS disk cache. My Solr machines have 64GB (as much as the servers can hold) and I wish they had two or four times as much, so I could get better performance.

My larger indexes (155 million docs, 103 million docs, and 18 million docs, not using SolrCloud) are NOT considered very large by this community -- we have users wrangling billions of docs with SolrCloud, using hundreds of servers.

On this wiki page, I have tried to outline how various aspects of memory can affect Solr performance:
http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
Re: Injecting synonyms into Solr
Yes, the underlying mechanism uses Java. But the collection isn't able to load when Solr starts up, so it didn't return anything even when I used a URL. Is it just due to my machine not having enough memory?

Regards,
Edwin

On 4 May 2015 20:12, Roman Chyla roman.ch...@gmail.com wrote: [quoted text trimmed]
Re: Injecting synonyms into Solr
Which version of Solr?

On Thu, Apr 30, 2015 at 9:58 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
Hi, does anyone know of a faster method of populating the synonyms.txt file instead of manually typing the words into the file? There could be thousands of synonyms. Regards, Edwin

--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
Re: Injecting synonyms into Solr
I am facing the same problem; currently I am resorting to a custom program to create this file. Hopefully there is a better solution out there.

Thanks,
Kaushik

On Thu, Apr 30, 2015 at 3:58 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: [quoted text trimmed]
Re: Injecting synonyms into Solr
There is a possible solution here: https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR Synonym format). I don't have personal experience with it; I only know about it because it's mentioned on page 184 of 'Solr in Action' by Trey Grainger and Timothy Potter. Maybe someone out there knows more about it and can provide more information.

Regards,
Scott

On Thu, Apr 30, 2015 at 9:45 AM, Kaushik kaushika...@gmail.com wrote: [quoted text trimmed]
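If the WordNet data is already in WordNet's prolog format, no dump/conversion step is needed at all, since SynonymFilterFactory can read such files directly via format="wordnet". Below is a minimal sketch of a field type for a Solr 4/5 schema.xml; the fieldType name, the analyzer chain, and the wn_s.pl file name are illustrative, not from this thread:

```xml
<!-- Illustrative field type; names and the wn_s.pl file name are examples. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- format="wordnet" tells the factory to parse WordNet prolog entries -->
    <filter class="solr.SynonymFilterFactory" synonyms="wn_s.pl"
            format="wordnet" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```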
Re: Injecting synonyms into Solr
I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I've gotten some samples from the Moby Thesaurus List (http://www.gutenberg.org/catalog/world/results?title=moby+list) to try it out. However, currently I can only have up to around 2,100 lines in my synonyms.txt when I load the configuration into ZooKeeper. Beyond that I get the following errors:

2015-05-01 00:21:55,146 [myid:1] - DEBUG [CommitProcessor:1:FinalRequestProcessor@160] - sessionid:0x14d0b1ffa860002 type:exists cxid:0x148 zxid:0xfffe txntype:unknown reqpath:/configs/collection1/synonyms.txt
2015-05-01 00:21:55,287 [myid:1] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x14d0b1ffa860002 due to java.io.IOException: Len error 24896062
2015-05-01 00:21:55,287 [myid:1] - DEBUG [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@366] - IOException stack trace
java.io.IOException: Len error 24896062
    at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:928)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Unknown Source)
2015-05-01 00:21:55,288 [myid:1] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:56818 which had sessionid 0x14d0b1ffa860002
2015-05-01 00:21:55,578 [myid:1] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:56819
2015-05-01 00:21:55,579 [myid:1] - DEBUG [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810] - Session establishment request from client /0:0:0:0:0:0:0:1:56819 client's lastZxid is 0x1800a7
2015-05-01 00:21:55,579 [myid:1] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x14d0b1ffa860002 at /0:0:0:0:0:0:0:1:56819
2015-05-01 00:21:55,580 [myid:1] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x14d0b1ffa860002
2015-05-01 00:21:55,582 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@617] - Established session 0x14d0b1ffa860002 with negotiated timeout 3 for client /0:0:0:0:0:0:0:1:56819

Is there a limit to how much data can be contained in the synonyms.txt file? Searching on the synonyms using solr.SynonymFilterFactory works fine if I have fewer than 2,100 records in the synonyms.txt file, but beyond that it cannot be loaded into ZooKeeper.

Regards,
Edwin

On 30 April 2015 at 22:42, Vincenzo D'Amore v.dam...@gmail.com wrote: [quoted text trimmed]
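The "Len error 24896062" above is ZooKeeper rejecting a roughly 24MB packet: znode payloads are capped by the jute.maxbuffer system property, which defaults to about 1MB. The cap can be raised, but it must be changed consistently on every ZooKeeper server and every client JVM, and splitting the file (as later replies in the thread suggest) is usually the safer fix. A hedged config sketch; the value and file locations are illustrative:

```shell
# On each ZooKeeper server (e.g. in conf/zookeeper-env.sh); value is illustrative.
SERVER_JVMFLAGS="-Djute.maxbuffer=31457280"   # ~30MB, larger than the biggest config file

# On each client JVM as well, e.g. in Solr's solr.in.sh:
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=31457280"
```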
Re: Injecting synonyms into Solr
Just to populate it with general synonym words. I've managed to populate it from some sources online, but is there a limit to what it can contain? I can't load the configuration into ZooKeeper if the synonyms.txt file contains more than 2,100 lines.

Regards,
Edwin

On 1 May 2015 05:44, Chris Hostetter hossman_luc...@fucit.org wrote: [quoted text trimmed]
Re: Injecting synonyms into Solr
Thank you for the info. Yup, this works. I found out that we can't load files that are more than 1MB into ZooKeeper; it happens with any file larger than 1MB, not just the synonym files. But I'm not sure if there will be an impact on the system, as the number of synonym text files can potentially grow to more than 20, since my sample synonym file is more than 20MB in size. Currently I have fewer than 500,000 records indexed in Solr, so I'm not sure if there will be a significant impact compared to an index with millions of records. I will try to get more records indexed and will update here again.

Regards,
Edwin

On 1 May 2015 at 08:17, Philippe Soares soa...@genomequest.com wrote: [quoted text trimmed]
Re: Injecting synonyms into Solr
Split your synonyms into multiple files and set the SynonymFilterFactory with a comma-separated list of files, e.g.:

synonyms=syn1.txt,syn2.txt,syn3.txt

On Thu, Apr 30, 2015 at 8:07 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: [quoted text trimmed]
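Given the roughly 1MB-per-znode constraint discussed in this thread, the split itself is easy to script. A minimal sketch in Python (not from the thread; the byte budget and syn1.txt-style file names are assumptions) that groups lines so each output file stays under the cap and prints the matching synonyms= value:

```python
def split_synonyms(lines, max_bytes=1_000_000):
    """Group synonym lines into chunks whose UTF-8 size stays under max_bytes."""
    chunks, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append(current)
    return chunks

# Example with stand-in entries; real input would be the large synonyms file.
lines = ["gb,gib,gigabyte,gigabytes", "tv,television,televisions"]
chunks = split_synonyms(lines, max_bytes=1_000_000)
names = [f"syn{i}.txt" for i in range(1, len(chunks) + 1)]
print("synonyms=" + ",".join(names))  # prints "synonyms=syn1.txt"
```

Each chunk would then be written to its own file and the printed value dropped into the SynonymFilterFactory configuration.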
Re: Injecting synonyms into Solr
: There is a possible solution here:
: https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR
: Synonym format).

If you have WordNet synonyms you don't need any special code/tools to convert them -- the current solr.SynonymFilterFactory supports wordnet files (just specify format=wordnet).

: Does anyone knows any faster method of populating the synonyms.txt file
: instead of manually typing in the words into the file, which there could
: be thousands of synonyms around?

Populate from what? What is the source of your data? The default Solr synonym file format is about as simple as it could possibly be -- pretty trivial to generate from scripts. The hard part is usually selecting the synonym data you want to use and parsing whatever format it is already in.

-Hoss
http://www.lucidworks.com/
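Hoss's point that the default format is trivial to generate from scripts can be sketched as follows. This assumes the source synonyms have already been parsed into groups of equivalent terms (the sample data here is made up); each group becomes one comma-separated line, with commas inside a term backslash-escaped as the default format requires:

```python
def to_solr_synonyms(groups):
    """Render synonym groups in Solr's default comma-separated file format."""
    lines = []
    for terms in groups:
        # Commas inside a term must be backslash-escaped in this format.
        escaped = [t.strip().replace(",", "\\,") for t in terms]
        lines.append(",".join(escaped))
    return "\n".join(lines) + "\n"

groups = [  # made-up sample data
    ["couch", "sofa", "divan"],
    ["small", "tiny", "little"],
]
print(to_solr_synonyms(groups), end="")
# couch,sofa,divan
# small,tiny,little
```

The returned string can be written straight to synonyms.txt; the real work, as Hoss says, is building the groups from whatever source format the data starts in.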