Storing 2 dimension array in Solr
Hi,

I have a 2-dimensional array and want to persist it in Solr. How can I do that?

Sample case:

          disease1   disease2    disease3
group1    exist      slight      not found
groups2   slight     not found   exist
group2    slight     exist

The values could also be stored as codes: exist = 1, not found = 2, slight = 3, and so on.

Note: this array is updated frequently. Every time a new disease is added, I have to add a description of that disease to all groups. At query time I will fetch by row, that is, by group only (for example, the group = group2 row).

Any suggestions on how I can achieve this? I am thankful to the forum for replying with patience; once I achieve this, I will blog about it and share it with all.

Thanks - David
Re: Solr's Filtering approaches
David,

We have a similar query in astrophysics: a user can select an area of the sky, with many stars out there. I am long overdue in creating a Jira issue, but here is another efficient mechanism for searching a large number of ids:

https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/search/BitSetQParserPlugin.java

Roman

On 12 Oct 2013 01:57, David Philip davidphilipshe...@gmail.com wrote:

Groups are pharmaceutical research experiments. The user is presented with a graph view and can select a region; all the groups in that region get included. The user can also modify the groups there, so we did not maintain group information in the same Solr index but externalized it.

I looked at the post filter article. My understanding is that I simply have to extend it as you did and include an implementation for isAllowed(acls[doc], groups). This will filter the documents in the collector, and finally this collector will be returned. Am I right?

    @Override
    public void collect(int doc) throws IOException {
      if (isAllowed(acls[doc], user, groups)) super.collect(doc);
    }

Erick, I am interested to know whether I can extend any class that returns only the bitset of the documents that match the search query. I can then do bitset1.and(bitset2OfGroups) and finally collect only those documents to return to the user. How do I try this approach? Any pointers for bit sets?

Thanks - David

On Thu, Oct 10, 2013 at 5:25 PM, Erick Erickson erickerick...@gmail.com wrote:

Well, my first question is why 50K groups is necessary, and whether you can simplify that. How a user can manually choose from among that many groups is interesting.

But assuming they're all necessary, I can think of two things. If the user can only select ranges, just put in filter queries using ranges. Or possibly both ranges and individual entries, as

fq=group:[1A TO 1A] OR group:(2B 45C 98Z)

etc. You need to be a little careful how you index these so range queries work properly. In the above you'd miss 2A because it's sorting lexicographically, so you'd need to store the ids in some form that sorts, like 001A 01A and so on. You wouldn't need to show that form to the user, just form your fq's in the app to work with that form.

If that won't work (you wouldn't want this to get huge), think about a post filter that would only operate on documents that had made it through the select, although how to convey which groups the user selected to the post filter is an open question.

Best,
Erick

On Wed, Oct 9, 2013 at 12:23 PM, David Philip davidphilipshe...@gmail.com wrote:

Hi All,

I have an issue in handling filters for one of our requirements and would like to get suggestions for the best approaches.

Use Case:

1. We have a list of groups, and the number of groups can increase up to 1 million. Currently we have almost 90 thousand groups in the Solr search system.
2. Just before the user hits search, he has the option to select the groups he wants to retrieve. [The distinct list of these group names for display is retrieved from another Solr index that has more information about groups.]
3. User operation: say the user selected group 1A - group 1A and searches for key:cancer.

The current approach I was thinking of is: get the search results and apply a filter query with the list of group ids selected by the user. But my concern is that when this group list grows to 50k unique ids, it can cause a lot of delay in getting search results. So I wanted to know whether there are different filtering approaches that I can try.

I was also thinking of one more approach, suggested by my colleague: intersection. Get the group ids selected by the user, get the list of group ids from the search results, perform the intersection of both, and then get the entire result set for only those group ids that intersected. Is this a better way? Can I use any cache technique in this case?

- David.
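Erick's point about lexicographic ordering can be sketched in a few lines. This is only an illustration, not code from the thread: it assumes group ids are digits followed by an optional letter suffix (as in 1A, 45C), and the helper names sortable_group_id and range_fq, the field name group, and the padding width of 3 are all hypothetical. With up to 1 million groups the width would need to be larger.

```python
import re


def sortable_group_id(group_id: str, width: int = 3) -> str:
    """Zero-pad the numeric prefix so string order matches numeric order.

    Assumes ids look like digits followed by optional letters (hypothetical format).
    """
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", group_id)
    if match is None:
        raise ValueError(f"unexpected group id: {group_id!r}")
    number, suffix = match.groups()
    return number.zfill(width) + suffix


def range_fq(field: str, low: str, high: str, extras=()) -> str:
    """Build a filter-query string from a range plus optional individual ids."""
    clauses = [f"{field}:[{sortable_group_id(low)} TO {sortable_group_id(high)}]"]
    if extras:
        ids = " ".join(sortable_group_id(e) for e in extras)
        clauses.append(f"{field}:({ids})")
    return " OR ".join(clauses)


# Sorting by the padded form gives the order a user would expect.
print(sorted(["10A", "2A", "1A"], key=sortable_group_id))
print(range_fq("group", "1A", "4A", extras=["45C", "98Z"]))
```

The padded form would be what gets indexed and queried; the application can keep showing the unpadded ids to the user, as Erick suggests.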
Re: Replace NULL with 0 while Indexing
What about using COALESCE in SQL? For example:

select COALESCE(duration, 0) as duration from mytable

On 11 October 2013 22:02, keshari.prerna keshari.pre...@gmail.com wrote:

Hello,

One of my indexed fields has NULL values, and I want them to be replaced with 0 during indexing itself, so that when I search after indexing I get 0 instead of NULL. This is my data-config.xml; duration is the field that has NULL values.

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://trdbadhoc/test_results"
              responseBuffering="adaptive" batchSize="-1"
              user="results" password="resultsloader"/>
  <document>
    <entity name="Test_Syndrome" pk="id"
            query="SELECT TS.id AS id, TET.type AS error_type, TS.syndrome AS syndrome,
                   S.start_date, SE.session_id AS sessionid, S.duration, TL.logfile,
                   J.job_number AS job, cluster, S.hostname, platform
                   FROM Test_Syndrome AS TS
                   STRAIGHT_JOIN Session_Errors AS SE ON (SE.test_syndrome_id = TS.id)
                   STRAIGHT_JOIN Session AS S ON (S.id = SE.session_id)
                   STRAIGHT_JOIN Test_Run AS TR ON (TR.session_id = SE.session_id)
                   STRAIGHT_JOIN Test_Log AS TL ON (TL.id = TR.test_log_id)
                   STRAIGHT_JOIN Job AS J ON (J.id = TL.job_id)
                   STRAIGHT_JOIN Cluster AS C ON (C.id = J.cluster_id)
                   STRAIGHT_JOIN Platform ON (TR.platform_id = Platform.id)
                   STRAIGHT_JOIN Test_Error_Type TET ON (SE.test_error_type_id = TET.id)">
      <Field column="id" name="id"/>
      <Field column="error_type" name="error_type"/>
      <Field column="syndrome" name="syndrome"/>
      <Field column="sessionid" name="sessionid"/>
      <Field column="duration" name="duration"/>
      <Field column="logfile" name="logfile"/>
      <Field column="job" name="job"/>
      <Field column="cluster" name="cluster"/>
      <Field column="hostname" name="hostname"/>
      <Field column="platform" name="platform"/>
    </entity>
  </document>
</dataConfig>

Please help.

Thanks and regards,
Prerna

-- View this message in context: http://lucene.472066.n3.nabble.com/Replace-NULL-with-0-while-Indexing-tp4095059.html Sent from the Solr - User mailing list archive at Nabble.com.
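To see what COALESCE does to the NULL rows, here is a small self-contained sketch. It uses an in-memory SQLite database purely as a stand-in for the MySQL source in the thread, and the table name session and its sample rows are made up for illustration. In the data-config.xml query above, the equivalent change would presumably be selecting COALESCE(S.duration, 0) AS duration instead of S.duration.

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL database (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, duration INTEGER)")
conn.executemany(
    "INSERT INTO session (id, duration) VALUES (?, ?)",
    [(1, 120), (2, None), (3, 45)],  # row 2 has a NULL duration
)

# COALESCE returns its first non-NULL argument, so NULL durations become 0.
rows = conn.execute(
    "SELECT id, COALESCE(duration, 0) AS duration FROM session ORDER BY id"
).fetchall()
print(rows)
```

Because the substitution happens in the SQL query, the indexer never sees a NULL for that column.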
Re: Replace NULL with 0 while Indexing
Or you can use a custom update processor.

-- Karol Sikora, IT Project Manager, CBN - Interfejs 2.0, +48 781 493 788, Laboratorium EE, ul. Mokotowska 46A/23 | 00-543 Warszawa | www.laboratorium.ee | www.laboratorium.ee/facebook
Re: Storing 2 dimension array in Solr
David:

This feels like it may be an XY problem. _Why_ do you want to store a 2-dimensional array and what do you want to do with it? Maybe there are better approaches.

Best
Erick
Re: Storing 2 dimension array in Solr
Hi Erick,

We have a set of groups as represented below. New columns (the diseases in the matrix below) keep coming, and we need to add each of them as a new column. For each column we have values such as 1, 2, 3 or 4 (exist, slight, na, notfound) for the respective groups.

While querying we need to get the entire row for group:group1. We will not be searching on the column (*_disease) values, so they are not indexed (indexed=false) but are stored (stored=true). For example, we query for group:group1 and need to get the entire row: exist, slight, not found.

I hope this explanation is clearer.

          disease1   disease2    disease3
group1    exist      slight      not found
groups2   slight     not found   exist
group3    slight     exist
groupK    -          na          exist

Thanks - David
Re: Profiling Solr Lucene for query
Would adding a dummy shard instead of a dummy collection resolve the situation? For example, edit clusterstate.json from a ZooKeeper client and add a shard with a 0-range so that no docs are routed to this core. This core would be on a separate server and act as the collection gateway.
Re: Storing 2 dimension array in Solr
Isn't this just indexing each row as a separate document with a suitable ID, groupN in your example?
Re: SolrCloud on SSL
On 10/11/2013 9:38 AM, Christopher Gross wrote:
On Fri, Oct 11, 2013 at 11:08 AM, Shawn Heisey s...@elyograg.org wrote:
On 10/11/2013 8:17 AM, Christopher Gross wrote:

Is there a spot in a Solr configuration that I can set this up to use HTTPS?

From what I can tell, not yet.

https://issues.apache.org/jira/browse/SOLR-3854
https://issues.apache.org/jira/browse/SOLR-4407
https://issues.apache.org/jira/browse/SOLR-4470

Dang.

Christopher,

I was just looking through Solr source code for a completely different issue, and it seems that there *IS* a way to do this in your configuration.

If you were to use "https://hostname" or "https://ipaddress" as the host parameter in your solr.xml file on each machine, it should do what you want. The parameter is described here, but not the behavior that I have discovered:

http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params

Boring details: in the org.apache.solr.cloud package there is a ZkController class. The getHostAddress method is where I discovered that you can do this.

If you could try this out and confirm that it works, I will get the wiki page updated and look into the Solr reference guide as well.

Thanks,
Shawn
Re: Storing 2 dimension array in Solr
Hi Erick,

Yes, it is. But the columns here are added dynamically and very frequently. They can increase up to 1 million right now. So is one document with 1 million dynamic fields fine, or is there another approach?

While searching the web, I found that docValues are column-oriented:

http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/

However, I did not understand how to use docValues to add these columns. What is the recommended approach?

Thanks - David
Re: Storing 2 dimension array in Solr
You may be better off indexing each element of the array as a Solr document, with a group field and a disease field. Then you can easily and efficiently add new diseases. To query a row, you query for the group field having the desired group.

If possible, index the array as being sparse: no document for a disease if it is not present for that group.

-- Jack Krupansky
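Jack's layout can be sketched as below. This is an illustration only, with no Solr client involved: the field names id, group, disease and status, the helper names, and the choice to treat "not found" as the absent value are all assumptions made for the example. The row lookup mimics what a query such as q=group:group1 would return.

```python
def matrix_to_docs(matrix, absent="not found"):
    """Turn a {group: {disease: status}} matrix into one document per cell.

    Cells whose status is missing or equals `absent` produce no document,
    which keeps the index sparse.
    """
    docs = []
    for group, row in matrix.items():
        for disease, status in row.items():
            if status is None or status == absent:
                continue
            docs.append({
                "id": f"{group}_{disease}",  # hypothetical unique key
                "group": group,
                "disease": disease,
                "status": status,
            })
    return docs


def row_for_group(docs, group):
    """Rebuild one row, as a query on the group field would."""
    return {d["disease"]: d["status"] for d in docs if d["group"] == group}


matrix = {
    "group1": {"disease1": "exist", "disease2": "slight", "disease3": "not found"},
    "group2": {"disease1": "slight", "disease2": "not found", "disease3": "exist"},
}
docs = matrix_to_docs(matrix)
print(row_for_group(docs, "group1"))
```

With this shape, adding a new disease means adding at most one small document per group rather than rewriting every group's document, and any disease missing from a returned row can be read as not present.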
Re: SolrCore 'collection1' is not available due to init failure
Liu Bo, Changing the permissions fixed the problem. Thank you for helping me. Best regards, Jim -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCore-collection1-is-not-available-due-to-init-failure-tp4094869p4095195.html Sent from the Solr - User mailing list archive at Nabble.com.