Re: FilterCache size should reduce as index grows?
On Wed, 2017-10-04 at 21:42 -0700, S G wrote: > The bit-vectors in filterCache are as long as the maximum number of > documents in a core. If there are a billion docs per core, every bit > vector will have a billion bits, making its size 10^9 / 8 bytes ≈ 125 MB The tricky part here is that there are sparse (aka few hits) entries that take up less space. The full bitset (1 bit per document) is the worst case. This is both good and bad. The good part is of course that it saves memory. The bad part is that it often means that people set the filterCache size to a high number and that it works well, right until a series of filters with many hits comes along. It seems that the memory limit option maxSizeMB was added in Solr 5.2: https://issues.apache.org/jira/browse/SOLR-7372 I am not sure if it works with all caches in Solr, but in my world it is way better to define the caches by memory instead of count. > With such a big cache-value per entry, the default size of 128 > entries will become 128 x 125 MB ≈ 16 GB and would not be very good for > a system running below 32 GB of memory. Sure. The default values are just that. For an index with 1M documents and a lot of different filters, 128 would probably be too low. If someone were to create a well-researched set of config files for different scenarios, it would be a welcome addition to our shared knowledge pool. > If such a use-case is anticipated, either the JVM's max memory should be > increased beyond 40 GB or the filterCache size should be reduced to 32. Best solution: use maxSizeMB (if it works). Second best solution: reduce the size to 32 or less. Third best, but often used, solution: hope that most of the entries are sparse and will remain so. - Toke Eskildsen, Royal Danish Library
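For reference, capping the filterCache by memory as Toke suggests would look roughly like the sketch below in solrconfig.xml. Note that the Solr reference guide spells the attribute maxRamMB (added to LRUCache by SOLR-7372); the size and MB figures here are illustrative assumptions, not recommendations:

```xml
<!-- filterCache capped by RAM rather than entry count.
     maxRamMB is the memory-limit option from SOLR-7372 (Solr 5.2+);
     once the cached bitsets reach ~256 MB, older entries are evicted. -->
<filterCache class="solr.LRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"
             maxRamMB="256"/>
```

With a memory cap, a burst of dense (many-hit) filters evicts entries instead of blowing the heap, which is exactly the failure mode described above.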
FilterCache size should reduce as index grows?
Hi, Here is a discussion we had recently with a fellow Solr user. It seems reasonable to me and I wanted to see if this is an accepted theory. The bit-vectors in filterCache are as long as the maximum number of documents in a core. If there are a billion docs per core, every bit vector will have a billion bits, making its size 10^9 / 8 bytes ≈ 125 MB. With such a big cache-value per entry, the default size of 128 entries will become 128 x 125 MB ≈ 16 GB, which would not be very good for a system running below 32 GB of memory. If such a use-case is anticipated, either the JVM's max memory should be increased beyond 40 GB or the filterCache size should be reduced to 32. Thanks SG
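The arithmetic above can be sketched as a quick back-of-envelope calculation. This is the worst case only (a dense bitset, one bit per document); as Toke notes in his reply, sparse entries take less space:

```python
def bitset_bytes(num_docs: int) -> int:
    """Worst-case size of one filterCache entry: one bit per document."""
    return num_docs // 8

def cache_bytes(num_docs: int, max_entries: int) -> int:
    """Worst-case filterCache footprint if every entry is a dense bitset."""
    return max_entries * bitset_bytes(num_docs)

# One entry for a 1-billion-doc core: 125,000,000 bytes (~119 MiB,
# loosely rounded in the discussion above).
per_entry = bitset_bytes(1_000_000_000)

# Default cache size of 128 entries: roughly 15-16 GB worst case.
total = cache_bytes(1_000_000_000, 128)
```

The exact numbers depend on whether you count in decimal (MB) or binary (MiB) units, but either way the default cache size is far too large for a sub-32 GB heap at this index size.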
Re: How to Index JSON field Solr 5.3.2
Thanks Emir! Deeksha Sharma Software Engineer dsha...@flexera.com www.flexera.com From: Emir Arnautović Sent: Tuesday, October 3, 2017 12:58:57 AM To: solr-user@lucene.apache.org Subject: Re: How to Index JSON field Solr 5.3.2 Hi Sharma, I guess you are looking for nested documents: https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments It seems DIH supports it since version 5.1: https://issues.apache.org/jira/browse/SOLR-5147 Regards, Emir > On 2 Oct 2017, at 10:50, Deeksha Sharma wrote: > > Hi everyone, > > > I have created a core and index data in Solr using dataImportHandler. 
> > > The schema for the core looks like this: > > required="true"/> > > required="true"/> > > > > This is my data in mysql database: > > > md5:"376463475574058bba96395bfb87" > > > rules: > {"fileRules":[{"file_id":1321241,"md5":"376463475574058bba96395bfb87","group_id":69253,"filecdata1":{"versionId":3382858,"version":"1.2.1","detectionNotes":"Generated > from Ibiblio Maven2, see URL > (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate).","texts":[{"shortText":null,"header":"Sample > from URL > (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-os-parent/1.2.1/acris-os-parent-1.2.1.pom)","text":" >The Apache Software License, Version 2.0 >http://www.apache.org/licenses/LICENSE-2.0.txt >repo > "}],"notes":[],"forge":"Ibiblio > Maven2"}}],"groupRules":[{"group_id":69253,"parent":-1,"component":"sk.seges.acris/acris-security-hibernate > - AcrIS Security with Hibernate metadata","license":"Apache > 2.0","groupcdata1":{"componentId":583560,"title":"sk.seges.acris/acris-security-hibernate > - Ibiblio > Maven2","licenseIds":[20],"priority":3,"url":"http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate","displayName":"AcrIS > Security with Hibernate > metadata","description":null,"texts":[],"notes":[],"forge":"Ibiblio > Maven2"}}]} > > Query results from Solr: > > { "responseHeader":{ "status":0, "QTime":0, "params":{ > "q":"md5:03bb576a6b6e001cd94e91ad4c29", "indent":"on", "wt":"json", > "_":"1506933082656"}}, "response":{"numFound":1,"start":0,"docs":[ { > "rules":"{\"fileRules\":[{\"file_id\":7328190,\"md5\":\"03bb576a6b6e001cd94e91ad4c29\",\"group_id\":241307,\"filecdata1\":{\"versionId\":15761972,\"version\":\"1.0.2\",\"detectionNotes\":null,\"texts\":[{\"shortText\":null,\"header\":\"The > following text is found at URL > (https://www.nuget.org/packages/HangFire.Redis/1.0.2)\",\"text\":\"License > details:\nLGPL-3.0\"}],\"notes\":[],\"forge\":\"NuGet > 
Gallery\"}}],\"groupRules\":[{\"group_id\":241307,\"parent\":-1,\"component\":\"HangFire.Redis\",\"license\":\"LGPL > > 3.0\",\"groupcdata1\":{\"componentId\":3524318,\"title\":null,\"licenseIds\":[216],\"priority\":1,\"url\":\"https://www.nuget.org/p
Re: Keeping the index naturally ordered by some field
Have you looked at Streaming and Streaming Expressions? This is pretty much what they were built for. Since you're talking a billion documents, you're probably sharding anyway, in which case I'd guess you're using SolrCloud. That's what I'd be using first if at all possible. Best, Erick On Mon, Oct 2, 2017 at 3:15 PM, alexpusch wrote: > The reason I'm interested in this is kind of unique. I'm writing a custom > query parser and search component. These components go over the search > results and perform some calculation over it. This calculation depends on > input sorted by a certain value. In this scenario, regular solr sorting is > insufficient as it's performed in post-search, and only collects needed rows > to satisfy the query. The alternative for naturally sorted index is to sort > all the docs myself, and I wish to avoid this. I use docValues extensively, > it really is a great help. > > Erick, I've tried using SortingMergePolicyFactory. It brings me close to my > goal, but it's not quite there. The problem with this approach is that while > each segment is sorted by itself there might be overlapping in ranges > between the segments. For example, lets say that some query results lay in > segments A, B, and C. Each one of the segments is sorted, so the docs coming > from segment A will be sorted in the range 0-50, docs coming from segment B > will be sorted in the range 20-70, and segment C will hold values in the > 50-90 range. The query result will be 0-50,20-70, 50-90. Almost sorted, but > not quite there. > > A helpful detail about my data is that the fields I'm interested in sorting > the index by is a timestamp. Docs are indexed more or less in the correct > order. As a result, if the merge policy I'm using will merge only > consecutive segments, it should satisfy my need. TieredMergePolicy does > merge non-consecutive segments so it's clearly a bad fit. 
I'm hoping to get > some insight about some additional steps I may take so that > SortingMergePolicyFactory could achieve perfection. > > Thanks! > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
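To make Erick's suggestion concrete, a streaming expression against the /export handler might look something like the sketch below. The collection and field names (logs, timestamp) are hypothetical, and the actual expression depends on your schema; the key point is that /export streams the full result set sorted via docValues:

```
search(logs,
       q="*:*",
       fl="id,timestamp",
       sort="timestamp asc",
       qt="/export")
```

Because the export handler delivers every matching document in sorted order, a custom search component consuming this stream would no longer need the index itself to be physically ordered.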
Re: Keeping the index naturally ordered by some field
The reason I'm interested in this is kind of unique. I'm writing a custom query parser and search component. These components go over the search results and perform some calculation over them. This calculation depends on input sorted by a certain value. In this scenario, regular Solr sorting is insufficient as it's performed post-search, and only collects the rows needed to satisfy the query. The alternative to a naturally sorted index is to sort all the docs myself, which I wish to avoid. I use docValues extensively; it really is a great help. Erick, I've tried using SortingMergePolicyFactory. It brings me close to my goal, but it's not quite there. The problem with this approach is that while each segment is sorted by itself, the ranges of different segments may overlap. For example, let's say that some query results lie in segments A, B, and C. Each one of the segments is sorted, so the docs coming from segment A will be sorted in the range 0-50, docs coming from segment B will be sorted in the range 20-70, and segment C will hold values in the 50-90 range. The query result will be 0-50, 20-70, 50-90. Almost sorted, but not quite there. A helpful detail about my data is that the field I'm interested in sorting the index by is a timestamp. Docs are indexed more or less in the correct order. As a result, if the merge policy I'm using merges only consecutive segments, it should satisfy my need. TieredMergePolicy does merge non-consecutive segments, so it's clearly a bad fit. I'm hoping to get some insight about additional steps I may take so that SortingMergePolicyFactory could achieve perfection. Thanks! -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: How to Index JSON field Solr 5.3.2
Hi Sharma, I guess you are looking for nested documents: https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments <https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments> It seems DIH supports it since versions 5.1: https://issues.apache.org/jira/browse/SOLR-5147 <https://issues.apache.org/jira/browse/SOLR-5147> Regards, Emir > On 2 Oct 2017, at 10:50, Deeksha Sharma wrote: > > Hi everyone, > > > I have created a core and index data in Solr using dataImportHandler. > > > The schema for the core looks like this: > > required="true"/> > > required="true"/> > > > > This is my data in mysql database: > > > md5:"376463475574058bba96395bfb87" > > > rules: > {"fileRules":[{"file_id":1321241,"md5":"376463475574058bba96395bfb87","group_id":69253,"filecdata1":{"versionId":3382858,"version":"1.2.1","detectionNotes":"Generated > from Ibiblio Maven2, see URL > (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate).","texts":[{"shortText":null,"header":"Sample > from URL > (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-os-parent/1.2.1/acris-os-parent-1.2.1.pom)","text":" >The Apache Software License, Version 2.0 >http://www.apache.org/licenses/LICENSE-2.0.txt >repo > "}],"notes":[],"forge":"Ibiblio > Maven2"}}],"groupRules":[{"group_id":69253,"parent":-1,"component":"sk.seges.acris/acris-security-hibernate > - AcrIS Security with Hibernate metadata","license":"Apache > 2.0","groupcdata1":{"componentId":583560,"title":"sk.seges.acris/acris-security-hibernate > - Ibiblio > Maven2","licenseIds":[20],"priority":3,"url":"http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate","displayName":"AcrIS > Security with Hibernate > metadata","description":null,"texts":[],"notes":[],"forge":"Ibiblio > Maven2"}}]} > > Query results from Solr: > > { "responseHeader":{ "status":0, "QTime":0, "params":{ 
> "q":"md5:03bb576a6b6e001cd94e91ad4c29", "indent":"on", "wt":"json", > "_":"1506933082656"}}, "response":{"numFound":1,"start":0,"docs":[ { > "rules":"{\"fileRules\":[{\"file_id\":7328190,\"md5\":\"03bb576a6b6e001cd94e91ad4c29\",\"group_id\":241307,\"filecdata1\":{\"versionId\":15761972,\"version\":\"1.0.2\",\"detectionNotes\":null,\"texts\":[{\"shortText\":null,\"header\":\"The > following text is found at URL > (https://www.nuget.org/packages/HangFire.Redis/1.0.2)\",\"text\":\"License > details:\nLGPL-3.0\"}],\"notes\":[],\"forge\":\"NuGet > Gallery\"}}],\"groupRules\":[{\"group_id\":241307,\"parent\":-1,\"component\":\"HangFire.Redis\",\"license\":\"LGPL > > 3.0\",\"groupcdata1\":{\"componentId\":3524318,\"title\":null,\"licenseIds\":[216],\"priority\":1,\"url\":\"https://www.nuget.org/packages/HangFire.Redis\",\"displayName\":\"Hangfire > Redis Storage [DEPRECATED]\",\"description\":\"DEPRECATED -- DO NOT INSTALL > OR UPDATE. Now shipped with Hangfire Pro, please read the \"Project site\" > (http://odinserj.net/2014/11/15/hangfire-pro/) for more > information.\",\"texts\":[{\"shortText\":null,\"header\":\"License details > history:\n(Refer to https://www.nuget.org/packages/HangFire.Redis and select > the desired version for more information)\",\"text\":\"LGPL-3.0 - (for > HangFire.Redis versions 0.7.0, 0.7.1, 0.7.3, 0.7.4, 0.7.5, 0.8.0, 0.8.1, > 0.8.2, 0.8.3, 0.9.0, 0.9.1, 1.0.1, 1.0.0, 1.0.2)\nNo information - (for > HangFire.Redis versions 1.1.1, 2.0.1, > 2.0.0)\"}],\"notes\":[{\"header\":null,\"text\":\"Project Site: > http://odinserj.net/2014/11/15/h
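To make the nested-documents suggestion concrete, an update in Solr's JSON format could look roughly like the sketch below. This is only an illustration: the field names are loosely adapted from the data in the question, _childDocuments_ is Solr's JSON key for child documents, and the exact schema (each child needs its own unique id) is an assumption:

```json
{
  "id": "376463475574058bba96395bfb87",
  "md5": "376463475574058bba96395bfb87",
  "_childDocuments_": [
    {
      "id": "376463475574058bba96395bfb87-file-1321241",
      "doc_type": "fileRule",
      "file_id": "1321241",
      "group_id": "69253",
      "version": "1.2.1"
    },
    {
      "id": "376463475574058bba96395bfb87-group-69253",
      "doc_type": "groupRule",
      "group_id": "69253",
      "license": "Apache 2.0"
    }
  ]
}
```

Each fileRule/groupRule becomes its own child document instead of one escaped JSON string, so the pieces can be queried and returned as structured fields rather than re-parsed by the client.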
How to Index JSON field Solr 5.3.2
Hi everyone, I have created a core and index data in Solr using dataImportHandler. The schema for the core looks like this: This is my data in mysql database: md5:"376463475574058bba96395bfb87" rules: {"fileRules":[{"file_id":1321241,"md5":"376463475574058bba96395bfb87","group_id":69253,"filecdata1":{"versionId":3382858,"version":"1.2.1","detectionNotes":"Generated from Ibiblio Maven2, see URL (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate).","texts":[{"shortText":null,"header":"Sample from URL (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-os-parent/1.2.1/acris-os-parent-1.2.1.pom)","text":" The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt repo "}],"notes":[],"forge":"Ibiblio Maven2"}}],"groupRules":[{"group_id":69253,"parent":-1,"component":"sk.seges.acris/acris-security-hibernate - AcrIS Security with Hibernate metadata","license":"Apache 2.0","groupcdata1":{"componentId":583560,"title":"sk.seges.acris/acris-security-hibernate - Ibiblio Maven2","licenseIds":[20],"priority":3,"url":"http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate","displayName":"AcrIS Security with Hibernate metadata","description":null,"texts":[],"notes":[],"forge":"Ibiblio Maven2"}}]} Query results from Solr: { "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"md5:03bb576a6b6e001cd94e91ad4c29", "indent":"on", "wt":"json", "_":"1506933082656"}}, "response":{"numFound":1,"start":0,"docs":[ { "rules":"{\"fileRules\":[{\"file_id\":7328190,\"md5\":\"03bb576a6b6e001cd94e91ad4c29\",\"group_id\":241307,\"filecdata1\":{\"versionId\":15761972,\"version\":\"1.0.2\",\"detectionNotes\":null,\"texts\":[{\"shortText\":null,\"header\":\"The following text is found at URL (https://www.nuget.org/packages/HangFire.Redis/1.0.2)\",\"text\":\"License details:\nLGPL-3.0\"}],\"notes\":[],\"forge\":\"NuGet 
Gallery\"}}],\"groupRules\":[{\"group_id\":241307,\"parent\":-1,\"component\":\"HangFire.Redis\",\"license\":\"LGPL 3.0\",\"groupcdata1\":{\"componentId\":3524318,\"title\":null,\"licenseIds\":[216],\"priority\":1,\"url\":\"https://www.nuget.org/packages/HangFire.Redis\",\"displayName\":\"Hangfire Redis Storage [DEPRECATED]\",\"description\":\"DEPRECATED -- DO NOT INSTALL OR UPDATE. Now shipped with Hangfire Pro, please read the \"Project site\" (http://odinserj.net/2014/11/15/hangfire-pro/) for more information.\",\"texts\":[{\"shortText\":null,\"header\":\"License details history:\n(Refer to https://www.nuget.org/packages/HangFire.Redis and select the desired version for more information)\",\"text\":\"LGPL-3.0 - (for HangFire.Redis versions 0.7.0, 0.7.1, 0.7.3, 0.7.4, 0.7.5, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.9.0, 0.9.1, 1.0.1, 1.0.0, 1.0.2)\nNo information - (for HangFire.Redis versions 1.1.1, 2.0.1, 2.0.0)\"}],\"notes\":[{\"header\":null,\"text\":\"Project Site: http://odinserj.net/2014/11/15/hangfire-pro\"},{\"header\":\"Previous Project Sites\",\"text\":\"https://github.com/odinserj/HangFire - (for Hangfire Redis Storage [DEPRECATED] version 0.7.0)\nhttp://hangfire.io - (for Hangfire Redis Storage [DEPRECATED] versions 0.7.1, 0.7.3, 0.7.4, 0.7.5, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.9.0, 0.9.1, 1.0.1, 1.0.0, 1.0.2, 1.1.1)\nNo information - (for Hangfire Redis Storage [DEPRECATED] versions 2.0.1, 2.0.0)\"},{\"header\":\"License links\",\"text\":\"https://raw.github.com/odinserj/HangFire/master/COPYING.LESSER - (for HangFire.Redis version 0.7.0)\nhttps://raw.github.com/odinserj/HangFire/master/LICENSE.md - (for HangFire.Redis versions 0.7.1, 0.7.3, 0.7.4, 0.7.5, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.9.0, 0.9.1)\nhttps://raw.github.com/odinserj/Hangfire/master/LICENSE.md - (for HangFire.Redis versions 1.0.1, 1.0.0, 1.0.2)\nhttps://raw.github.com/HangfireIO/Hangfire/master/LICENSE.md - (for HangFire.Redis version 1.1.1)\nNo information - (for HangFire.Redis versions 2.0.1, 
2.0.0)\"}],\"forge\":\"NuGet Gallery\"}}]}", "md5":"03bb576a6b6e001cd94e91ad4c29", "_version_":1579807444777828352}] }} Now when I receive the results from this Solr query, the rules field comes back as a String. How can I tell Solr to index rules as JSON and return valid JSON instead of an escaped String? Any help is greatly appreciated. Thanks!
Re: Keeping the index naturally ordered by some field
Hi Alex, just to explore your question a bit: why do you need that? Do you need to reduce query time? Have you tried enabling docValues for the fields of interest? DocValues seem to me a pretty useful data structure when sorting is a requirement. I am curious to understand why that was not an option. Regards --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Keeping the index naturally ordered by some field
I think you're looking for SortingMergePolicyFactory, see: https://issues.apache.org/jira/browse/SOLR-5730 The JIRA has some extensive discussion and the reference guide has an example. It might take a little digging. Best, Erick On Sun, Oct 1, 2017 at 4:36 AM, Ahmet Arslan wrote: > > > Hi Alex, > > Lucene has this capability (borrowed from Nutch) under the > org.apache.lucene.index.sorter package. I think it has been integrated into > Solr, but could not find the Jira issue. > > Ahmet > > > On Sunday, October 1, 2017, 10:22:45 AM GMT+3, alexpusch > wrote: > > > > > > Hello, > We've got a pretty big index (~1B small docs). I'm interested in managing > the index so that the search results would be naturally sorted by a certain > numeric field, without specifying the actual sort field at query time. > > My first attempt was using SortingMergePolicyFactory. I've found that this > provides only partial success. The results were occasionally sorted, but > overall there were 'jumps' in the ordering. > > After some research I've found this excellent blog post > <http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html> > that taught me that TieredMergePolicy merges non-consecutive segments, and > thus creates several segments with interlacing ordering. I've tried > replacing the merge policy with LogByteSizeMergePolicy, but results are still > inconsistent. > > The post is from 2011, and it's not clear to me whether today > LogByteSizeMergePolicy merges only consecutive segments, or whether it can merge non-consecutive > segments as well. > > Is there an approach that will allow me to achieve this goal? > > Solr version: 6.0 > > Thanks, Alex. > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
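The reference guide example Erick mentions looks roughly like this in solrconfig.xml. Here "timestamp desc" is an illustrative assumption matching Alex's use case (the sort field must have docValues), and the wrapped policy is the usual TieredMergePolicy:

```xml
<!-- Sort documents within each merged segment by the given field.
     The field name and direction are illustrative; the sort field
     must be indexed with docValues for this to work. -->
<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">timestamp desc</str>
  <str name="wrapped.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
</mergePolicyFactory>
```

As the rest of the thread points out, this only guarantees ordering within each segment, not across segments, so overlapping ranges between segments are still possible.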
Re: Keeping the index naturally ordered by some field
Hi Alex, Lucene has this capability (borrowed from Nutch) under the org.apache.lucene.index.sorter package. I think it has been integrated into Solr, but could not find the Jira issue. Ahmet On Sunday, October 1, 2017, 10:22:45 AM GMT+3, alexpusch wrote: Hello, We've got a pretty big index (~1B small docs). I'm interested in managing the index so that the search results would be naturally sorted by a certain numeric field, without specifying the actual sort field at query time. My first attempt was using SortingMergePolicyFactory. I've found that this provides only partial success. The results were occasionally sorted, but overall there were 'jumps' in the ordering. After some research I've found this excellent blog post <http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html> that taught me that TieredMergePolicy merges non-consecutive segments, and thus creates several segments with interlacing ordering. I've tried replacing the merge policy with LogByteSizeMergePolicy, but results are still inconsistent. The post is from 2011, and it's not clear to me whether today LogByteSizeMergePolicy merges only consecutive segments, or whether it can merge non-consecutive segments as well. Is there an approach that will allow me to achieve this goal? Solr version: 6.0 Thanks, Alex. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Keeping the index naturally ordered by some field
Hello, We've got a pretty big index (~1B small docs). I'm interested in managing the index so that the search results would be naturally sorted by a certain numeric field, without specifying the actual sort field at query time. My first attempt was using SortingMergePolicyFactory. I've found that this provides only partial success. The results were occasionally sorted, but overall there were 'jumps' in the ordering. After some research I've found this excellent blog post <http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html> that taught me that TieredMergePolicy merges non-consecutive segments, and thus creates several segments with interlacing ordering. I've tried replacing the merge policy with LogByteSizeMergePolicy, but results are still inconsistent. The post is from 2011, and it's not clear to me whether today LogByteSizeMergePolicy merges only consecutive segments, or whether it can merge non-consecutive segments as well. Is there an approach that will allow me to achieve this goal? Solr version: 6.0 Thanks, Alex. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
Regarding not finding the issue, JIRA has a problem with queries when the user is not logged in (see also https://jira.atlassian.com/browse/JRASERVER-38511 if you're interested in the details). There's unfortunately not much we can do about it besides manually editing issues to remove a security setting which gets automatically added to issues when they are created (which I've now done for SOLR-11406). Your best bet in the future would be to log into JIRA before initiating a search to be sure you aren't missing one that's "hidden" inadvertently. Cassandra On Wed, Sep 27, 2017 at 1:39 PM, Wayne L. Johnson wrote: > First, thanks for the quick response. Yes, it sounds like the same problem!! > > I did a bunch of searching before reporting the issue; I didn't come across > that JIRA or I wouldn't have reported it. My apologies for the duplication > (although it is a new JIRA). > > Is there a good place to start searching in the future? I'm a fairly > experienced Solr user, and I don't mind slogging through Java code. > > Meanwhile I'll follow the JIRA so I know when it gets fixed. > > Thanks!! > > -Original Message- > From: Stefan Matheis [mailto:matheis.ste...@gmail.com] > Sent: Wednesday, September 27, 2017 12:32 PM > To: solr-user@lucene.apache.org > Subject: Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index) > > That sounds like > https://issues.apache.org/jira/browse/SOLR-11406 > if I'm not mistaken? > > -Stefan > > On Sep 27, 2017 8:20 PM, "Wayne L. Johnson" > wrote: > >> I'm testing Solr 7.0.0. When I start with an empty index, Solr comes >> up just fine, I can add documents and query documents. However when I >> start with an already-populated set of documents (from 6.5.0), Solr >> will not start. 
The relevant portion of the traceback seems to be: >> >> Caused by: java.lang.NullPointerException >> >> at java.util.Objects.requireNonNull(Objects.java:203) >> >> … >> >> at java.util.stream.ReferencePipeline.reduce( >> ReferencePipeline.java:479) >> >> at org.apache.solr.index.SlowCompositeReaderWrapper.<init>( >> SlowCompositeReaderWrapper.java:76) >> >> at org.apache.solr.index.SlowCompositeReaderWrapper.wrap( >> SlowCompositeReaderWrapper.java:57) >> >> at org.apache.solr.search.SolrIndexSearcher.<init>( >> SolrIndexSearcher.java:252) >> >> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java: >> 2034) >> >> ... 12 more >> >> >> >> In looking at the de-compiled code (SlowCompositeReaderWrapper), lines >> 72-77, it appears that one or more “leaf” readers doesn’t have a >> “min-version” set. That’s a guess. If so, does this mean Solr 7.0.0 >> can’t read a 6.5.0 index? >> >> >> >> Thanks >> >> >> >> Wayne Johnson >> >> 801-240-4024 >> >> wjohnson...@ldschurch.org >> >>
RE: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
First, thanks for the quick response. Yes, it sounds like the same problem!! I did a bunch of searching before reporting the issue; I didn't come across that JIRA or I wouldn't have reported it. My apologies for the duplication (although it is a new JIRA). Is there a good place to start searching in the future? I'm a fairly experienced Solr user, and I don't mind slogging through Java code. Meanwhile I'll follow the JIRA so I know when it gets fixed. Thanks!! -Original Message- From: Stefan Matheis [mailto:matheis.ste...@gmail.com] Sent: Wednesday, September 27, 2017 12:32 PM To: solr-user@lucene.apache.org Subject: Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index) That sounds like https://issues.apache.org/jira/browse/SOLR-11406 if I'm not mistaken? -Stefan On Sep 27, 2017 8:20 PM, "Wayne L. Johnson" wrote: > I'm testing Solr 7.0.0. When I start with an empty index, Solr comes > up just fine, I can add documents and query documents. However when I > start with an already-populated set of documents (from 6.5.0), Solr > will not start. The relevant portion of the traceback seems to be: > > Caused by: java.lang.NullPointerException > > at java.util.Objects.requireNonNull(Objects.java:203) > > … > > at java.util.stream.ReferencePipeline.reduce( > ReferencePipeline.java:479) > > at org.apache.solr.index.SlowCompositeReaderWrapper.<init>( > SlowCompositeReaderWrapper.java:76) > > at org.apache.solr.index.SlowCompositeReaderWrapper.wrap( > SlowCompositeReaderWrapper.java:57) > > at org.apache.solr.search.SolrIndexSearcher.<init>( > SolrIndexSearcher.java:252) > > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java: > 2034) > > ... 
12 more > > > > In looking at the de-compiled code (SlowCompositeReaderWrapper), lines > 72-77, it appears that one or more “leaf” readers doesn’t have a > “min-version” set. That’s a guess. If so, does this mean Solr 7.0.0 > can’t read a 6.5.0 index? > > > > Thanks > > > > Wayne Johnson > > 801-240-4024 > > wjohnson...@ldschurch.org > > >
Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
That sounds like https://issues.apache.org/jira/browse/SOLR-11406 if I'm not mistaken? -Stefan On Sep 27, 2017 8:20 PM, "Wayne L. Johnson" wrote: > I'm testing Solr 7.0.0. When I start with an empty index, Solr comes up > just fine, I can add documents and query documents. However when I start > with an already-populated set of documents (from 6.5.0), Solr will not > start. The relevant portion of the traceback seems to be: > > Caused by: java.lang.NullPointerException > > at java.util.Objects.requireNonNull(Objects.java:203) > > … > > at java.util.stream.ReferencePipeline.reduce( > ReferencePipeline.java:479) > > at org.apache.solr.index.SlowCompositeReaderWrapper.<init>( > SlowCompositeReaderWrapper.java:76) > > at org.apache.solr.index.SlowCompositeReaderWrapper.wrap( > SlowCompositeReaderWrapper.java:57) > > at org.apache.solr.search.SolrIndexSearcher.<init>( > SolrIndexSearcher.java:252) > > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java: > 2034) > > ... 12 more > > > > In looking at the de-compiled code (SlowCompositeReaderWrapper), lines > 72-77, it appears that one or more “leaf” readers doesn’t have a > “min-version” set. That’s a guess. If so, does this mean Solr 7.0.0 can’t > read a 6.5.0 index? > > > > Thanks > > > > Wayne Johnson > > 801-240-4024 > > wjohnson...@ldschurch.org > > >
Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
I'm testing Solr 7.0.0. When I start with an empty index, Solr comes up just fine, I can add documents and query documents. However when I start with an already-populated set of documents (from 6.5.0), Solr will not start. The relevant portion of the traceback seems to be: Caused by: java.lang.NullPointerException at java.util.Objects.requireNonNull(Objects.java:203) ... at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479) at org.apache.solr.index.SlowCompositeReaderWrapper.<init>(SlowCompositeReaderWrapper.java:76) at org.apache.solr.index.SlowCompositeReaderWrapper.wrap(SlowCompositeReaderWrapper.java:57) at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:252) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2034) ... 12 more In looking at the de-compiled code (SlowCompositeReaderWrapper), lines 72-77, it appears that one or more "leaf" readers doesn't have a "min-version" set. That's a guess. If so, does this mean Solr 7.0.0 can't read a 6.5.0 index? Thanks Wayne Johnson 801-240-4024 wjohnson...@ldschurch.org
Re: Solr Spatial Index and Data
Hi Can, For your first question: you should share more information with us as Rick indicated. Do you have any errors, do you have unique ids or not etc? For the second one: you should read here: https://cwiki.apache.org/confluence/display/solr/Spatial+Search and ask your questions if you have any. Kind Regards, Furkan KAMACI On Thu, Sep 14, 2017 at 1:34 PM, Rick Leir wrote: > hi Can Ezgi > > First of all, i want to use spatial index for my data include polyghons > and points. But solr indexed first 18 rows, other rows not indexed. > > Do all rows have a unique id field? > > Are there errors in the logfile? > cheers -- Rick > > > . >
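As a taste of what the spatial-search page describes, intersection-style filters on an RPT field look like the sketches below. The field name geo is a hypothetical assumption, polygon support requires an RPT field type configured with JTS, and the coordinates are arbitrary examples:

```
fq={!field f=geo}Intersects(POLYGON((30 10, 40 40, 20 40, 10 20, 30 10)))
fq={!field f=geo}IsWithin(POLYGON((30 10, 40 40, 20 40, 10 20, 30 10)))
fq={!geofilt sfield=geo pt=39.92,32.85 d=5}
```

The first two use WKT with spatial predicates (Intersects, IsWithin) against an RPT field; the last is the simpler geofilt parser for "points within d km of pt", which covers the neighborhood-style queries mentioned in the question.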
Re: Adding UniqueKey to an existing Solr 6.4 Index
Not really. Do note that atomic updates require: 1> all _original_ fields (i.e. fields that are _not_ destinations for copyFields) have stored=true 2> no destination of a copyField has stored=true 3> Solr then composes the original document from stored fields and re-indexes it. The latter just means that atomic updates are actually slightly more work than re-indexing the doc from the system-of-record (as far as Solr is concerned). The decision to use atomic updates is up to you of course; the slight extra work may be better than getting the docs from the original source... Best, Erick On Fri, Sep 15, 2017 at 10:38 AM, Pankaj Gurumukhi wrote: > Hello, > > I have a single-node Solr 6.4 server, with an index of 100 million documents. > The default "id" is the primary key of this index. Now, I would like to set up > an update process to insert new documents, and update existing documents > based on availability of a value in another field (say ProductId), that is > different from the default "id". To do that, I want to use the Solr-provided > de-duplication method by adding a new field SignatureField that uses the > ProductId as UniqueKey. Considering the millions of documents I have, I would > like to ask if it's possible to set up a de-duplication mechanism in an > existing Solr index with the following steps: > > a. Add a new field SignatureField, and configure it as UniqueKey in the Solr > schema. > > b. Run an atomic update process on all documents, to update the value of > this new field SignatureField. > > Is there an easier/better way to add a SignatureField to an existing large > index? > > Thx, > Pankaj >
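For what the atomic-update pass in step (b) might look like: Solr's JSON atomic-update syntax uses modifier objects such as "set". The document id and signature value below are placeholders, and this sketch assumes the signature is computed client-side from ProductId (rather than by the SignatureUpdateProcessor at index time):

```json
[
  {
    "id": "existing-doc-id",
    "SignatureField": { "set": "e3b0c44298fc1c14" }
  }
]
```

Keep Erick's caveats in mind: every original field must be stored (and no copyField destination stored), or the atomic update will silently lose data when Solr reconstructs the document.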
Adding UniqueKey to an existing Solr 6.4 Index
Hello, I have a single-node Solr 6.4 server with an index of 100 million documents. The default "id" is the primary key of this index. Now I would like to set up an update process to insert new documents, and update existing documents, based on the availability of a value in another field (say ProductId) that is different from the default "id". To ensure that, I use the Solr-provided de-duplication method by having a new field SignatureField using the ProductId as UniqueKey. Considering the millions of documents I have, I would like to ask if it's possible to set up a de-duplication mechanism in an existing Solr index with the following steps: a. Add a new field SignatureField, and configure it as UniqueKey in the Solr schema. b. Run an atomic update process on all documents, to update the value of this new field SignatureField. Is there an easier/better way to add a SignatureField to an existing large index? Thx, Pankaj
Re: Solr Spatial Index and Data
hi Can Ezgi > First of all, i want to use spatial index for my data include polyghons and points. But solr indexed first 18 rows, other rows not indexed. Do all rows have a unique id field? Are there errors in the logfile? cheers -- Rick .
Solr Spatial Index and Data
Hi everyone, First of all, I want to use a spatial index for my data, which includes polygons and points. But Solr indexed only the first 18 rows; the other rows were not indexed. I also need sample data that includes polygons and points. The other problem: I will write spatial queries on this data, including intersects, neighborhood, within, etc. Could you please help me prepare these queries? Thanks for your interest. Best regards. Can Ezgi AYDEMİR, Oracle Database Administrator, İşlem Coğrafi Bilgi Sistemleri Müh. & Eğitim AŞ.
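For the intersect-style queries asked about above, Solr's spatial search filters a spatial field with a WKT predicate such as `Intersects(...)`. A minimal Python sketch of building the request parameters (the field name `geo` is an assumption; the field must use a spatial field type such as `location_rpt`):

```python
from urllib.parse import urlencode

def spatial_filter(field, wkt, op="Intersects"):
    """Build a Solr fq value like: geo:"Intersects(POLYGON((...)))".
    `op` may also be IsWithin or Contains on RPT fields."""
    return f'{field}:"{op}({wkt})"'

# Hypothetical field name and polygon, for illustration only.
params = urlencode({
    "q": "*:*",
    "fq": spatial_filter("geo", "POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))"),
})
print(params)  # append to /solr/<collection>/select?
```

The resulting query string is what the Solr admin UI would send from its query panel.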
Re: Freeze Index
On Wed, 2017-09-13 at 11:56 -0700, fabigol wrote: > my problem is that my index freeze several time and i don't know why. > So i lost all the data of my index. > I have 14 million of documents from postgresql database. I have an > only node with 31 GO for my JVM and my server has 64GO. My index make > 6 GO on the HDD. > > Is it a good configuration? If you look in the admin GUI, you can see how much memory is actually used by the JVM. My guess is that it is _way_ lower than 31GB. A 6GB index is quite small and unless you do special processing, you should be fine with a 2GB JVM or something like that. One of the symptoms for having too large a memory allocation for the JVM are occasional long pauses due to garbage collection. However, you should not lose anything - it is just a pause. Can you describe in more detail what you mean by freeze and losing data? - Toke Eskildsen, Royal Danish Library
Re: Freeze Index
Fabien, What do you see in the logfile at the time of the freeze? Cheers -- Rick On September 13, 2017 3:01:17 PM EDT, fabigol wrote: >hi, >my problem is that my index freeze several time and i don't know why. >So i >lost all the data of my index. >I have 14 million of documents from postgresql database. I have an only >node >with 31 GO for my JVM and my server has 64GO. My index make 6 GO on the >HDD. >Is it a good configuration? > >Someone can help me. > >thank for advance > > > > > >-- >Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Freeze Index
hi, my problem is that my index freezes several times and I don't know why, so I lose all the data in my index. I have 14 million documents from a PostgreSQL database. I have a single node with 31 GB for my JVM and my server has 64 GB. My index takes 6 GB on the HDD. Is this a good configuration? Can someone help me? Thanks in advance -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Index relational database
To pile on here: When you denormalize you also get some functionality that you do not get with Solr joins, they've been called "pseudo joins" in Solr for a reason. If you just use the simple approach of indexing the two tables then joining across them you can't return fields from both tables in a single document. To do that you need to use parent/child docs which has its own restrictions. So rather than worry excessively about which is faster, I'd recommend you decide on the functionality you need as a starting point. Best, Erick On Thu, Aug 31, 2017 at 7:34 AM, Walter Underwood wrote: > There is no way tell which is faster without trying it. > > Query speed depends on the size of the data (rows), the complexity of the > join, which database, what kind of disk, etc. > > Solr speed depends on the size of the documents, the complexity of your > analysis chains, what kind of disk, how much CPU is available, etc. > > We have one query that extracts 9 million documents from MySQL in about 20 > minutes. We have another query on a different MySQL database that takes 90 > minutes to get 7 million documents. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > >> On Aug 31, 2017, at 12:54 AM, Renuka Srishti >> wrote: >> >> Thanks Erick, Walter >> But I think join query will reduce the performance. Denormalization will be >> the better way than join query, am I right? >> >> >> >> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood >> wrote: >> >>> Think about making a denormalized view, with all the fields needed in one >>> table. That view gets sent to Solr. Each row is a Solr document. >>> >>> It could be implemented as a view or as SQL, but that is a useful mental >>> model for people starting from a relational background. 
>>> >>> wunder >>> Walter Underwood >>> wun...@wunderwood.org >>> http://observer.wunderwood.org/ (my blog) >>> >>> >>>> On Aug 30, 2017, at 9:14 AM, Erick Erickson >>> wrote: >>>> >>>> First, it's often best, by far, to denormalize the data in your solr >>> index, >>>> that's what I'd explore first. >>>> >>>> If you can't do that, the join query parser might work for you. >>>> >>>> On Aug 30, 2017 4:49 AM, "Renuka Srishti" >>>> wrote: >>>> >>>>> Thanks Susheel for your response. >>>>> Here is the scenario about which I am talking: >>>>> >>>>> - Let suppose there are two documents doc1 and doc2. >>>>> - I want to fetch the data from doc2 on the basis of doc1 fields which >>>>> are related to doc2. >>>>> >>>>> How to achieve this efficiently. >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Renuka Srishti >>>>> >>>>> >>>>> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar >>>>> wrote: >>>>> >>>>>> Hello Renuka, >>>>>> >>>>>> I would suggest to start with your use case(s). May be start with your >>>>>> first use case with the below questions >>>>>> >>>>>> a) What is that you want to search (which fields like name, desc, city >>>>>> etc.) >>>>>> b) What is that you want to show part of search result (name, city >>> etc.) >>>>>> >>>>>> Based on above two questions, you would know what data to pull in from >>>>>> relational database and create solr schema and index the data. >>>>>> >>>>>> You may first try to denormalize / flatten the structure so that you >>> deal >>>>>> with one collection/schema and query upon it. >>>>>> >>>>>> HTH. >>>>>> >>>>>> Thanks, >>>>>> Susheel >>>>>> >>>>>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < >>>>>> renuka.srisht...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Hii, >>>>>>> >>>>>>> What is the best way to index relational database, and how it impacts >>>>> on >>>>>>> the performance? >>>>>>> >>>>>>> Thanks >>>>>>> Renuka Srishti >>>>>>> >>>>>> >>>>> >>> >>> >
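The denormalization recommended throughout this thread can also be done in indexing code rather than a database view. A minimal Python sketch (the table and field names are invented for illustration) that flattens a one-to-many relation into one Solr document per child row:

```python
def denormalize(parents, children, key="product_id"):
    """Flatten a one-to-many parent/child relation into one Solr
    document per child row, copying the parent's fields onto each."""
    by_key = {p[key]: p for p in parents}
    docs = []
    for child in children:
        doc = dict(by_key[child[key]])  # start from the parent fields
        doc.update(child)               # child fields win on collision
        docs.append(doc)
    return docs

# Hypothetical rows standing in for two relational tables.
products = [{"product_id": 1, "name": "widget"}]
orders = [{"product_id": 1, "order_id": "a"},
          {"product_id": 1, "order_id": "b"}]
print(denormalize(products, orders))
```

Each resulting dict is one flat Solr document, which is what lets a single query return fields from "both tables" without joins.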
Re: Index relational database
There is no way tell which is faster without trying it. Query speed depends on the size of the data (rows), the complexity of the join, which database, what kind of disk, etc. Solr speed depends on the size of the documents, the complexity of your analysis chains, what kind of disk, how much CPU is available, etc. We have one query that extracts 9 million documents from MySQL in about 20 minutes. We have another query on a different MySQL database that takes 90 minutes to get 7 million documents. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 31, 2017, at 12:54 AM, Renuka Srishti > wrote: > > Thanks Erick, Walter > But I think join query will reduce the performance. Denormalization will be > the better way than join query, am I right? > > > > On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood > wrote: > >> Think about making a denormalized view, with all the fields needed in one >> table. That view gets sent to Solr. Each row is a Solr document. >> >> It could be implemented as a view or as SQL, but that is a useful mental >> model for people starting from a relational background. >> >> wunder >> Walter Underwood >> wun...@wunderwood.org >> http://observer.wunderwood.org/ (my blog) >> >> >>> On Aug 30, 2017, at 9:14 AM, Erick Erickson >> wrote: >>> >>> First, it's often best, by far, to denormalize the data in your solr >> index, >>> that's what I'd explore first. >>> >>> If you can't do that, the join query parser might work for you. >>> >>> On Aug 30, 2017 4:49 AM, "Renuka Srishti" >>> wrote: >>> >>>> Thanks Susheel for your response. >>>> Here is the scenario about which I am talking: >>>> >>>> - Let suppose there are two documents doc1 and doc2. >>>> - I want to fetch the data from doc2 on the basis of doc1 fields which >>>> are related to doc2. >>>> >>>> How to achieve this efficiently. 
>>>> >>>> >>>> Thanks, >>>> >>>> Renuka Srishti >>>> >>>> >>>> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar >>>> wrote: >>>> >>>>> Hello Renuka, >>>>> >>>>> I would suggest to start with your use case(s). May be start with your >>>>> first use case with the below questions >>>>> >>>>> a) What is that you want to search (which fields like name, desc, city >>>>> etc.) >>>>> b) What is that you want to show part of search result (name, city >> etc.) >>>>> >>>>> Based on above two questions, you would know what data to pull in from >>>>> relational database and create solr schema and index the data. >>>>> >>>>> You may first try to denormalize / flatten the structure so that you >> deal >>>>> with one collection/schema and query upon it. >>>>> >>>>> HTH. >>>>> >>>>> Thanks, >>>>> Susheel >>>>> >>>>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < >>>>> renuka.srisht...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hii, >>>>>> >>>>>> What is the best way to index relational database, and how it impacts >>>> on >>>>>> the performance? >>>>>> >>>>>> Thanks >>>>>> Renuka Srishti >>>>>> >>>>> >>>> >> >>
Re: Solr index getting replaced instead of merged
>Can anyone tell is it possible to paginate the data using Solr UI? Use the start/rows input fields, with start beginning at 0 like a standard array index: start=0, rows=10; start=10, rows=10; start=20, rows=10. On Thu, Aug 31, 2017 at 8:21 AM, Agrawal, Harshal (GE Digital) < harshal.agra...@ge.com> wrote: > Hello All, > > If I check out clear option while indexing 2nd table it worked.Thanks > Gurdeep :) > Can anyone tell is it possible to paginate the data using Solr UI? > If yes please tell me the features which I can use? > > Regards > Harshal > > From: Agrawal, Harshal (GE Digital) > Sent: Wednesday, August 30, 2017 4:36 PM > To: 'solr-user@lucene.apache.org' > Cc: Singh, Susnata (GE Digital) > Subject: Solr index getting replaced instead of merged > > Hello Guys, > > I have installed solr in my local system and was able to connect to > Teradata successfully. > For single table I am able to index the data and query it also but when I > am trying for multiple tables in the same schema and doing indexing one by > one respectively. > I can see datasets getting replaced instead of merged . > > Can anyone help me please: > > Regards > Harshal > > >
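The start/rows arithmetic above can be wrapped in a small helper. A minimal Python sketch (treating page numbers as 1-based is an assumption of this helper, not anything Solr requires):

```python
def page_params(page, rows=10):
    """Translate a 1-based page number into Solr's start/rows
    pagination parameters (start is a 0-based offset)."""
    if page < 1:
        raise ValueError("pages are 1-based")
    return {"start": (page - 1) * rows, "rows": rows}

print(page_params(1))  # {'start': 0, 'rows': 10}
print(page_params(3))  # {'start': 20, 'rows': 10}
```

For deep paging (start in the tens of thousands and beyond), Solr's cursorMark is the more efficient option than ever-larger start values.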
RE: Solr index getting replaced instead of merged
Hello All, If I uncheck the "clean" option while indexing the 2nd table, it works. Thanks Gurdeep :) Can anyone tell me if it is possible to paginate the data using the Solr UI? If yes, please tell me the features which I can use? Regards Harshal From: Agrawal, Harshal (GE Digital) Sent: Wednesday, August 30, 2017 4:36 PM To: 'solr-user@lucene.apache.org' Cc: Singh, Susnata (GE Digital) Subject: Solr index getting replaced instead of merged Hello Guys, I have installed solr in my local system and was able to connect to Teradata successfully. For single table I am able to index the data and query it also but when I am trying for multiple tables in the same schema and doing indexing one by one respectively. I can see datasets getting replaced instead of merged . Can anyone help me please: Regards Harshal
Re: Index relational database
when indexing a relational database its generally always best to denormalize it in a view or in your indexing code On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti wrote: > Thanks Erick, Walter > But I think join query will reduce the performance. Denormalization will be > the better way than join query, am I right? > > > > On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood > wrote: > > > Think about making a denormalized view, with all the fields needed in one > > table. That view gets sent to Solr. Each row is a Solr document. > > > > It could be implemented as a view or as SQL, but that is a useful mental > > model for people starting from a relational background. > > > > wunder > > Walter Underwood > > wun...@wunderwood.org > > http://observer.wunderwood.org/ (my blog) > > > > > > > On Aug 30, 2017, at 9:14 AM, Erick Erickson > > wrote: > > > > > > First, it's often best, by far, to denormalize the data in your solr > > index, > > > that's what I'd explore first. > > > > > > If you can't do that, the join query parser might work for you. > > > > > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" > > > wrote: > > > > > >> Thanks Susheel for your response. > > >> Here is the scenario about which I am talking: > > >> > > >> - Let suppose there are two documents doc1 and doc2. > > >> - I want to fetch the data from doc2 on the basis of doc1 fields > which > > >> are related to doc2. > > >> > > >> How to achieve this efficiently. > > >> > > >> > > >> Thanks, > > >> > > >> Renuka Srishti > > >> > > >> > > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar > > > >> wrote: > > >> > > >>> Hello Renuka, > > >>> > > >>> I would suggest to start with your use case(s). May be start with > your > > >>> first use case with the below questions > > >>> > > >>> a) What is that you want to search (which fields like name, desc, > city > > >>> etc.) > > >>> b) What is that you want to show part of search result (name, city > > etc.) 
> > >>> > > >>> Based on above two questions, you would know what data to pull in > from > > >>> relational database and create solr schema and index the data. > > >>> > > >>> You may first try to denormalize / flatten the structure so that you > > deal > > >>> with one collection/schema and query upon it. > > >>> > > >>> HTH. > > >>> > > >>> Thanks, > > >>> Susheel > > >>> > > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < > > >>> renuka.srisht...@gmail.com> > > >>> wrote: > > >>> > > >>>> Hii, > > >>>> > > >>>> What is the best way to index relational database, and how it > impacts > > >> on > > >>>> the performance? > > >>>> > > >>>> Thanks > > >>>> Renuka Srishti > > >>>> > > >>> > > >> > > > > >
Re: Index relational database
Thank all for sharing your thoughts :) On Thu, Aug 31, 2017 at 5:28 PM, Susheel Kumar wrote: > Yes, if you can avoid join and work with flat/denormalized structure then > that's the best. > > On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti < > renuka.srisht...@gmail.com> > wrote: > > > Thanks Erick, Walter > > But I think join query will reduce the performance. Denormalization will > be > > the better way than join query, am I right? > > > > > > > > On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood < > wun...@wunderwood.org> > > wrote: > > > > > Think about making a denormalized view, with all the fields needed in > one > > > table. That view gets sent to Solr. Each row is a Solr document. > > > > > > It could be implemented as a view or as SQL, but that is a useful > mental > > > model for people starting from a relational background. > > > > > > wunder > > > Walter Underwood > > > wun...@wunderwood.org > > > http://observer.wunderwood.org/ (my blog) > > > > > > > > > > On Aug 30, 2017, at 9:14 AM, Erick Erickson > > > > wrote: > > > > > > > > First, it's often best, by far, to denormalize the data in your solr > > > index, > > > > that's what I'd explore first. > > > > > > > > If you can't do that, the join query parser might work for you. > > > > > > > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" < > renuka.srisht...@gmail.com> > > > > wrote: > > > > > > > >> Thanks Susheel for your response. > > > >> Here is the scenario about which I am talking: > > > >> > > > >> - Let suppose there are two documents doc1 and doc2. > > > >> - I want to fetch the data from doc2 on the basis of doc1 fields > > which > > > >> are related to doc2. > > > >> > > > >> How to achieve this efficiently. 
> > > >> > > > >> > > > >> Thanks, > > > >> > > > >> Renuka Srishti > > > >> > > > >> > > > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar < > susheel2...@gmail.com > > > > > > >> wrote: > > > >> > > > >>> Hello Renuka, > > > >>> > > > >>> I would suggest to start with your use case(s). May be start with > > your > > > >>> first use case with the below questions > > > >>> > > > >>> a) What is that you want to search (which fields like name, desc, > > city > > > >>> etc.) > > > >>> b) What is that you want to show part of search result (name, city > > > etc.) > > > >>> > > > >>> Based on above two questions, you would know what data to pull in > > from > > > >>> relational database and create solr schema and index the data. > > > >>> > > > >>> You may first try to denormalize / flatten the structure so that > you > > > deal > > > >>> with one collection/schema and query upon it. > > > >>> > > > >>> HTH. > > > >>> > > > >>> Thanks, > > > >>> Susheel > > > >>> > > > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < > > > >>> renuka.srisht...@gmail.com> > > > >>> wrote: > > > >>> > > > >>>> Hii, > > > >>>> > > > >>>> What is the best way to index relational database, and how it > > impacts > > > >> on > > > >>>> the performance? > > > >>>> > > > >>>> Thanks > > > >>>> Renuka Srishti > > > >>>> > > > >>> > > > >> > > > > > > > > >
Re: Index relational database
Yes, if you can avoid join and work with flat/denormalized structure then that's the best. On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti wrote: > Thanks Erick, Walter > But I think join query will reduce the performance. Denormalization will be > the better way than join query, am I right? > > > > On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood > wrote: > > > Think about making a denormalized view, with all the fields needed in one > > table. That view gets sent to Solr. Each row is a Solr document. > > > > It could be implemented as a view or as SQL, but that is a useful mental > > model for people starting from a relational background. > > > > wunder > > Walter Underwood > > wun...@wunderwood.org > > http://observer.wunderwood.org/ (my blog) > > > > > > > On Aug 30, 2017, at 9:14 AM, Erick Erickson > > wrote: > > > > > > First, it's often best, by far, to denormalize the data in your solr > > index, > > > that's what I'd explore first. > > > > > > If you can't do that, the join query parser might work for you. > > > > > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" > > > wrote: > > > > > >> Thanks Susheel for your response. > > >> Here is the scenario about which I am talking: > > >> > > >> - Let suppose there are two documents doc1 and doc2. > > >> - I want to fetch the data from doc2 on the basis of doc1 fields > which > > >> are related to doc2. > > >> > > >> How to achieve this efficiently. > > >> > > >> > > >> Thanks, > > >> > > >> Renuka Srishti > > >> > > >> > > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar > > > >> wrote: > > >> > > >>> Hello Renuka, > > >>> > > >>> I would suggest to start with your use case(s). May be start with > your > > >>> first use case with the below questions > > >>> > > >>> a) What is that you want to search (which fields like name, desc, > city > > >>> etc.) > > >>> b) What is that you want to show part of search result (name, city > > etc.) 
> > >>> > > >>> Based on above two questions, you would know what data to pull in > from > > >>> relational database and create solr schema and index the data. > > >>> > > >>> You may first try to denormalize / flatten the structure so that you > > deal > > >>> with one collection/schema and query upon it. > > >>> > > >>> HTH. > > >>> > > >>> Thanks, > > >>> Susheel > > >>> > > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < > > >>> renuka.srisht...@gmail.com> > > >>> wrote: > > >>> > > >>>> Hii, > > >>>> > > >>>> What is the best way to index relational database, and how it > impacts > > >> on > > >>>> the performance? > > >>>> > > >>>> Thanks > > >>>> Renuka Srishti > > >>>> > > >>> > > >> > > > > >
Re: Index relational database
Thanks Erick, Walter But I think join query will reduce the performance. Denormalization will be the better way than join query, am I right? On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood wrote: > Think about making a denormalized view, with all the fields needed in one > table. That view gets sent to Solr. Each row is a Solr document. > > It could be implemented as a view or as SQL, but that is a useful mental > model for people starting from a relational background. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Aug 30, 2017, at 9:14 AM, Erick Erickson > wrote: > > > > First, it's often best, by far, to denormalize the data in your solr > index, > > that's what I'd explore first. > > > > If you can't do that, the join query parser might work for you. > > > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" > > wrote: > > > >> Thanks Susheel for your response. > >> Here is the scenario about which I am talking: > >> > >> - Let suppose there are two documents doc1 and doc2. > >> - I want to fetch the data from doc2 on the basis of doc1 fields which > >> are related to doc2. > >> > >> How to achieve this efficiently. > >> > >> > >> Thanks, > >> > >> Renuka Srishti > >> > >> > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar > >> wrote: > >> > >>> Hello Renuka, > >>> > >>> I would suggest to start with your use case(s). May be start with your > >>> first use case with the below questions > >>> > >>> a) What is that you want to search (which fields like name, desc, city > >>> etc.) > >>> b) What is that you want to show part of search result (name, city > etc.) > >>> > >>> Based on above two questions, you would know what data to pull in from > >>> relational database and create solr schema and index the data. > >>> > >>> You may first try to denormalize / flatten the structure so that you > deal > >>> with one collection/schema and query upon it. > >>> > >>> HTH. 
> >>> > >>> Thanks, > >>> Susheel > >>> > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < > >>> renuka.srisht...@gmail.com> > >>> wrote: > >>> > >>>> Hii, > >>>> > >>>> What is the best way to index relational database, and how it impacts > >> on > >>>> the performance? > >>>> > >>>> Thanks > >>>> Renuka Srishti > >>>> > >>> > >> > >
Re: Index relational database
Think about making a denormalized view, with all the fields needed in one table. That view gets sent to Solr. Each row is a Solr document. It could be implemented as a view or as SQL, but that is a useful mental model for people starting from a relational background. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 30, 2017, at 9:14 AM, Erick Erickson wrote: > > First, it's often best, by far, to denormalize the data in your solr index, > that's what I'd explore first. > > If you can't do that, the join query parser might work for you. > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" > wrote: > >> Thanks Susheel for your response. >> Here is the scenario about which I am talking: >> >> - Let suppose there are two documents doc1 and doc2. >> - I want to fetch the data from doc2 on the basis of doc1 fields which >> are related to doc2. >> >> How to achieve this efficiently. >> >> >> Thanks, >> >> Renuka Srishti >> >> >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar >> wrote: >> >>> Hello Renuka, >>> >>> I would suggest to start with your use case(s). May be start with your >>> first use case with the below questions >>> >>> a) What is that you want to search (which fields like name, desc, city >>> etc.) >>> b) What is that you want to show part of search result (name, city etc.) >>> >>> Based on above two questions, you would know what data to pull in from >>> relational database and create solr schema and index the data. >>> >>> You may first try to denormalize / flatten the structure so that you deal >>> with one collection/schema and query upon it. >>> >>> HTH. >>> >>> Thanks, >>> Susheel >>> >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < >>> renuka.srisht...@gmail.com> >>> wrote: >>> >>>> Hii, >>>> >>>> What is the best way to index relational database, and how it impacts >> on >>>> the performance? >>>> >>>> Thanks >>>> Renuka Srishti >>>> >>> >>
Re: Index relational database
First, it's often best, by far, to denormalize the data in your solr index, that's what I'd explore first. If you can't do that, the join query parser might work for you. On Aug 30, 2017 4:49 AM, "Renuka Srishti" wrote: > Thanks Susheel for your response. > Here is the scenario about which I am talking: > >- Let suppose there are two documents doc1 and doc2. >- I want to fetch the data from doc2 on the basis of doc1 fields which >are related to doc2. > > How to achieve this efficiently. > > > Thanks, > > Renuka Srishti > > > On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar > wrote: > > > Hello Renuka, > > > > I would suggest to start with your use case(s). May be start with your > > first use case with the below questions > > > > a) What is that you want to search (which fields like name, desc, city > > etc.) > > b) What is that you want to show part of search result (name, city etc.) > > > > Based on above two questions, you would know what data to pull in from > > relational database and create solr schema and index the data. > > > > You may first try to denormalize / flatten the structure so that you deal > > with one collection/schema and query upon it. > > > > HTH. > > > > Thanks, > > Susheel > > > > On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < > > renuka.srisht...@gmail.com> > > wrote: > > > > > Hii, > > > > > > What is the best way to index relational database, and how it impacts > on > > > the performance? > > > > > > Thanks > > > Renuka Srishti > > > > > >
Re: Solr index getting replaced instead of merged
Not sure how you are doing indexing. Try adding clean=false in your indexing command/script when you do second table indexing. > On 30 Aug 2017, at 7:06 PM, Agrawal, Harshal (GE Digital) > wrote: > > Hello Guys, > > I have installed solr in my local system and was able to connect to Teradata > successfully. > For single table I am able to index the data and query it also but when I am > trying for multiple tables in the same schema and doing indexing one by one > respectively. > I can see datasets getting replaced instead of merged . > > Can anyone help me please: > > Regards > Harshal > >
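If the indexing goes through the DataImportHandler, the same fix can be applied from a script by passing `clean=false` on the full-import command. A small Python sketch of building that URL (the host and core name are made up for illustration):

```python
from urllib.parse import urlencode

def dih_import_url(base, clean=False, commit=True):
    """Build a DataImportHandler full-import URL. clean=false keeps
    previously indexed documents instead of wiping the index first."""
    qs = urlencode({"command": "full-import",
                    "clean": str(clean).lower(),
                    "commit": str(commit).lower()})
    return f"{base}/dataimport?{qs}"

print(dih_import_url("http://localhost:8983/solr/mycore"))
```

Running this URL for the second table's entity then adds to the index rather than replacing it.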
Solr index getting replaced instead of merged
Hello Guys, I have installed Solr on my local system and was able to connect to Teradata successfully. For a single table I am able to index the data and query it, but when I try multiple tables in the same schema, indexing them one by one, I can see the datasets getting replaced instead of merged. Can anyone help me please? Regards Harshal
Re: Index relational database
Thanks Susheel for your response. Here is the scenario about which I am talking: - Let suppose there are two documents doc1 and doc2. - I want to fetch the data from doc2 on the basis of doc1 fields which are related to doc2. How to achieve this efficiently. Thanks, Renuka Srishti On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar wrote: > Hello Renuka, > > I would suggest to start with your use case(s). May be start with your > first use case with the below questions > > a) What is that you want to search (which fields like name, desc, city > etc.) > b) What is that you want to show part of search result (name, city etc.) > > Based on above two questions, you would know what data to pull in from > relational database and create solr schema and index the data. > > You may first try to denormalize / flatten the structure so that you deal > with one collection/schema and query upon it. > > HTH. > > Thanks, > Susheel > > On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti < > renuka.srisht...@gmail.com> > wrote: > > > Hii, > > > > What is the best way to index relational database, and how it impacts on > > the performance? > > > > Thanks > > Renuka Srishti > > >
solr index replace with index from another environment
Hi there, We are using Solr 6.3.0 and need to replace the Solr index in production with the Solr index from another environment on a periodic basis. But the JVMs have to be recycled for the updated index to take effect. Is there any way this can be achieved without restarting the JVMs? Using aliases as described below is an alternative, but I don't think it is useful in my case, where I have the index from the other environment ready. If I build a new collection and replace the index, again the JVMs need to be restarted for the new index to take effect. https://stackoverflow.com/questions/45158394/replacing-old-indexed-data-with-new-data-in-apache-solr-with-zero-downtime Any other suggestions please. Thanks, satya
Re: Index relational database
Hello Renuka, I would suggest to start with your use case(s). May be start with your first use case with the below questions a) What is that you want to search (which fields like name, desc, city etc.) b) What is that you want to show part of search result (name, city etc.) Based on above two questions, you would know what data to pull in from relational database and create solr schema and index the data. You may first try to denormalize / flatten the structure so that you deal with one collection/schema and query upon it. HTH. Thanks, Susheel On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti wrote: > Hii, > > What is the best way to index relational database, and how it impacts on > the performance? > > Thanks > Renuka Srishti >
Index relational database
Hi, What is the best way to index a relational database, and how does it impact performance? Thanks, Renuka Srishti
Re: Correct approach to copy index between solr clouds?
write.lock is used whenever a core (replica) wants to, well, write to the index. Each individual replica is sure to only write to the index with one thread. If two threads were to write to an index, there's a very good chance the index would be corrupted, so it's a safeguard against two or more threads or processes writing to the same index at the same time. Since a dataDir can be pointed at an arbitrary directory, not only could two replicas point to the same index within the same Solr JVM, but you could have some completely different JVM, possibly even on a completely different machine, point at the _same_ directory (this latter with any kind of shared filesystem). In the default case, Java's FileChannel.tryLock() is used to acquire an exclusive lock. If two or more threads in the same JVM, or two or more processes, point to the same write.lock file, one of the replicas will fail to open. So I mis-spoke. Just copying the write.lock file from one place to another along with all the rest of the index files should be OK. Since it's a new file in a new place, FileChannel.tryLock() can succeed. You should still be sure that indexing is stopped on the source and a hard commit has been performed, though. If you just copy from one to another while indexing is actively happening you might get a mismatched segments file. This last might need a bit of explanation. During normal indexing, new segment(s) are written. On hard commit (or when background merging happens), once all the new segment(s) are successfully closed, the segments file is updated with a list of all of them. This, by the way, is how an indexSearcher has a "snapshot" of the directory as of the last commit; it reads the current segments file and opens all the segments. Anyway, theoretically, if you just copy the current index directory while indexing is going on, you could potentially have a mismatch between the truly closed segments and what has been written to the segments file. 
This would be avoided by using fetchIndex since that's been hardened to handle this case, but being sure indexing is stopped would serve as well. Best, Erick On Sat, Aug 26, 2017 at 6:36 PM, Wei wrote: > Thanks Erick. Can you explain a bit more on the write.lock file? So far I > have been copying it over from B to A and haven't seen issue starting the > replica. > > On Sat, Aug 26, 2017 at 9:25 AM, Erick Erickson > wrote: > >> Approach 2 is sufficient. You do have to insure that you don't copy >> over the write.lock file however as you may not be able to start >> replicas if that's there. >> >> There's a relatively little-known third option. You an (ab)use the >> replication API "fetchindex" command, see: >> https://cwiki.apache.org/confluence/display/solr/Index+Replication to >> pull the index from Cloud B to replicas on Cloud A. That has the >> advantage of working even if you are actively indexing to Cloud B. >> NOTE: currently you cannot _query_ CloudA (the target) while the >> fetchindex is going on, but I doubt you really care since you were >> talking about having Cloud A offline anyway. So for each replica you >> fetch to you'll send the fetchindex command directly to the replica on >> Cloud A and the "masterURL" will be the corresponding replica on Cloud >> B. >> >> Finally, what I'd really do is _only_ have one replica for each shard >> on Cloud A active and fetch to _that_ replica. I'd also delete the >> data dir on all the other replicas for the shard on Cloud A. Then as >> you bring the additional replicas up they'll do a full synch from the >> leader. >> >> FWIW, >> Erick >> >> On Fri, Aug 25, 2017 at 6:53 PM, Wei wrote: >> > Hi, >> > >> > In our set up there are two solr clouds: >> > >> > Cloud A: production cloud serves both writes and reads >> > >> > Cloud B: back up cloud serves only writes >> > >> > Cloud A and B have the same shard configuration. >> > >> > Write requests are sent to both cloud A and B. 
In certain circumstances >> > when Cloud A's update lags behind, we want to bulk copy the binary index >> > from B to A. >> > >> > We have tried two approaches: >> > >> > Approach 1. >> > For cloud A: >> > a. delete collection to wipe out everything >> > b. create new collection (data is empty now) >> > c. shut down solr server >> > d. copy binary index from cloud B to corresponding shard replicas >> in >> > cloud A >> > e. start solr server >> > >> > Approach 2. >> > For cloud A: >> > a. shut down solr server >> > b. remove the whole 'data' folder under index/ in each replica >> > c. copy binary index from cloud B to corresponding shard replicas >> in >> > cloud A >> > d. start solr server >> > >> > Is approach 2 sufficient? I am wondering if delete/recreate collection >> > each time is necessary to get cloud into a "clean" state for copy binary >> > index between solr clouds. >> > >> > Thanks for your advice! >>
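Erick's point is that the lock lives in the file at its current location, not in the file's contents, so a copied write.lock is inert. A rough Python analogue of the exclusive-lock behavior using POSIX `flock` (not Lucene's actual NativeFSLockFactory; Linux/POSIX semantics assumed):

```python
import fcntl
import os
import tempfile

# Stand-in for Lucene's write.lock file.
lock_path = os.path.join(tempfile.mkdtemp(), "write.lock")

# First "writer" acquires an exclusive, non-blocking lock.
f1 = open(lock_path, "w")
fcntl.flock(f1, fcntl.LOCK_EX | fcntl.LOCK_NB)

# A second writer pointing at the SAME file fails to lock it,
# even in the same process (flock is per open file description).
f2 = open(lock_path, "w")
try:
    fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    second_lock_ok = True
except BlockingIOError:
    second_lock_ok = False

# But a COPY of the lock file in a new directory locks fine, which
# is why copying write.lock along with the index files is harmless.
copy_path = os.path.join(tempfile.mkdtemp(), "write.lock")
open(copy_path, "w").close()
f3 = open(copy_path, "w")
fcntl.flock(f3, fcntl.LOCK_EX | fcntl.LOCK_NB)  # succeeds

print(second_lock_ok)  # False
```

The copied file is just bytes; only a live lock on the file at its new path would block a new writer, and nothing holds one.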
Re: Correct approach to copy index between solr clouds?
Thanks Erick. Can you explain a bit more on the write.lock file? So far I have been copying it over from B to A and haven't seen issue starting the replica. On Sat, Aug 26, 2017 at 9:25 AM, Erick Erickson wrote: > Approach 2 is sufficient. You do have to insure that you don't copy > over the write.lock file however as you may not be able to start > replicas if that's there. > > There's a relatively little-known third option. You an (ab)use the > replication API "fetchindex" command, see: > https://cwiki.apache.org/confluence/display/solr/Index+Replication to > pull the index from Cloud B to replicas on Cloud A. That has the > advantage of working even if you are actively indexing to Cloud B. > NOTE: currently you cannot _query_ CloudA (the target) while the > fetchindex is going on, but I doubt you really care since you were > talking about having Cloud A offline anyway. So for each replica you > fetch to you'll send the fetchindex command directly to the replica on > Cloud A and the "masterURL" will be the corresponding replica on Cloud > B. > > Finally, what I'd really do is _only_ have one replica for each shard > on Cloud A active and fetch to _that_ replica. I'd also delete the > data dir on all the other replicas for the shard on Cloud A. Then as > you bring the additional replicas up they'll do a full synch from the > leader. > > FWIW, > Erick > > On Fri, Aug 25, 2017 at 6:53 PM, Wei wrote: > > Hi, > > > > In our set up there are two solr clouds: > > > > Cloud A: production cloud serves both writes and reads > > > > Cloud B: back up cloud serves only writes > > > > Cloud A and B have the same shard configuration. > > > > Write requests are sent to both cloud A and B. In certain circumstances > > when Cloud A's update lags behind, we want to bulk copy the binary index > > from B to A. > > > > We have tried two approaches: > > > > Approach 1. > > For cloud A: > > a. delete collection to wipe out everything > > b. 
create new collection (data is empty now) > > c. shut down solr server > > d. copy binary index from cloud B to corresponding shard replicas > in > > cloud A > > e. start solr server > > > > Approach 2. > > For cloud A: > > a. shut down solr server > > b. remove the whole 'data' folder under index/ in each replica > > c. copy binary index from cloud B to corresponding shard replicas > in > > cloud A > > d. start solr server > > > > Is approach 2 sufficient? I am wondering if delete/recreate collection > > each time is necessary to get cloud into a "clean" state for copy binary > > index between solr clouds. > > > > Thanks for your advice! >
Re: Correct approach to copy index between solr clouds?
Approach 2 is sufficient. You do have to ensure that you don't copy over the write.lock file, however, as you may not be able to start replicas if that's there. There's a relatively little-known third option. You can (ab)use the replication API "fetchindex" command, see: https://cwiki.apache.org/confluence/display/solr/Index+Replication to pull the index from Cloud B to replicas on Cloud A. That has the advantage of working even if you are actively indexing to Cloud B. NOTE: currently you cannot _query_ Cloud A (the target) while the fetchindex is going on, but I doubt you really care since you were talking about having Cloud A offline anyway. So for each replica you fetch to, you'll send the fetchindex command directly to the replica on Cloud A, and the "masterURL" will be the corresponding replica on Cloud B. Finally, what I'd really do is _only_ have one replica for each shard on Cloud A active and fetch to _that_ replica. I'd also delete the data dir on all the other replicas for the shard on Cloud A. Then as you bring the additional replicas up they'll do a full sync from the leader. FWIW, Erick On Fri, Aug 25, 2017 at 6:53 PM, Wei wrote: > Hi, > > In our set up there are two solr clouds: > > Cloud A: production cloud serves both writes and reads > > Cloud B: back up cloud serves only writes > > Cloud A and B have the same shard configuration. > > Write requests are sent to both cloud A and B. In certain circumstances > when Cloud A's update lags behind, we want to bulk copy the binary index > from B to A. > > We have tried two approaches: > > Approach 1. > For cloud A: > a. delete collection to wipe out everything > b. create new collection (data is empty now) > c. shut down solr server > d. copy binary index from cloud B to corresponding shard replicas > in > cloud A > e. start solr server > > Approach 2. > For cloud A: > a. shut down solr server > b. remove the whole 'data' folder under index/ in each replica > c. 
copy binary index from cloud B to corresponding shard replicas in > cloud A > d. start solr server > > Is approach 2 sufficient? I am wondering if delete/recreate collection > each time is necessary to get cloud into a "clean" state for copy binary > index between solr clouds. > > Thanks for your advice!
Correct approach to copy index between solr clouds?
Hi, In our setup there are two Solr clouds: Cloud A: production cloud, serving both writes and reads. Cloud B: backup cloud, serving only writes. Cloud A and B have the same shard configuration. Write requests are sent to both Cloud A and B. In certain circumstances, when Cloud A's updates lag behind, we want to bulk copy the binary index from B to A. We have tried two approaches: Approach 1. For Cloud A: a. delete the collection to wipe out everything b. create a new collection (data is empty now) c. shut down the Solr server d. copy the binary index from Cloud B to the corresponding shard replicas in Cloud A e. start the Solr server Approach 2. For Cloud A: a. shut down the Solr server b. remove the whole 'data' folder under index/ in each replica c. copy the binary index from Cloud B to the corresponding shard replicas in Cloud A d. start the Solr server Is approach 2 sufficient? I am wondering if deleting/recreating the collection each time is necessary to get the cloud into a "clean" state for copying the binary index between Solr clouds. Thanks for your advice!
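The fetchindex alternative suggested in the replies boils down to one HTTP call per target replica against its replication handler. A sketch that only builds the URL — host and core names are hypothetical, no request is sent, and the `masterUrl` parameter name should be verified against your Solr version's replication documentation:

```python
from urllib.parse import urlencode

def fetchindex_url(target_replica, source_replica):
    """Build the replication-handler call that tells a Cloud A
    replica to pull its index from the matching Cloud B replica."""
    params = urlencode({
        "command": "fetchindex",
        "masterUrl": f"{source_replica}/replication",
    })
    return f"{target_replica}/replication?{params}"

url = fetchindex_url(
    "http://cloudA-host1:8983/solr/coll_shard1_replica1",
    "http://cloudB-host1:8983/solr/coll_shard1_replica1",
)
print(url)
```

One such call per shard (to a single active replica on Cloud A, as Erick suggests) replaces the manual copy steps above.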
Re: Solr caching the index file make server refuse serving
10 billion documents on 12 cores is over 800M documents/shard at best. This is _very_ aggressive for a shard. Could you give more information about your setup? I've seen 250M docs fit in 12G of memory. I've also seen 10M documents strain 32G of memory. Details matter a lot. The only way I've been able to determine what a reasonable number of docs is, with my queries on my data, is to do "the sizing exercise", which I've outlined here: https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ While this was written over 5 years ago, it's still accurate. Best, Erick On Thu, Aug 24, 2017 at 6:10 PM, 陈永龙 wrote: > Hello, > > ENV: solrcloud 6.3 > > 3*dell server > > 128G 12cores 4.3T /server > > 3 solr node /server > > 20G /node (with parameter –m 20G) > > 10 billlion documents totle > > Problem: > > When we start solrcloud ,the cached index will make memory 98% or > more used . And if we continue to index document (batch commit 10 000 > documents),one or more server will refuse serving.Cannot login wia ssh,even > refuse the monitor. > > So,how can I limit the solr’s caching index to memory behavior? > > Anyone thanks! >
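Erick's 800M-docs-per-shard figure, and the worst-case filterCache entry size discussed elsewhere on this list, are simple arithmetic. A sketch using the numbers from this thread (the reading of "12 cores" as 12 shards is an assumption):

```python
total_docs = 10_000_000_000   # 10 billion documents in the collection
num_shards = 12               # assumption: one shard per core mentioned

docs_per_shard = total_docs / num_shards  # ~833M, i.e. "over 800M"

# Worst-case filterCache entry: one bit per document in the shard.
entry_mb = docs_per_shard / 8 / 1024 / 1024  # ~99 MB per cached filter
print(round(docs_per_shard), round(entry_mb))
```

At ~99 MB per worst-case filter entry, even a modest filterCache size can dwarf a 20G heap, which is consistent with the memory pressure described below.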
Solr caching the index file make server refuse serving
Hello, ENV: SolrCloud 6.3, 3 Dell servers, 128G RAM / 12 cores / 4.3T per server, 3 Solr nodes per server, 20G per node (with parameter -m 20G), 10 billion documents total. Problem: When we start SolrCloud, the cached index makes memory usage reach 98% or more. And if we continue to index documents (batch commits of 10,000 documents), one or more servers will refuse to serve; we cannot log in via ssh, and even the monitor is refused. So, how can I limit Solr's behavior of caching the index in memory? Thanks, anyone!
Re: Move index directory to another partition
Thanks all for your comments. I followed Shawn's steps (rsync), since everything (ZooKeeper, Solr home and data) was on that volume, and everything went great. Thanks again, Mahmoud On Sun, Aug 6, 2017 at 12:47 AM, Erick Erickson wrote: > bq: I was envisioning a scenario where the entire solr home is on the old > volume that's going away. If I were setting up a Solr install where the > large/fast storage was a separate filesystem, I would put the solr home > (or possibly even the entire install) under that mount point. It would > be a lot easier than setting dataDir in core.properties for every core, > especially in a cloud install. > > Agreed. Nothing in what I said precludes this. If you don't specify > dataDir, > then the index for a new replica goes in the default place, i.e. under > your install > directory usually. In your case under your new mount point. I usually don't > recommend trying to take control of where dataDir points, just let it > default. > I only mentioned it so you'd be aware it exists. So if your new install > is associated with a bigger/better/larger EBS it's all automatic. > > bq: If the dataDir property is already in use to relocate index data, then > ADDREPLICA and DELETEREPLICA would be a great way to go. I would not > expect most SolrCloud users to use that method. > > I really don't understand this. Each Solr replica has an associated > dataDir whether you specified it or not (the default is relative to > the core.properties file). ADDREPLICA creates a new replica in a new > place, initially the data directory and index are empty. The new > replica goes into recovery and uses the standard replication process > to copy the index via HTTP from a healthy replica and write it to its > data directory. Once that's done, the replica becomes live. There's > nothing about dataDir already being in use here at all. > > When you start Solr there's the default place Solr expects to find the
This is not necessarily where Solr is executing from, see > the "-s" option in bin/solr start -s. > > If you're talking about using dataDir to point to an existing index, > yes that would be a problem and not something I meant to imply at all. > > Why wouldn't most SolrCloud users use ADDREPLICA/DELTEREPLICA? It's > commonly used to more replicas around a cluster. > > Best, > Erick > > On Fri, Aug 4, 2017 at 11:15 AM, Shawn Heisey wrote: > > On 8/2/2017 9:17 AM, Erick Erickson wrote: > >> Not entirely sure about AWS intricacies, but getting a new replica to > >> use a particular index directory in the general case is just > >> specifying dataDir=some_directory on the ADDREPLICA command. The index > >> just needs an HTTP connection (uses the old replication process) so > >> nothing huge there. Then DELETEREPLICA for the old one. There's > >> nothing that ZK has to know about to make this work, it's all local to > >> the Solr instance. > > > > I was envisioning a scenario where the entire solr home is on the old > > volume that's going away. If I were setting up a Solr install where the > > large/fast storage was a separate filesystem, I would put the solr home > > (or possibly even the entire install) under that mount point. It would > > be a lot easier than setting dataDir in core.properties for every core, > > especially in a cloud install. > > > > If the dataDir property is already in use to relocate index data, then > > ADDREPLICA and DELETEREPLICA would be a great way to go. I would not > > expect most SolrCloud users to use that method. > > > > Thanks, > > Shawn > > >
Building Solr index from AEM using an ELB
I am looking for lessons learned or problems seen when building a Solr index from AEM using a Solr cluster with content passing through an ELB. Our configuration is AEM 6.1 indexing to a cluster of Solr servers running version 4.7.1. When building an index with a smaller data set - 4 million items, AEM sends the content in about 3 minutes and the index is built without issue. When building an index for 14 million items, AEM sends the content in about 9 minutes. The Solr server error log records errors of EofException. When a single Solr server is used and the ELB bypassed, the index is built in about 1.75 hours with no errors. Thanks for your comments and suggestions. Pete
Re: Move index directory to another partition
bq: I was envisioning a scenario where the entire solr home is on the old volume that's going away. If I were setting up a Solr install where the large/fast storage was a separate filesystem, I would put the solr home (or possibly even the entire install) under that mount point. It would be a lot easier than setting dataDir in core.properties for every core, especially in a cloud install. Agreed. Nothing in what I said precludes this. If you don't specify dataDir, then the index for a new replica goes in the default place, i.e. under your install directory usually. In your case under your new mount point. I usually don't recommend trying to take control of where dataDir points, just let it default. I only mentioned it so you'd be aware it exists. So if your new install is associated with a bigger/better/larger EBS it's all automatic. bq: If the dataDir property is already in use to relocate index data, then ADDREPLICA and DELETEREPLICA would be a great way to go. I would not expect most SolrCloud users to use that method. I really don't understand this. Each Solr replica has an associated dataDir whether you specified it or not (the default is relative to the core.properties file). ADDREPLICA creates a new replica in a new place; initially the data directory and index are empty. The new replica goes into recovery and uses the standard replication process to copy the index via HTTP from a healthy replica and write it to its data directory. Once that's done, the replica becomes live. There's nothing about dataDir already being in use here at all. When you start Solr there's the default place Solr expects to find the replicas. This is not necessarily where Solr is executing from, see the "-s" option in bin/solr start -s. If you're talking about using dataDir to point to an existing index, yes that would be a problem and not something I meant to imply at all. Why wouldn't most SolrCloud users use ADDREPLICA/DELETEREPLICA? It's commonly used to move replicas around a cluster. Best, Erick On Fri, Aug 4, 2017 at 11:15 AM, Shawn Heisey wrote: > On 8/2/2017 9:17 AM, Erick Erickson wrote: >> Not entirely sure about AWS intricacies, but getting a new replica to >> use a particular index directory in the general case is just >> specifying dataDir=some_directory on the ADDREPLICA command. The index >> just needs an HTTP connection (uses the old replication process) so >> nothing huge there. Then DELETEREPLICA for the old one. There's >> nothing that ZK has to know about to make this work, it's all local to >> the Solr instance. > > I was envisioning a scenario where the entire solr home is on the old > volume that's going away. If I were setting up a Solr install where the > large/fast storage was a separate filesystem, I would put the solr home > (or possibly even the entire install) under that mount point. It would > be a lot easier than setting dataDir in core.properties for every core, > especially in a cloud install. > > If the dataDir property is already in use to relocate index data, then > ADDREPLICA and DELETEREPLICA would be a great way to go. I would not > expect most SolrCloud users to use that method. > > Thanks, > Shawn >
Re: Move index directory to another partition
On 8/2/2017 9:17 AM, Erick Erickson wrote: > Not entirely sure about AWS intricacies, but getting a new replica to > use a particular index directory in the general case is just > specifying dataDir=some_directory on the ADDREPLICA command. The index > just needs an HTTP connection (uses the old replication process) so > nothing huge there. Then DELETEREPLICA for the old one. There's > nothing that ZK has to know about to make this work, it's all local to > the Solr instance. I was envisioning a scenario where the entire solr home is on the old volume that's going away. If I were setting up a Solr install where the large/fast storage was a separate filesystem, I would put the solr home (or possibly even the entire install) under that mount point. It would be a lot easier than setting dataDir in core.properties for every core, especially in a cloud install. If the dataDir property is already in use to relocate index data, then ADDREPLICA and DELETEREPLICA would be a great way to go. I would not expect most SolrCloud users to use that method. Thanks, Shawn
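For concreteness, the per-core dataDir override discussed in this thread is just a property in that replica's core.properties file; a hypothetical example (core name and mount point invented):

```properties
# core.properties for one replica whose index data lives
# on a separately mounted, larger/faster volume
name=mycoll_shard1_replica1
dataDir=/mnt/fast-ebs/solr-data/mycoll_shard1_replica1
```

As Erick notes, leaving dataDir unset (so it defaults relative to core.properties) is usually simpler than managing this line per core.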
Re: mixed index with commongrams
Haven't really looked much into that; here is a snippet from today's gc log, if you wouldn't mind shedding any light on it: 2017-08-03T11:46:16.265-0400: 3200938.383: [GC (Allocation Failure) 2017-08-03T11:46:16.265-0400: 3200938.383: [ParNew Desired survivor size 1966060336 bytes, new threshold 8 (max 8) - age 1: 128529184 bytes, 128529184 total - age 2: 43075632 bytes, 171604816 total - age 3: 64402592 bytes, 236007408 total - age 4: 35621704 bytes, 271629112 total - age 5: 44285584 bytes, 315914696 total - age 6: 45372512 bytes, 361287208 total - age 7: 41975368 bytes, 403262576 total - age 8: 72959688 bytes, 47664 total : 9133992K->577219K(1088K), 0.2730329 secs] 23200886K->14693007K(49066688K), 0.2732690 secs] [Times: user=2.01 sys=0.01, real=0.28 secs] Heap after GC invocations=12835 (full 109): par new generation total 1088K, used 577219K [0x7f802300, 0x7f833040, 0x7f833040) eden space 8533376K, 0% used [0x7f802300, 0x7f802300, 0x7f822bd6) from space 2133312K, 27% used [0x7f82ae0b, 0x7f82d1460d98, 0x7f833040) to space 2133312K, 0% used [0x7f822bd6, 0x7f822bd6, 0x7f82ae0b) concurrent mark-sweep generation total 3840K, used 14115788K [0x7f833040, 0x7f8c5800, 0x7f8c5800) Metaspace used 36698K, capacity 37169K, committed 37512K, reserved 38912K } On Thu, Aug 3, 2017 at 11:58 AM, Walter Underwood wrote: > How long are your GC pauses? Those affect all queries, so they make the > 99th percentile slow with queries that should be fast. > > The G1 collector has helped our 99th percentile. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Aug 3, 2017, at 8:48 AM, David Hastings > wrote: > > > > Thanks, thats what i kind of expected. 
still debating whether the space > > increase is worth it, right now Im at .7% of searches taking longer than > 10 > > seconds, and 6% taking longer than 1, so when i see things like this in > the > > morning it bugs me a bit: > > > > 2017-08-02 11:50:48 : 58979/1000 secs : ("Rules of Practice for the > Courts > > of Equity of the United States") > > 2017-08-02 02:16:36 : 54749/1000 secs : ("The American Cause") > > 2017-08-02 19:27:58 : 54561/1000 secs : ("register of the department of > > justice") > > > > which could all be annihilated with CG's, at the expense, according to > HT, > > of a 40% increase in index size. > > > > > > > > On Thu, Aug 3, 2017 at 11:21 AM, Erick Erickson > > > wrote: > > > >> bq: will that search still return results form the earlier documents > >> as well as the new ones > >> > >> In a word, "no". By definition the analysis chain applied at index > >> time puts tokens in the index and that's all you have to search > >> against for the doc unless and until you re-index the document. > >> > >> You really have two choices here: > >> 1> live with the differing results until you get done re-indexing > >> 2> index to an offline collection and then use, say, collection > >> aliasing to make the switch atomically. > >> > >> Best, > >> Erick > >> > >> On Thu, Aug 3, 2017 at 8:07 AM, David Hastings > >> wrote: > >>> Hey all, I have yet to run an experiment to test this but was wondering > >> if > >>> anyone knows the answer ahead of time. > >>> If i have an index built with documents before implementing the > >> commongrams > >>> filter, then enable it, and start adding documents that have the > >>> filter/tokenizer applied, will searches that fit the criteria, for > >> example: > >>> "to be or not to be" > >>> will that search still return results form the earlier documents as > well > >> as > >>> the new ones? 
The idea is that a full re-index is going to be > difficult, > >>> so would rather do it over time by replacing large numbers of documents > >>> incrementally. Thanks, > >>> Dave > >> > >
Re: mixed index with commongrams
How long are your GC pauses? Those affect all queries, so they make the 99th percentile slow with queries that should be fast. The G1 collector has helped our 99th percentile. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 3, 2017, at 8:48 AM, David Hastings > wrote: > > Thanks, thats what i kind of expected. still debating whether the space > increase is worth it, right now Im at .7% of searches taking longer than 10 > seconds, and 6% taking longer than 1, so when i see things like this in the > morning it bugs me a bit: > > 2017-08-02 11:50:48 : 58979/1000 secs : ("Rules of Practice for the Courts > of Equity of the United States") > 2017-08-02 02:16:36 : 54749/1000 secs : ("The American Cause") > 2017-08-02 19:27:58 : 54561/1000 secs : ("register of the department of > justice") > > which could all be annihilated with CG's, at the expense, according to HT, > of a 40% increase in index size. > > > > On Thu, Aug 3, 2017 at 11:21 AM, Erick Erickson > wrote: > >> bq: will that search still return results form the earlier documents >> as well as the new ones >> >> In a word, "no". By definition the analysis chain applied at index >> time puts tokens in the index and that's all you have to search >> against for the doc unless and until you re-index the document. >> >> You really have two choices here: >> 1> live with the differing results until you get done re-indexing >> 2> index to an offline collection and then use, say, collection >> aliasing to make the switch atomically. >> >> Best, >> Erick >> >> On Thu, Aug 3, 2017 at 8:07 AM, David Hastings >> wrote: >>> Hey all, I have yet to run an experiment to test this but was wondering >> if >>> anyone knows the answer ahead of time. 
>>> If i have an index built with documents before implementing the >> commongrams >>> filter, then enable it, and start adding documents that have the >>> filter/tokenizer applied, will searches that fit the criteria, for >> example: >>> "to be or not to be" >>> will that search still return results form the earlier documents as well >> as >>> the new ones? The idea is that a full re-index is going to be difficult, >>> so would rather do it over time by replacing large numbers of documents >>> incrementally. Thanks, >>> Dave >>
Re: mixed index with commongrams
Thanks, that's what I kind of expected. Still debating whether the space increase is worth it; right now I'm at 0.7% of searches taking longer than 10 seconds, and 6% taking longer than 1, so when I see things like this in the morning it bugs me a bit: 2017-08-02 11:50:48 : 58979/1000 secs : ("Rules of Practice for the Courts of Equity of the United States") 2017-08-02 02:16:36 : 54749/1000 secs : ("The American Cause") 2017-08-02 19:27:58 : 54561/1000 secs : ("register of the department of justice") which could all be annihilated with CGs, at the expense, according to HT, of a 40% increase in index size. On Thu, Aug 3, 2017 at 11:21 AM, Erick Erickson wrote: > bq: will that search still return results form the earlier documents > as well as the new ones > > In a word, "no". By definition the analysis chain applied at index > time puts tokens in the index and that's all you have to search > against for the doc unless and until you re-index the document. > > You really have two choices here: > 1> live with the differing results until you get done re-indexing > 2> index to an offline collection and then use, say, collection > aliasing to make the switch atomically. > > Best, > Erick > > On Thu, Aug 3, 2017 at 8:07 AM, David Hastings > wrote: > > Hey all, I have yet to run an experiment to test this but was wondering > if > > anyone knows the answer ahead of time. > > If i have an index built with documents before implementing the > commongrams > > filter, then enable it, and start adding documents that have the > > filter/tokenizer applied, will searches that fit the criteria, for > example: > > "to be or not to be" > > will that search still return results form the earlier documents as well > as > > the new ones? The idea is that a full re-index is going to be difficult, > > so would rather do it over time by replacing large numbers of documents > > incrementally. Thanks, > > Dave >
Re: mixed index with commongrams
bq: will that search still return results form the earlier documents as well as the new ones In a word, "no". By definition the analysis chain applied at index time puts tokens in the index and that's all you have to search against for the doc unless and until you re-index the document. You really have two choices here: 1> live with the differing results until you get done re-indexing 2> index to an offline collection and then use, say, collection aliasing to make the switch atomically. Best, Erick On Thu, Aug 3, 2017 at 8:07 AM, David Hastings wrote: > Hey all, I have yet to run an experiment to test this but was wondering if > anyone knows the answer ahead of time. > If i have an index built with documents before implementing the commongrams > filter, then enable it, and start adding documents that have the > filter/tokenizer applied, will searches that fit the criteria, for example: > "to be or not to be" > will that search still return results form the earlier documents as well as > the new ones? The idea is that a full re-index is going to be difficult, > so would rather do it over time by replacing large numbers of documents > incrementally. Thanks, > Dave
mixed index with commongrams
Hey all, I have yet to run an experiment to test this but was wondering if anyone knows the answer ahead of time. If I have an index built with documents from before implementing the commongrams filter, then enable it, and start adding documents that have the filter/tokenizer applied, will searches that fit the criteria, for example: "to be or not to be", still return results from the earlier documents as well as the new ones? The idea is that a full re-index is going to be difficult, so I would rather do it over time by replacing large numbers of documents incrementally. Thanks, Dave
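Erick's "no" in the replies follows directly from what tokens end up in the index versus the query. A toy Python simulation — deliberately simplified, not Lucene's actual CommonGramsFilter/CommonGramsQueryFilter behavior — showing why a common-grams query cannot match documents indexed without the filter:

```python
COMMON = {"to", "be", "or", "not"}

def index_tokens(text, commongrams=False):
    """Simplified index-time token stream: unigrams, plus
    common-word bigrams when the commongrams filter is enabled."""
    words = text.lower().split()
    tokens = set(words)
    if commongrams:
        for a, b in zip(words, words[1:]):
            if a in COMMON or b in COMMON:
                tokens.add(f"{a}_{b}")
    return tokens

def query_tokens(text):
    """Simplified query side with commongrams enabled: only the
    bigram tokens survive for runs of common words."""
    words = text.lower().split()
    return {f"{a}_{b}" for a, b in zip(words, words[1:])
            if a in COMMON or b in COMMON}

old_doc = index_tokens("to be or not to be", commongrams=False)
new_doc = index_tokens("to be or not to be", commongrams=True)
q = query_tokens("to be or not to be")

print(q <= old_doc)  # False: old doc has no bigram tokens, no match
print(q <= new_doc)  # True: new doc was indexed with the bigrams
```

The old documents only contain unigram tokens, so the bigram query terms simply are not in the index for them until those documents are re-indexed.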
Re: Custom Sort option to apply at SOLR index
I guess I don't see the problem, just store it as a string and sort on the field. # sorts before numbers, which sort before characters. Or I'm reading the ASCII chart wrong. Best, Erick On Wed, Aug 2, 2017 at 6:55 AM, padmanabhan wrote: > Hello Solr Geeks, > > I am a newbie to Solr. I have a requirement as given below; could anyone please > provide some insights on how to go about this. > > "Ascending by name" (#, 0 - 9, A - Z) > > "Descending by name" (Z - A, 9 - 0, #) > > Sample name value can be > > ABCD5678 > 1234ABCD > #2345ABCD > #1234ABCD > 5678ABCD > #2345ACBD > 5678EFGH > #2345DBCA > ABCD1234 > 1234#ABCD > > *Expected Ascending order* > > #2345ABCD > #2345ACBD > #2345DBCA > 1234#ABCD > 1234ABCD > 5678ABCD > ABCD1234 > ABCD5678 > > *Expected Descending order* > > ABCD5678 > ABCD1234 > 5678ABCD > 1234ABCD > 1234#ABCD > #2345DBCA > #2345ACBD > #2345ABCD > > Thanks & Regards, > Paddy > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Custom-Sort-option-to-apply-at-SOLR-index-tp4348787.html > Sent from the Solr - User mailing list archive at Nabble.com.
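Erick's answer relies on '#' (0x23) ordering before '0'–'9' (0x30–0x39), which order before 'A'–'Z' (0x41–0x5A), so a plain lexicographic sort on a string field already yields the requested "#, 0-9, A-Z" order (assuming the field's terms sort by byte/code-point order, which matches ASCII for these characters). Demonstrated on a subset of the sample values:

```python
names = ["ABCD5678", "1234ABCD", "#2345ABCD", "5678ABCD", "ABCD1234"]

# '#' (0x23) < digits (0x30-0x39) < uppercase letters (0x41-0x5A),
# so ordinary string comparison gives the "#, 0-9, A-Z" ordering.
asc = sorted(names)
desc = sorted(names, reverse=True)
print(asc)
```

No custom comparator is needed; descending order is just the reverse sort.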
Re: Move index directory to another partition
Shawn: Not entirely sure about AWS intricacies, but getting a new replica to use a particular index directory in the general case is just specifying dataDir=some_directory on the ADDREPLICA command. The index just needs an HTTP connection (uses the old replication process) so nothing huge there. Then DELETEREPLICA for the old one. There's nothing that ZK has to know about to make this work, it's all local to the Solr instance. Or I'm completely out in the weeds. Best, Erick On Tue, Aug 1, 2017 at 7:52 PM, Dave wrote: > To add to this, not sure of solr cloud uses it, but you're going to want to > destroy the wrote.lock file as well > >> On Aug 1, 2017, at 9:31 PM, Shawn Heisey wrote: >> >>> On 8/1/2017 7:09 PM, Erick Erickson wrote: >>> WARNING: what I currently understand about the limitations of AWS >>> could fill volumes so I might be completely out to lunch. >>> >>> If you ADDREPLICA with the new replica's data residing on the new EBS >>> volume, then wait for it to sync (which it'll do all by itself) then >>> DELETEREPLICA on the original you'll be all set. >>> >>> In recent Solr's, theres also the MOVENODE collections API call. >> >> I did consider mentioning that as a possible way forward, but I hate to >> rely on special configurations with core.properties, particularly if the >> newly built replica core instanceDirs aren't in the solr home (or >> coreRootDirectory) at all. I didn't want to try and explain the precise >> steps required to get that plan to work. I would expect to need some >> arcane Collections API work or manual ZK modification to reach a correct >> state -- steps that would be prone to error. >> >> The idea I mentioned seemed to me to be the way forward that would >> require the least specialized knowledge. Here's a simplified stating of >> the steps: >> >> * Mount the new volume somewhere. >> * Use multiple rsync passes to get the data copied. >> * Stop Solr. >> * Do a final rsync pass. >> * Unmount the original volume. 
>> * Remount the new volume in the original location. >> * Start Solr. >> >> Thanks, >> Shawn >>
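Erick's replica swap comes down to two Collections API calls. A sketch that only builds the request URLs (host, collection, shard, replica name, and directory are all placeholder assumptions):

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/admin/collections"  # assumed host/port

# 1) Add a replica whose index lives on the new volume (dataDir per Erick's note).
add_params = urlencode({
    "action": "ADDREPLICA",
    "collection": "mycoll",          # placeholder
    "shard": "shard1",               # placeholder
    "dataDir": "/mnt/new_ebs/data",  # directory on the new volume
})
add_url = f"{base}?{add_params}"

# 2) Once the new replica is active and in sync, drop the old one.
delete_params = urlencode({
    "action": "DELETEREPLICA",
    "collection": "mycoll",
    "shard": "shard1",
    "replica": "core_node1",         # placeholder replica name
})
delete_url = f"{base}?{delete_params}"

print(add_url)
print(delete_url)
```

As Erick says, none of this requires touching ZooKeeper directly; the replica syncs over HTTP via the old replication mechanism before the old one is deleted.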
RE: Solr Index issue on string type while querying
Thank you Matt for the reply. My apologies for the lack of clarity in the problem statement. The problem was with the source attribute value defined at the source system. The source system has heightSquareTube_string_mv: &gt; 90 - 100 mm and Solr converts the XML/HTML entity to its symbol equivalent at index time, so the indexed value becomes heightSquareTube_string_mv: > 90 - 100 mm
Custom Sort option to apply at SOLR index
Hello Solr Geeks, I am new to Solr. I have a requirement as given below; could anyone please provide some insights on how to go about it. "Ascending by name" (#, 0 - 9, A - Z) "Descending by name" (Z - A, 9 - 0, #) Sample name values can be ABCD5678 1234ABCD #2345ABCD #1234ABCD 5678ABCD #2345ACBD 5678EFGH #2345DBCA ABCD1234 1234#ABCD *Expected Ascending order* #2345ABCD #2345ACBD #2345DBCA 1234#ABCD 1234ABCD 5678ABCD ABCD1234 ABCD5678 *Expected Descending order* ABCD5678 ABCD1234 5678ABCD 1234ABCD 1234#ABCD #2345DBCA #2345ACBD #2345ABCD Thanks & Regards, Paddy
Re: Move index directory to another partition
To add to this: I'm not sure if SolrCloud uses it, but you're going to want to delete the write.lock file as well > On Aug 1, 2017, at 9:31 PM, Shawn Heisey wrote: > >> On 8/1/2017 7:09 PM, Erick Erickson wrote: >> WARNING: what I currently understand about the limitations of AWS >> could fill volumes so I might be completely out to lunch. >> >> If you ADDREPLICA with the new replica's data residing on the new EBS >> volume, then wait for it to sync (which it'll do all by itself) then >> DELETEREPLICA on the original you'll be all set. >> >> In recent Solr's, theres also the MOVENODE collections API call. > > I did consider mentioning that as a possible way forward, but I hate to > rely on special configurations with core.properties, particularly if the > newly built replica core instanceDirs aren't in the solr home (or > coreRootDirectory) at all. I didn't want to try and explain the precise > steps required to get that plan to work. I would expect to need some > arcane Collections API work or manual ZK modification to reach a correct > state -- steps that would be prone to error. > > The idea I mentioned seemed to me to be the way forward that would > require the least specialized knowledge. Here's a simplified stating of > the steps: > > * Mount the new volume somewhere. > * Use multiple rsync passes to get the data copied. > * Stop Solr. > * Do a final rsync pass. > * Unmount the original volume. > * Remount the new volume in the original location. > * Start Solr. > > Thanks, > Shawn >
Re: Move index directory to another partition
On 8/1/2017 7:09 PM, Erick Erickson wrote: > WARNING: what I currently understand about the limitations of AWS > could fill volumes so I might be completely out to lunch. > > If you ADDREPLICA with the new replica's data residing on the new EBS > volume, then wait for it to sync (which it'll do all by itself) then > DELETEREPLICA on the original you'll be all set. > > In recent Solr's, theres also the MOVENODE collections API call. I did consider mentioning that as a possible way forward, but I hate to rely on special configurations with core.properties, particularly if the newly built replica core instanceDirs aren't in the solr home (or coreRootDirectory) at all. I didn't want to try and explain the precise steps required to get that plan to work. I would expect to need some arcane Collections API work or manual ZK modification to reach a correct state -- steps that would be prone to error. The idea I mentioned seemed to me to be the way forward that would require the least specialized knowledge. Here's a simplified stating of the steps: * Mount the new volume somewhere. * Use multiple rsync passes to get the data copied. * Stop Solr. * Do a final rsync pass. * Unmount the original volume. * Remount the new volume in the original location. * Start Solr. Thanks, Shawn
Re: Move index directory to another partition
WARNING: what I currently understand about the limitations of AWS could fill volumes so I might be completely out to lunch. If you ADDREPLICA with the new replica's data residing on the new EBS volume, then wait for it to sync (which it'll do all by itself), then DELETEREPLICA on the original, you'll be all set. In recent Solr versions, there's also the MOVENODE collections API call. Best, Erick On Tue, Aug 1, 2017 at 6:03 PM, Shawn Heisey wrote: > On 8/1/2017 4:00 PM, Mahmoud Almokadem wrote: >> I'm using ubuntu and I'll try rsync command. Unfortunately I'm using one >> replication factor but I think the downtime will be less than five minutes >> after following your steps. >> >> But how can I start Solr backup or why should I run it although I copied >> the index and changed theo path? >> >> And what do you mean with "Using multiple passes with rsync"? > > The first time you copy the data, which you could do with cp if you > want, the time required will be limited by the size of the data and the > speed of the disks. Depending on the size, it could take several hours > like you estimated. I would suggest using rsync for the first copy just > because you're going to need the same command again for the later passes. > > Doing a second pass with rsync should go very quickly. How fast would > depend on the rate that the index data is changing. You might need to > do this step more than once just so that it gets faster each time, in > preparation for the final pass. > > A final pass with rsync might only take a few seconds, and if Solr is > stopped before that final copy is started, then there's no way the index > data can change. > > Thanks, > Shawn >
Re: Move index directory to another partition
On 8/1/2017 4:00 PM, Mahmoud Almokadem wrote: > I'm using ubuntu and I'll try rsync command. Unfortunately I'm using one > replication factor but I think the downtime will be less than five minutes > after following your steps. > > But how can I start Solr backup or why should I run it although I copied > the index and changed theo path? > > And what do you mean with "Using multiple passes with rsync"? The first time you copy the data, which you could do with cp if you want, the time required will be limited by the size of the data and the speed of the disks. Depending on the size, it could take several hours like you estimated. I would suggest using rsync for the first copy just because you're going to need the same command again for the later passes. Doing a second pass with rsync should go very quickly. How fast would depend on the rate that the index data is changing. You might need to do this step more than once just so that it gets faster each time, in preparation for the final pass. A final pass with rsync might only take a few seconds, and if Solr is stopped before that final copy is started, then there's no way the index data can change. Thanks, Shawn
Re: Move index directory to another partition
Thanks Shawn, I'm using Ubuntu and I'll try the rsync command. Unfortunately I'm using a replication factor of one, but I think the downtime will be less than five minutes after following your steps. But how can I start Solr back up, and why should I run it, given that I copied the index and changed the path? And what do you mean by "Using multiple passes with rsync"? Thanks, Mahmoud On Tuesday, August 1, 2017, Shawn Heisey wrote: > On 7/31/2017 12:28 PM, Mahmoud Almokadem wrote: > > I've a SolrCloud of four instances on Amazon and the EBS volumes that > > contain the data on everynode is going to be full, unfortunately Amazon > > doesn't support expanding the EBS. So, I'll attach larger EBS volumes to > > move the index to. > > > > I can stop the updates on the index, but I'm afraid to use "cp" command > to > > copy the files that are "on merge" operation. > > > > The copy operation may take several hours. > > > > How can I move the data directory without stopping the instance? > > Use rsync to do the copy. Do an initial copy while Solr is running, > then do a second copy, which should be pretty fast because rsync will > see the data from the first copy. Then shut Solr down and do a third > rsync which will only copy a VERY small changeset. Reconfigure Solr > and/or the OS to use the new location, and start Solr back up. Because > you mentioned "cp" I am assuming that you're NOT on Windows, and that > the OS will most likely allow you to do anything you need with index > files while Solr has them open. > > If you have set up your replicas with SolrCloud properly, then your > collections will not go offline when one Solr instance is shut down, and > that instance will be brought back into sync with the rest of the > cluster when it starts back up. Using multiple passes with rsync should > mean that Solr will not need to be shutdown for very long. > > The options I typically use for this kind of copy with rsync are "-avH > --delete". 
I would recommend that you research rsync options so that > you fully understand what I have suggested. > > Thanks, > Shawn > >
Re: Move index directory to another partition
Way back in the 1.x days, replication was done with shell scripts and rsync, right? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 1, 2017, at 2:45 PM, Shawn Heisey wrote: > > On 7/31/2017 12:28 PM, Mahmoud Almokadem wrote: >> I've a SolrCloud of four instances on Amazon and the EBS volumes that >> contain the data on everynode is going to be full, unfortunately Amazon >> doesn't support expanding the EBS. So, I'll attach larger EBS volumes to >> move the index to. >> >> I can stop the updates on the index, but I'm afraid to use "cp" command to >> copy the files that are "on merge" operation. >> >> The copy operation may take several hours. >> >> How can I move the data directory without stopping the instance? > > Use rsync to do the copy. Do an initial copy while Solr is running, > then do a second copy, which should be pretty fast because rsync will > see the data from the first copy. Then shut Solr down and do a third > rsync which will only copy a VERY small changeset. Reconfigure Solr > and/or the OS to use the new location, and start Solr back up. Because > you mentioned "cp" I am assuming that you're NOT on Windows, and that > the OS will most likely allow you to do anything you need with index > files while Solr has them open. > > If you have set up your replicas with SolrCloud properly, then your > collections will not go offline when one Solr instance is shut down, and > that instance will be brought back into sync with the rest of the > cluster when it starts back up. Using multiple passes with rsync should > mean that Solr will not need to be shutdown for very long. > > The options I typically use for this kind of copy with rsync are "-avH > --delete". I would recommend that you research rsync options so that > you fully understand what I have suggested. > > Thanks, > Shawn >
Re: Move index directory to another partition
On 7/31/2017 12:28 PM, Mahmoud Almokadem wrote: > I've a SolrCloud of four instances on Amazon and the EBS volumes that > contain the data on everynode is going to be full, unfortunately Amazon > doesn't support expanding the EBS. So, I'll attach larger EBS volumes to > move the index to. > > I can stop the updates on the index, but I'm afraid to use "cp" command to > copy the files that are "on merge" operation. > > The copy operation may take several hours. > > How can I move the data directory without stopping the instance? Use rsync to do the copy. Do an initial copy while Solr is running, then do a second copy, which should be pretty fast because rsync will see the data from the first copy. Then shut Solr down and do a third rsync which will only copy a VERY small changeset. Reconfigure Solr and/or the OS to use the new location, and start Solr back up. Because you mentioned "cp" I am assuming that you're NOT on Windows, and that the OS will most likely allow you to do anything you need with index files while Solr has them open. If you have set up your replicas with SolrCloud properly, then your collections will not go offline when one Solr instance is shut down, and that instance will be brought back into sync with the rest of the cluster when it starts back up. Using multiple passes with rsync should mean that Solr will not need to be shutdown for very long. The options I typically use for this kind of copy with rsync are "-avH --delete". I would recommend that you research rsync options so that you fully understand what I have suggested. Thanks, Shawn
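Shawn's multi-pass procedure, as a command sketch (paths and the service name are placeholder assumptions; the rsync flags are the ones he quotes):

```shell
# Pass 1: bulk copy while Solr is still running (may take hours).
rsync -avH /var/solr/data/ /mnt/new_ebs/solr/data/        # paths are placeholders

# Pass 2 (repeat as needed): only transfers what changed since
# the last pass, so each successive pass is faster.
rsync -avH --delete /var/solr/data/ /mnt/new_ebs/solr/data/

# Final pass: stop Solr first so the index can no longer change,
# then copy the last, very small changeset.
sudo systemctl stop solr                                   # service name is an assumption
rsync -avH --delete /var/solr/data/ /mnt/new_ebs/solr/data/

# Point Solr (or the mount) at the new location, then restart.
sudo systemctl start solr
```

The `--delete` flag on the later passes removes destination files that no longer exist at the source (e.g. segments that were merged away between passes), which is why Shawn recommends understanding the options before running them.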
Move index directory to another partition
Hello, I have a SolrCloud of four instances on Amazon, and the EBS volumes that contain the data on every node are going to be full; unfortunately, Amazon doesn't support expanding the EBS volumes. So I'll attach larger EBS volumes to move the index to. I can stop the updates on the index, but I'm afraid to use the "cp" command to copy files that are part of an in-progress merge operation. The copy operation may take several hours. How can I move the data directory without stopping the instance? Thanks, Mahmoud
Re: index version - replicable versus searching
Ronald: Actually, people generally don't search on master ;). The idea is that master is configured for heavy indexing and then people search on the slaves which are configured for heavy query loads (e.g. memory, autowarming, whatever may be different). Which is its own problem since the time the slaves poll won't necessarily be the exact same wall-clock time. SolrCloud doesn't use replication except in certain recovery scenarios. In normal operations, documents are forwarded to each replica and indexed separately on all nodes. That's about the only way to support Near Real Time. Best, Erick On Tue, Jul 25, 2017 at 9:39 AM, Stanonik, Ronald wrote: > Bingo! Right on both counts! opensearcher was false. When I changed it to > true, then I could see that master(searching) and master(replicable) both > changed. And autocommit.maxtime is causing a commit on the master. > > Who uses master(replicable)? It seems for my simple master/slave > configuration master(searching) is the relevant version. Maybe solr cloud > uses master(replicable)? > > Thanks, > > Ron > >
RE: index version - replicable versus searching
Bingo! Right on both counts! opensearcher was false. When I changed it to true, then I could see that master(searching) and master(replicable) both changed. And autocommit.maxtime is causing a commit on the master. Who uses master(replicable)? It seems for my simple master/slave configuration master(searching) is the relevant version. Maybe solr cloud uses master(replicable)? Thanks, Ron
Re: Lucene index corruption and recovery
Another sanity check: with the deletion, the only option would be to reindex those documents. Could someone please let me know if I am missing anything, or whether I am on track here. Thanks.
Lucene index corruption and recovery
While trying to upgrade a 100G index from Solr 4 to 5, CheckIndex (actually the index upgrader) indicated that the index was corrupted. Hence, I ran CheckIndex to fix the index, which showed a broken-segments warning and then deleted those documents. I then ran the index upgrader on the fixed index, which upgraded fine without any error (I still need to set up Solr/ZK to test, though). WARNING: 2 broken segments (containing 5 documents) detected Is there an easy way to figure out which documents (by ID) got deleted, or do I need to compare document IDs in the old and new index? Also, what do broken segments mean with respect to querying documents? Are those documents still searchable in the corrupted index as long as the segments are not deleted? Note that a few small test indexes had no issues with corruption or upgrade. The large-index problem could be related to memory or network issues. Thanks in advance.
Re: index version - replicable versus searching
Actually, I'm surprised that the slave returns the new document and I suspect that there's actually a commit on the master, but no new searcher is being opened. On replication, the slave copies all _closed_ segments from the master whether or not they have been opened for searching. Hmmm, a little arcane. Here's a long blog on the subject: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But... Whenever you hard commit (often configured in solrconfig.xml) you have a choice whether opensearcher=true|false. _IF_ opensearcher=false, the current segment is closed but the docs are not searchable yet. When the slave does a replication, it copies all closed segments and opens a new searcher on them. So here's one possibility: 1> you added some docs on the master but your solrconfig has an autocommit setting that tripped in and has openSearcher=false. This closed all open segments (i.e. the segments with the new docs) 2> the slave replicated the closed segments and opened a new searcher on the index, so it shows the new docs 3> the master still hasn't opened a new searcher so continues to not be able to see the new documents. Is that possible? Erick On Mon, Jul 24, 2017 at 3:04 PM, Stanonik, Ronald wrote: > I'm testing replication on solr 5.5.0. > > I set up one master and one slave. > > The index versions match; that is, master(replicable), master(searching), and > slave(searching) are the same. > > I make a change to the index on the master, but do not commit yet. > > As expected, the version master(replicable) changes, but not > master(searching). > > If I "replicate now" on the slave, then slave(searching) matches > master(replicable), which seems wrong because the slave now returns answers > from master(replicable), while the master returns answers from > master(searching). > > Shouldn't the slave continue to return answers from master(searching), so > that master and slave return the same answers? 
> > What do I not understand? The documentation I found about replication > doesn't seem to explain in depth how the versions are affected by changes and > commit. > > Thanks, > > Ron
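The autocommit setting Erick describes lives in solrconfig.xml; a sketch (the 15-second maxTime is just an example value):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Hard commit every 15 seconds (example value): closes the current
         segment and makes it eligible for replication to slaves... -->
    <maxTime>15000</maxTime>
    <!-- ...but does NOT open a new searcher, so the new docs stay
         invisible to searches on the master until a searcher is opened -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With this configuration, the slave can replicate the closed segments and open a new searcher on them, showing documents that the master itself does not yet return, which is exactly the behavior described in the thread.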
index version - replicable versus searching
I'm testing replication on solr 5.5.0. I set up one master and one slave. The index versions match; that is, master(replicable), master(searching), and slave(searching) are the same. I make a change to the index on the master, but do not commit yet. As expected, the version master(replicable) changes, but not master(searching). If I "replicate now" on the slave, then slave(searching) matches master(replicable), which seems wrong because the slave now returns answers from master(replicable), while the master returns answers from master(searching). Shouldn't the slave continue to return answers from master(searching), so that master and slave return the same answers? What do I not understand? The documentation I found about replication doesn't seem to explain in depth how the versions are affected by changes and commit. Thanks, Ron
Re: index new discovered fileds of different types
I think Thaer’s answer clarifies how they do it. So at the time they assemble the full Solr doc to index, there may be a new field name not known in advance, but to my understanding the RDF source contains information on the type (else they could not do the mapping to dynamic field either) and so adding a field to the managed schema on the fly once an unknown field is detected should work just fine! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 10 Jul 2017, at 02:08, Rick Leir wrote: > > Jan > > I hope this is not off-topic, but I am curious: if you do not use the three > fields, subject, predicate, and object for indexing RDF > then what is your algorithm? Maybe document nesting is appropriate for this? > cheers -- Rick > > > On 2017-07-09 05:52 PM, Jan Høydahl wrote: >> Hi, >> >> I have personally written a Python script to parse RDF files into an >> in-memory graph structure and then pull data from that structure to index to >> Solr. >> I.e. you may perfectly well have RDF (nt, turtle, whatever) as source but >> index sub structures in very specific ways. >> Anyway, as Erick points out, that’s probably where in your code that you >> should use Managed Schema REST API in order to >> 1. Query Solr for what fields are defined >> 2. If you need to index a field that is not yet in Solr, add it, using the >> correct field type (your app should know) >> 3. Push the data >> 4. Repeat >> >> -- >> Jan Høydahl, search solution architect >> Cominvent AS - www.cominvent.com >> >>> On 8 Jul 2017, at 02:36, Rick Leir wrote: >>> >>> Thaer >>> Whoa, hold everything! You said RDF, meaning resource description >>> framework? If so, you have exactly three fields: subject, predicate, and >>> object. Maybe they are text type, or for exact matches you might want >>> string fields. Add an ID field, which could be automatically generated by >>> Solr, so now you have four fields. Or am I on a tangent again? 
Cheers -- >>> Rick >>> >>> On July 7, 2017 6:01:00 AM EDT, Thaer Sammar wrote: >>>> Hi Jan, >>>> >>>> Thanks!, I am exploring the schemaless option based on Furkan >>>> suggestion. I >>>> need the the flexibility because not all fields are known. We get the >>>> data >>>> from RDF database (which changes continuously). To be more specific, we >>>> have a database and all changes on it are sent to a kafka queue. and we >>>> have a consumer which listen to the queue and update the Solr index. >>>> >>>> regards, >>>> Thaer >>>> >>>> On 7 July 2017 at 10:53, Jan Høydahl wrote: >>>> >>>>> If you do not need the flexibility of dynamic fields, don’t use them. >>>>> Sounds to me that you really want a field “price” to be float and a >>>> field >>>>> “birthdate” to be of type date etc. >>>>> If so, simply create your schema (either manually, through Schema API >>>> or >>>>> using schemaless) up front and index each field as correct type >>>> without >>>>> messing with field name prefixes. >>>>> >>>>> -- >>>>> Jan Høydahl, search solution architect >>>>> Cominvent AS - www.cominvent.com >>>>> >>>>>> 5. jul. 2017 kl. 15.23 skrev Thaer Sammar : >>>>>> >>>>>> Hi, >>>>>> We are trying to index documents of different types. Document have >>>>> different fields. fields are known at indexing time. We run a query >>>> on a >>>>> database and we index what comes using query variables as field names >>>> in >>>>> solr. 
Our current solution: we use dynamic fields with prefix, for >>>> example >>>>> feature_i_*, the issue with that >>>>>> 1) we need to define the type of the dynamic field and to be able >>>> to >>>>> cover the type of discovered fields we define the following >>>>>> feature_i_* for integers, feature_t_* for string, feature_d_* for >>>>> double, >>>>>> 1.a) this means we need to check the type of the discovered field >>>> and >>>>> then put in the corresponding dynamic field >>>>>> 2) at search time, we need to know the right prefix >>>>>> We are looking for help to find away to ignore the prefix and check >>>> of >>>>> the type >>>>>> regards, >>>>>> Thaer >>>>> >>> -- >>> Sorry for being brief. Alternate email is rickleir at yahoo dot com >
Re: index new discovered fileds of different types
Hi Rick, yes, the RDF structure has subject, predicate, and object. The object data type is not only text; it can be integer, double, or other data types as well. The structure of our Solr document doesn't contain only these three fields. We compose one document per subject and use all found objects as fields. Currently, in the schema we define two static fields: uri (the subject) and a geo field which contains the geographic point. When we find a message in the Kafka queue, which means something changed in the DB, we query the DB to get all subject,predicate,object triples for the found subjects, and based on that we create the document. For example, for subjects s1 and s2, we might get the following from the DB: s1,geo,(latitude, longitude) s1,area,200.0 s1,type,office s2,geo,(latitude, longitude) For s1, there is more information available and we would like to include it in the Solr doc; therefore we used the dynamic fields feature_double_* and feature_text_*. Based on the object data type we add to the appropriate dynamic field: s1 (latitude,longitude) 200.0 office We prefixed the predicate name with the dynamic field prefix, and we used the RDF data type to decide which dynamic field to use. regards, Thaer On 8 July 2017 at 02:36, Rick Leir wrote: > Thaer > Whoa, hold everything! You said RDF, meaning resource description > framework? If so, you have exactly three fields: subject, predicate, and > object. Maybe they are text type, or for exact matches you might want > string fields. Add an ID field, which could be automatically generated by > Solr, so now you have four fields. Or am I on a tangent again? Cheers -- > Rick > > On July 7, 2017 6:01:00 AM EDT, Thaer Sammar wrote: > >Hi Jan, > > > >Thanks!, I am exploring the schemaless option based on Furkan > >suggestion. I > >need the the flexibility because not all fields are known. We get the > >data > >from RDF database (which changes continuously). To be more specific, we > >have a database and all changes on it are sent to a kafka queue. 
and we > >have a consumer which listen to the queue and update the Solr index. > > > >regards, > >Thaer > > > >On 7 July 2017 at 10:53, Jan Høydahl wrote: > > > >> If you do not need the flexibility of dynamic fields, don’t use them. > >> Sounds to me that you really want a field “price” to be float and a > >field > >> “birthdate” to be of type date etc. > >> If so, simply create your schema (either manually, through Schema API > >or > >> using schemaless) up front and index each field as correct type > >without > >> messing with field name prefixes. > >> > >> -- > >> Jan Høydahl, search solution architect > >> Cominvent AS - www.cominvent.com > >> > >> > 5. jul. 2017 kl. 15.23 skrev Thaer Sammar : > >> > > >> > Hi, > >> > We are trying to index documents of different types. Document have > >> different fields. fields are known at indexing time. We run a query > >on a > >> database and we index what comes using query variables as field names > >in > >> solr. Our current solution: we use dynamic fields with prefix, for > >example > >> feature_i_*, the issue with that > >> > 1) we need to define the type of the dynamic field and to be able > >to > >> cover the type of discovered fields we define the following > >> > feature_i_* for integers, feature_t_* for string, feature_d_* for > >> double, > >> > 1.a) this means we need to check the type of the discovered field > >and > >> then put in the corresponding dynamic field > >> > 2) at search time, we need to know the right prefix > >> > We are looking for help to find away to ignore the prefix and check > >of > >> the type > >> > > >> > regards, > >> > Thaer > >> > >> > > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: index new discovered fileds of different types
Jan I hope this is not off-topic, but I am curious: if you do not use the three fields, subject, predicate, and object for indexing RDF then what is your algorithm? Maybe document nesting is appropriate for this? cheers -- Rick On 2017-07-09 05:52 PM, Jan Høydahl wrote: Hi, I have personally written a Python script to parse RDF files into an in-memory graph structure and then pull data from that structure to index to Solr. I.e. you may perfectly well have RDF (nt, turtle, whatever) as source but index sub structures in very specific ways. Anyway, as Erick points out, that’s probably where in your code that you should use Managed Schema REST API in order to 1. Query Solr for what fields are defined 2. If you need to index a field that is not yet in Solr, add it, using the correct field type (your app should know) 3. Push the data 4. Repeat -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 8. jul. 2017 kl. 02.36 skrev Rick Leir : Thaer Whoa, hold everything! You said RDF, meaning resource description framework? If so, you have exactly three fields: subject, predicate, and object. Maybe they are text type, or for exact matches you might want string fields. Add an ID field, which could be automatically generated by Solr, so now you have four fields. Or am I on a tangent again? Cheers -- Rick On July 7, 2017 6:01:00 AM EDT, Thaer Sammar wrote: Hi Jan, Thanks!, I am exploring the schemaless option based on Furkan suggestion. I need the the flexibility because not all fields are known. We get the data from RDF database (which changes continuously). To be more specific, we have a database and all changes on it are sent to a kafka queue. and we have a consumer which listen to the queue and update the Solr index. regards, Thaer On 7 July 2017 at 10:53, Jan Høydahl wrote: If you do not need the flexibility of dynamic fields, don’t use them. Sounds to me that you really want a field “price” to be float and a field “birthdate” to be of type date etc. 
If so, simply create your schema (either manually, through Schema API or using schemaless) up front and index each field as correct type without messing with field name prefixes. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 5. jul. 2017 kl. 15.23 skrev Thaer Sammar : Hi, We are trying to index documents of different types. Document have different fields. fields are known at indexing time. We run a query on a database and we index what comes using query variables as field names in solr. Our current solution: we use dynamic fields with prefix, for example feature_i_*, the issue with that 1) we need to define the type of the dynamic field and to be able to cover the type of discovered fields we define the following feature_i_* for integers, feature_t_* for string, feature_d_* for double, 1.a) this means we need to check the type of the discovered field and then put in the corresponding dynamic field 2) at search time, we need to know the right prefix We are looking for help to find away to ignore the prefix and check of the type regards, Thaer -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: index new discovered fields of different types
Hi, I have personally written a Python script to parse RDF files into an in-memory graph structure and then pull data from that structure to index to Solr. I.e. you may perfectly well have RDF (nt, turtle, whatever) as source but index sub structures in very specific ways. Anyway, as Erick points out, that’s probably where in your code that you should use Managed Schema REST API in order to 1. Query Solr for what fields are defined 2. If you need to index a field that is not yet in Solr, add it, using the correct field type (your app should know) 3. Push the data 4. Repeat -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 8. jul. 2017 kl. 02.36 skrev Rick Leir : > > Thaer > Whoa, hold everything! You said RDF, meaning resource description framework? > If so, you have exactly three fields: subject, predicate, and object. Maybe > they are text type, or for exact matches you might want string fields. Add an > ID field, which could be automatically generated by Solr, so now you have > four fields. Or am I on a tangent again? Cheers -- Rick > > On July 7, 2017 6:01:00 AM EDT, Thaer Sammar wrote: >> Hi Jan, >> >> Thanks!, I am exploring the schemaless option based on Furkan >> suggestion. I >> need the the flexibility because not all fields are known. We get the >> data >> from RDF database (which changes continuously). To be more specific, we >> have a database and all changes on it are sent to a kafka queue. and we >> have a consumer which listen to the queue and update the Solr index. >> >> regards, >> Thaer >> >> On 7 July 2017 at 10:53, Jan Høydahl wrote: >> >>> If you do not need the flexibility of dynamic fields, don’t use them. >>> Sounds to me that you really want a field “price” to be float and a >> field >>> “birthdate” to be of type date etc. 
>>> If so, simply create your schema (either manually, through Schema API >> or >>> using schemaless) up front and index each field as correct type >> without >>> messing with field name prefixes. >>> >>> -- >>> Jan Høydahl, search solution architect >>> Cominvent AS - www.cominvent.com >>> >>>> 5. jul. 2017 kl. 15.23 skrev Thaer Sammar : >>>> >>>> Hi, >>>> We are trying to index documents of different types. Document have >>> different fields. fields are known at indexing time. We run a query >> on a >>> database and we index what comes using query variables as field names >> in >>> solr. Our current solution: we use dynamic fields with prefix, for >> example >>> feature_i_*, the issue with that >>>> 1) we need to define the type of the dynamic field and to be able >> to >>> cover the type of discovered fields we define the following >>>> feature_i_* for integers, feature_t_* for string, feature_d_* for >>> double, >>>> 1.a) this means we need to check the type of the discovered field >> and >>> then put in the corresponding dynamic field >>>> 2) at search time, we need to know the right prefix >>>> We are looking for help to find away to ignore the prefix and check >> of >>> the type >>>> >>>> regards, >>>> Thaer >>> >>> > > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com
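Jan's four-step loop (query defined fields, add missing ones, push data, repeat) can be sketched in a few lines. This is only an illustrative sketch: the type mapping and field options are made up for the example, and the actual HTTP calls to the Schema API (`GET /schema/fields`, `POST /schema`) are only referenced in comments rather than performed.

```python
# Sketch of the "query fields, add missing, push" loop.
# The decision logic is shown as plain functions; the HTTP calls to
# Solr's Schema API are left as comments.

PY_TO_SOLR_TYPE = {  # illustrative mapping; adjust to your schema's types
    int: "pint",
    float: "pdouble",
    str: "string",
}

def missing_field_commands(defined_fields, doc):
    """Return Schema API 'add-field' commands for fields not yet defined."""
    commands = []
    for name, value in doc.items():
        if name in defined_fields:
            continue
        solr_type = PY_TO_SOLR_TYPE.get(type(value), "string")
        commands.append({"add-field": {"name": name,
                                       "type": solr_type,
                                       "stored": True}})
    return commands

# Usage: before indexing, fetch the defined fields once, e.g.
#   GET http://localhost:8983/solr/<core>/schema/fields
# then POST each returned command to http://localhost:8983/solr/<core>/schema
# before pushing the document itself.
doc = {"id": "1", "price": 9.99, "title": "example"}
print(missing_field_commands({"id", "title"}, doc))
```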
Re: index new discovered fields of different types
Thaer Whoa, hold everything! You said RDF, meaning resource description framework? If so, you have exactly three fields: subject, predicate, and object. Maybe they are text type, or for exact matches you might want string fields. Add an ID field, which could be automatically generated by Solr, so now you have four fields. Or am I on a tangent again? Cheers -- Rick On July 7, 2017 6:01:00 AM EDT, Thaer Sammar wrote: >Hi Jan, > >Thanks!, I am exploring the schemaless option based on Furkan >suggestion. I >need the the flexibility because not all fields are known. We get the >data >from RDF database (which changes continuously). To be more specific, we >have a database and all changes on it are sent to a kafka queue. and we >have a consumer which listen to the queue and update the Solr index. > >regards, >Thaer > >On 7 July 2017 at 10:53, Jan Høydahl wrote: > >> If you do not need the flexibility of dynamic fields, don’t use them. >> Sounds to me that you really want a field “price” to be float and a >field >> “birthdate” to be of type date etc. >> If so, simply create your schema (either manually, through Schema API >or >> using schemaless) up front and index each field as correct type >without >> messing with field name prefixes. >> >> -- >> Jan Høydahl, search solution architect >> Cominvent AS - www.cominvent.com >> >> > 5. jul. 2017 kl. 15.23 skrev Thaer Sammar : >> > >> > Hi, >> > We are trying to index documents of different types. Document have >> different fields. fields are known at indexing time. We run a query >on a >> database and we index what comes using query variables as field names >in >> solr. 
Our current solution: we use dynamic fields with prefix, for >example >> feature_i_*, the issue with that >> > 1) we need to define the type of the dynamic field and to be able >to >> cover the type of discovered fields we define the following >> > feature_i_* for integers, feature_t_* for string, feature_d_* for >> double, >> > 1.a) this means we need to check the type of the discovered field >and >> then put in the corresponding dynamic field >> > 2) at search time, we need to know the right prefix >> > We are looking for help to find away to ignore the prefix and check >of >> the type >> > >> > regards, >> > Thaer >> >> -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: index new discovered fields of different types
I'd recommend "managed schema" rather than schemaless. They're related but distinct. The problem is that schemaless makes assumptions based on the first field it finds. So if it finds a field with a "1" in it, it guesses "int". That'll break if the next doc has a 1.0 since it doesn't parse to an int. Managed schema uses the same underlying mechanism to change the schema; it just lets you control exactly what gets changed. Best, Erick On Fri, Jul 7, 2017 at 3:01 AM, Thaer Sammar wrote: > Hi Jan, > > Thanks!, I am exploring the schemaless option based on Furkan suggestion. I > need the the flexibility because not all fields are known. We get the data > from RDF database (which changes continuously). To be more specific, we > have a database and all changes on it are sent to a kafka queue. and we > have a consumer which listen to the queue and update the Solr index. > > regards, > Thaer > > On 7 July 2017 at 10:53, Jan Høydahl wrote: > >> If you do not need the flexibility of dynamic fields, don’t use them. >> Sounds to me that you really want a field “price” to be float and a field >> “birthdate” to be of type date etc. >> If so, simply create your schema (either manually, through Schema API or >> using schemaless) up front and index each field as correct type without >> messing with field name prefixes. >> >> -- >> Jan Høydahl, search solution architect >> Cominvent AS - www.cominvent.com >> >> > 5. jul. 2017 kl. 15.23 skrev Thaer Sammar : >> > >> > Hi, >> > We are trying to index documents of different types. Document have >> different fields. fields are known at indexing time. We run a query on a >> database and we index what comes using query variables as field names in >> solr. 
Our current solution: we use dynamic fields with prefix, for example >> feature_i_*, the issue with that >> > 1) we need to define the type of the dynamic field and to be able to >> cover the type of discovered fields we define the following >> > feature_i_* for integers, feature_t_* for string, feature_d_* for >> double, >> > 1.a) this means we need to check the type of the discovered field and >> then put in the corresponding dynamic field >> > 2) at search time, we need to know the right prefix >> > We are looking for help to find away to ignore the prefix and check of >> the type >> > >> > regards, >> > Thaer >> >>
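Erick's point about first-value guessing can be illustrated with a toy sketch. This is not Solr's actual implementation, just a demonstration of why locking a field's type to whatever the first document happens to contain is fragile:

```python
# Illustration (not Solr's actual code) of naive schemaless type guessing:
# the guess locks in on the first value seen, so later values that would
# have suggested a wider type are rejected.

def guess_type(value):
    """Guess a type from a raw string value, narrowest first."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "string"

def index_stream(values):
    """Lock the field type from the first value; reject later mismatches."""
    locked = guess_type(values[0])
    rejected = [v for v in values[1:] if guess_type(v) != locked]
    return locked, rejected

# "1" locks the field to int, so "1.0" is rejected later on:
print(index_stream(["1", "1.0", "2"]))
```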
Re: index new discovered fields of different types
Hi Jan, Thanks! I am exploring the schemaless option based on Furkan's suggestion. I need the flexibility because not all fields are known. We get the data from an RDF database (which changes continuously). To be more specific, we have a database whose changes are all sent to a Kafka queue, and we have a consumer which listens to the queue and updates the Solr index. regards, Thaer On 7 July 2017 at 10:53, Jan Høydahl wrote: > If you do not need the flexibility of dynamic fields, don’t use them. > Sounds to me that you really want a field “price” to be float and a field > “birthdate” to be of type date etc. > If so, simply create your schema (either manually, through Schema API or > using schemaless) up front and index each field as correct type without > messing with field name prefixes. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > > 5. jul. 2017 kl. 15.23 skrev Thaer Sammar : > > > > Hi, > > We are trying to index documents of different types. Document have > different fields. fields are known at indexing time. We run a query on a > database and we index what comes using query variables as field names in > solr. Our current solution: we use dynamic fields with prefix, for example > feature_i_*, the issue with that > > 1) we need to define the type of the dynamic field and to be able to > cover the type of discovered fields we define the following > > feature_i_* for integers, feature_t_* for string, feature_d_* for > double, > > 1.a) this means we need to check the type of the discovered field and > then put in the corresponding dynamic field > > 2) at search time, we need to know the right prefix > > We are looking for help to find away to ignore the prefix and check of > the type > > > > regards, > > Thaer > >
Re: index new discovered fields of different types
If you do not need the flexibility of dynamic fields, don’t use them. Sounds to me that you really want a field “price” to be float and a field “birthdate” to be of type date etc. If so, simply create your schema (either manually, through Schema API or using schemaless) up front and index each field as correct type without messing with field name prefixes. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 5. jul. 2017 kl. 15.23 skrev Thaer Sammar : > > Hi, > We are trying to index documents of different types. Document have different > fields. fields are known at indexing time. We run a query on a database and > we index what comes using query variables as field names in solr. Our current > solution: we use dynamic fields with prefix, for example feature_i_*, the > issue with that > 1) we need to define the type of the dynamic field and to be able to cover > the type of discovered fields we define the following > feature_i_* for integers, feature_t_* for string, feature_d_* for double, > 1.a) this means we need to check the type of the discovered field and then > put in the corresponding dynamic field > 2) at search time, we need to know the right prefix > We are looking for help to find away to ignore the prefix and check of the > type > > regards, > Thaer
Re: index new discovered fields of different types
I really have no idea what "to ignore the prefix and check of the type" means. When? How? Can you give an example of inputs and outputs? You might want to review: https://wiki.apache.org/solr/UsingMailingLists And to add to what Furkan mentioned, in addition to schemaless you can use "managed schema" which will allow you to add fields and types on the fly. Best, Erick On Wed, Jul 5, 2017 at 8:12 AM, Thaer Sammar wrote: > Hi Furkan, > > No, In the schema we also defined some static fields such as uri and geo > field. > > On 5 July 2017 at 17:07, Furkan KAMACI wrote: > >> Hi Thaer, >> >> Do you use schemeless mode [1] ? >> >> Kind Regards, >> Furkan KAMACI >> >> [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode >> >> On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar wrote: >> >> > Hi, >> > We are trying to index documents of different types. Document have >> > different fields. fields are known at indexing time. We run a query on a >> > database and we index what comes using query variables as field names in >> > solr. Our current solution: we use dynamic fields with prefix, for >> example >> > feature_i_*, the issue with that >> > 1) we need to define the type of the dynamic field and to be able to >> cover >> > the type of discovered fields we define the following >> > feature_i_* for integers, feature_t_* for string, feature_d_* for double, >> > >> > 1.a) this means we need to check the type of the discovered field and >> then >> > put in the corresponding dynamic field >> > 2) at search time, we need to know the right prefix >> > We are looking for help to find away to ignore the prefix and check of >> the >> > type >> > >> > regards, >> > Thaer >>
Re: index new discovered fields of different types
Hi Furkan, No, In the schema we also defined some static fields such as uri and geo field. On 5 July 2017 at 17:07, Furkan KAMACI wrote: > Hi Thaer, > > Do you use schemeless mode [1] ? > > Kind Regards, > Furkan KAMACI > > [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode > > On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar wrote: > > > Hi, > > We are trying to index documents of different types. Document have > > different fields. fields are known at indexing time. We run a query on a > > database and we index what comes using query variables as field names in > > solr. Our current solution: we use dynamic fields with prefix, for > example > > feature_i_*, the issue with that > > 1) we need to define the type of the dynamic field and to be able to > cover > > the type of discovered fields we define the following > > feature_i_* for integers, feature_t_* for string, feature_d_* for double, > > > > 1.a) this means we need to check the type of the discovered field and > then > > put in the corresponding dynamic field > > 2) at search time, we need to know the right prefix > > We are looking for help to find away to ignore the prefix and check of > the > > type > > > > regards, > > Thaer >
Re: index new discovered fields of different types
Hi Thaer, Do you use schemeless mode [1] ? Kind Regards, Furkan KAMACI [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar wrote: > Hi, > We are trying to index documents of different types. Document have > different fields. fields are known at indexing time. We run a query on a > database and we index what comes using query variables as field names in > solr. Our current solution: we use dynamic fields with prefix, for example > feature_i_*, the issue with that > 1) we need to define the type of the dynamic field and to be able to cover > the type of discovered fields we define the following > feature_i_* for integers, feature_t_* for string, feature_d_* for double, > > 1.a) this means we need to check the type of the discovered field and then > put in the corresponding dynamic field > 2) at search time, we need to know the right prefix > We are looking for help to find away to ignore the prefix and check of the > type > > regards, > Thaer
index new discovered fields of different types
Hi, We are trying to index documents of different types. Documents have different fields, and the fields are only known at indexing time: we run a query on a database and index whatever comes back, using the query variables as field names in Solr. Our current solution is to use dynamic fields with a type prefix, for example feature_i_*. The issues with that: 1) we need to define the type of each dynamic field, so to cover the types of discovered fields we define feature_i_* for integers, feature_t_* for strings, and feature_d_* for doubles; 1.a) this means we have to check the type of each discovered field and then put it in the corresponding dynamic field; 2) at search time, we need to know the right prefix. We are looking for help to find a way to drop the prefix and still have the type handled correctly. regards, Thaer
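The prefix scheme the question describes can be sketched as a small mapping step at indexing time. The names mirror the message (feature_i_*, feature_t_*, feature_d_*); the mapping function itself is illustrative and is exactly the piece of bookkeeping the poster wants to avoid maintaining:

```python
# Sketch of the typed dynamic-field prefix scheme from the question:
# pick the prefix from the discovered value's Python type.

PREFIX_BY_TYPE = {int: "feature_i_", float: "feature_d_", str: "feature_t_"}

def dynamic_field_name(name, value):
    """Map a discovered field to its typed dynamic-field name."""
    prefix = PREFIX_BY_TYPE.get(type(value), "feature_t_")  # default: string
    return prefix + name

# A discovered row becomes a Solr document with prefixed field names:
doc = {"weight": 12.5, "count": 3, "label": "red"}
print({dynamic_field_name(k, v): v for k, v in doc.items()})
```

The search-time cost is visible here too: a query against "weight" must know to ask for "feature_d_weight", which is issue 2) in the message.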
Re: Solr 6.4. Can't index MS Visio vsdx files
On 11/04/2017 20:48, Allison, Timothy B. wrote: It depends. We've been trying to make parsers more, erm, flexible, but there are some problems from which we cannot recover. Tl;dr there isn't a short answer. :( My sense is that DIH/ExtractingDocumentHandler is intended to get people up and running with Solr easily but it is not really a great idea for production. See Erick's gem: https://lucidworks.com/2012/02/14/indexing-with-solrj/ +1. Tika extraction should happen *outside* Solr in production. A colleague even wrote a simple wrapper for Tika to help build this sort of thing: https://github.com/mattflax/dropwizard-tika-server Charlie As for the Tika portion... at the very least, Tika _shouldn't_ cause the ingesting process to crash. At most, it should fail at the file level and not cause greater havoc. In practice, if you're processing millions of files from the wild, you'll run into bad behavior and need to defend against permanent hangs, oom, memory leaks. Also, at the least, if there's an exception with an embedded file, Tika should catch it and keep going with the rest of the file. If this doesn't happen let us know! We are aware that some types of embedded file stream problems were causing parse failures on the entire file, and we now catch those in Tika 1.15-SNAPSHOT and don't let them percolate up through the parent file (they're reported in the metadata though). Specifically for your stack traces: For your initial problem with the missing class exceptions -- I thought we used to catch those in docx and log them. I haven't been able to track this down, though. I can look more if you have a need. For "Caused by: org.apache.poi.POIXMLException: Invalid 'Row_Type' name 'PolylineTo' ", this problem might go away if we implemented a pure SAX parser for vsdx. We just did this for docx and pptx (coming in 1.15) and these are more robust to variation because they aren't requiring a match with the ooxml schema. I haven't looked much at vsdx, but that _might_ help. 
For "TODO Support v5 Pointers", this isn't supported and would require contributions. However, I agree that POI shouldn't throw a Runtime exception. Perhaps open an issue in POI, or maybe we should catch this special example at the Tika level? For "Caused by: java.lang.ArrayIndexOutOfBoundsException:", the POI team _might_ be able to modify the parser to ignore a stream if there's an exception, but that's often a sign that something needs to be fixed with the parser. In short, the solution will come from POI. Best, Tim -Original Message- From: Gytis Mikuciunas [mailto:gyt...@gmail.com] Sent: Tuesday, April 11, 2017 1:56 PM To: solr-user@lucene.apache.org Subject: RE: Solr 6.4. Can't index MS Visio vsdx files Thanks for your responses. Are there any posibilities to ignore parsing errors and continue indexing? because now solr/tika stops parsing whole document if it finds any exception On Apr 11, 2017 19:51, "Allison, Timothy B." wrote: You might want to drop a note to the dev or user's list on Apache POI. I'm not extremely familiar with the vsd(x) portion of our code base. The first item ("PolylineTo") may be caused by a mismatch btwn your doc and the ooxml spec. The second item appears to be an unsupported feature. The third item may be an area for improvement within our codebase...I can't tell just from the stacktrace. You'll probably get more helpful answers over on POI. Sorry, I can't help with this... Best, Tim P.S. 3.1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar You shouldn't need both. Ooxml-schemas-1.3.jar should be a super set of poi-ooxml-schemas-3.15.jar --- This email has been checked for viruses by AVG. http://www.avg.com -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
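Tim's "fail at the file level" advice, combined with Charlie's point about running Tika outside Solr, amounts to an ingest loop shaped roughly like the sketch below. `extract_text` here is a stand-in, not a real Tika API; in practice it would call out to something like a Tika server, ideally in a separate process with a timeout so that the permanent hangs and OOMs Tim mentions can't take down the ingester:

```python
# Sketch of per-file error isolation for an extraction pipeline run
# outside Solr: one unparseable document is logged and skipped instead
# of killing the whole ingest.

def extract_text(path):
    # Stand-in for a real extraction call (e.g. an HTTP request to a
    # Tika server). Here we pretend any path containing "bad" fails.
    if "bad" in path:
        raise ValueError("unparseable stream in " + path)
    return "text of " + path

def ingest(paths):
    indexed, failed = [], []
    for path in paths:
        try:
            indexed.append((path, extract_text(path)))
        except Exception as exc:  # catch per file, keep going
            failed.append((path, str(exc)))
    return indexed, failed

docs, errors = ingest(["a.vsdx", "bad.vsdx", "b.docx"])
print(len(docs), len(errors))
```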
RE: Solr 6.4. Can't index MS Visio vsdx files
Sorry. Y, you'll have to update commons-compress to 1.14. -Original Message- From: Gytis Mikuciunas [mailto:gyt...@gmail.com] Sent: Monday, July 3, 2017 9:15 AM To: solr-user@lucene.apache.org Subject: Re: Solr 6.4. Can't index MS Visio vsdx files hi, So I'm back from my long vacations :) I'm trying to bring-up a fresh solr 6.6 standalone instance on windows 2012R2 server. Replaced: poi-*3.15-beta1 ---> poi-*3.16 tika-*1.13 ---> tika-*1.15 Tried to index one txt file and got (with poi and tika files that come out of the box, it indexes this txt file without errors): SimplePostTool: WARNING: Response: Error 500 Server Error HTTP ERROR 500 Problem accessing /solr/v20170703xxx/update/extract. Reason: Server ErrorCaused by:java.lang.NoClassDefFoundError: org/apache/commons/compress/archivers/ArchiveStreamProvider at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(Unknown Source) at java.security.SecureClassLoader.defineClass(Unknown Source) at java.net.URLClassLoader.defineClass(Unknown Source) at java.net.URLClassLoader.access$100(Unknown Source) at java.net.URLClassLoader$1.run(Unknown Source) at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.net.FactoryURLClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at org.apache.tika.parser.pkg.ZipContainerDetector.detectArchiveFormat(ZipContainerDetector.java:112) at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:83) at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:534) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceCo
Re: Solr 6.4. Can't index MS Visio vsdx files
hi, So I'm back from my long vacations :) I'm trying to bring-up a fresh solr 6.6 standalone instance on windows 2012R2 server. Replaced: poi-*3.15-beta1 ---> poi-*3.16 tika-*1.13 ---> tika-*1.15 Tried to index one txt file and got (with poi and tika files that come out of the box, it indexes this txt file without errors): SimplePostTool: WARNING: Response: Error 500 Server Error HTTP ERROR 500 Problem accessing /solr/v20170703xxx/update/extract. Reason: Server ErrorCaused by:java.lang.NoClassDefFoundError: org/apache/commons/compress/archivers/ArchiveStreamProvider at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(Unknown Source) at java.security.SecureClassLoader.defineClass(Unknown Source) at java.net.URLClassLoader.defineClass(Unknown Source) at java.net.URLClassLoader.access$100(Unknown Source) at java.net.URLClassLoader$1.run(Unknown Source) at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.net.FactoryURLClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at org.apache.tika.parser.pkg.ZipContainerDetector.detectArchiveFormat(ZipContainerDetector.java:112) at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:83) at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723) 
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:534) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.ClassN
How to index binary files from FTP servers using Solr DIH?
I need a way to index binary files from FTP servers, using UrlDataSource. I’m doing this locally, but I need to do the same from remote sources (FTP servers). I have read a lot and can’t find any example of indexing binary files from FTP. Is it possible to achieve that? How can I use the Data Import Handler to index binary files from FTP servers? This is what I’m doing locally, and I need your help to achieve the same requirements but from a remote FTP server: https://i.stack.imgur.com/biSlL.jpg
Sharding of index data takes long time.
I am just trying to shard my index data of size 22GB (1.7M documents) into three shards. The total time for splitting is about 7 hours. I used the same query that is mentioned in the Solr Collections API documentation. Is there any way to do it quicker? Can I use the REBALANCE API? Is that safe? Is there any benchmark available for sharding the index? -- View this message in context: http://lucene.472066.n3.nabble.com/Sharding-of-index-data-takes-long-time-tp4343029.html Sent from the Solr - User mailing list archive at Nabble.com.
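For reference, a shard split is usually issued through the Collections API SPLITSHARD action; running it with an async request ID at least avoids holding an HTTP connection open for the whole multi-hour operation, and lets you poll REQUESTSTATUS instead. The host and collection names below are placeholders:

```python
# Sketch of composing an async SPLITSHARD call against the Collections
# API. Only the URL is built here; sending it and polling REQUESTSTATUS
# with the same request id are left to the caller.
from urllib.parse import urlencode

def splitshard_url(base, collection, shard, request_id):
    params = {"action": "SPLITSHARD", "collection": collection,
              "shard": shard, "async": request_id}
    return base + "/admin/collections?" + urlencode(params)

print(splitshard_url("http://localhost:8983/solr", "mycoll", "shard1", "split-1"))
```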
Re: Index 0, Size 0 - hashJoin Stream function Error
Ok, I'll take a look. Thanks!

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jun 27, 2017 at 10:01 AM, Susheel Kumar wrote:
> Hi Joel,
>
> I have submitted a patch to handle this. Please review.
>
> https://issues.apache.org/jira/secure/attachment/12874681/SOLR-10944.patch
>
> Thanks,
> Susheel
Re: Index 0, Size 0 - hashJoin Stream function Error
Hi Joel,

I have submitted a patch to handle this. Please review.

https://issues.apache.org/jira/secure/attachment/12874681/SOLR-10944.patch

Thanks,
Susheel

On Fri, Jun 23, 2017 at 12:32 PM, Susheel Kumar wrote:
> Thanks for confirming. Here is the JIRA
>
> https://issues.apache.org/jira/browse/SOLR-10944
Re: Index 0, Size 0 - hashJoin Stream function Error
Thanks for confirming. Here is the JIRA

https://issues.apache.org/jira/browse/SOLR-10944

On Fri, Jun 23, 2017 at 11:20 AM, Joel Bernstein wrote:
> yeah, this looks like a bug in the get expression.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
Re: Index 0, Size 0 - hashJoin Stream function Error
yeah, this looks like a bug in the get expression.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 23, 2017 at 11:07 AM, Susheel Kumar wrote:
> Hi Joel,
>
> As I am getting deeper, it doesn't look like a problem due to hashJoin etc.
>
> Below is a simple let expr where search would not find a match and returns
> 0 results. In that case I would expect get(a) to show an EOF tuple, but it
> is throwing an exception instead.
Re: Index 0, Size 0 - hashJoin Stream function Error
Hi Joel,

As I am getting deeper, it doesn't look like a problem due to hashJoin etc.

Below is a simple let expr where search does not find a match and returns 0
results. In that case I would expect get(a) to return an EOF tuple, but it
throws an exception instead. It looks like something is wrong in the code.
Please suggest.

===
let(a=search(collection1,
             q=id:9,
             fl="id,business_email",
             sort="business_email asc"),
    get(a)
)

{
  "result-set": {
    "docs": [
      {
        "EXCEPTION": "Index: 0, Size: 0",
        "EOF": true,
        "RESPONSE_TIME": 8
      }
    ]
  }
}

On Fri, Jun 23, 2017 at 7:44 AM, Joel Bernstein wrote:
> Ok, I hadn't anticipated some of the scenarios that you've been trying out.
> Particularly reading streams into variables and performing joins etc...
>
> The main idea with variables was to use them with the new statistical
> evaluators. So you perform retrievals (search, random, nodes, knn etc...),
> set the results to variables, and then perform statistical analysis.
>
> The problem with joining variables is that it doesn't scale very well
> because all the records are read into memory. Also the parallel stream
> won't work over variables.
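[Editor's note: the /stream handler reports such failures inside the response body rather than as an HTTP error, so a client has to inspect the tuples themselves. Below is a minimal sketch of a client-side guard; the function name `stream_tuples` is made up, and the sample body is the exception response quoted in this thread.]

```python
import json

# The /stream handler signals failure with a tuple carrying an "EXCEPTION"
# key (alongside "EOF") instead of a non-200 status, so scan the docs.
body = """
{"result-set": {"docs": [
  {"EXCEPTION": "Index: 0, Size: 0", "EOF": true, "RESPONSE_TIME": 8}
]}}
"""

def stream_tuples(response_text):
    """Yield data tuples; raise if the stream reports an exception."""
    for doc in json.loads(response_text)["result-set"]["docs"]:
        if "EXCEPTION" in doc:
            raise RuntimeError(doc["EXCEPTION"])
        if doc.get("EOF"):
            return  # normal end-of-stream marker
        yield doc

try:
    print(list(stream_tuples(body)))
except RuntimeError as e:
    print("stream failed:", e)
# -> stream failed: Index: 0, Size: 0
```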
Re: Index 0, Size 0 - hashJoin Stream function Error
Ok, I hadn't anticipated some of the scenarios that you've been trying out.
Particularly reading streams into variables and performing joins etc...

The main idea with variables was to use them with the new statistical
evaluators. So you perform retrievals (search, random, nodes, knn etc...),
set the results to variables, and then perform statistical analysis.

The problem with joining variables is that it doesn't scale very well
because all the records are read into memory. Also the parallel stream
won't work over variables.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 22, 2017 at 3:50 PM, Susheel Kumar wrote:
> Hi Joel,
>
> I am able to reproduce this in a simple way. Looks like the Let Stream is
> having some issues.
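[Editor's note: Joel's point that joins over variables "read all the records into memory" can be sketched in a few lines. This is not Solr's actual implementation, just the general hash-join technique it describes: one side of the join is buffered completely before the other side streams through. Field names are toy values shaped like the thread's examples.]

```python
# A hash join buffers the entire hashed side in a dict keyed by the join
# field, so its memory use grows with that side's full result set.
def hash_join(left, right, key):
    table = {}
    for t in right:                      # entire hashed side held in RAM
        table.setdefault(t[key], []).append(t)
    for t in left:                       # this side streams through
        for match in table.get(t[key], []):
            yield {**t, **match}

a = [{"id": "1", "email": "A"}, {"id": "2", "email": "C"}]
b = [{"email": "A", "phone": "555-0100"}]
print(list(hash_join(a, b, "email")))
# -> [{'id': '1', 'email': 'A', 'phone': '555-0100'}]
```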
Re: Index 0, Size 0 - hashJoin Stream function Error
Hi Joel,

I am able to reproduce this in a simple way. Looks like the Let Stream is
having some issues. The complement function below works fine if I execute it
outside let and returns an EOF:true tuple, but if a tuple with EOF:true is
assigned to a let variable, it gets changed to the EXCEPTION "Index 0, Size 0"
etc.

So the let stream is not able to handle a stream/result which has only an EOF
tuple, and that breaks the whole let expression block.

===Complement inside let
let(
  a=echo(Hello),
  b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
               sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
               on="id,email"),
  c=get(b),
  get(a)
)

Result
===
{
  "result-set": {
    "docs": [
      {
        "EXCEPTION": "Index: 0, Size: 0",
        "EOF": true,
        "RESPONSE_TIME": 1
      }
    ]
  }
}

===Complement outside let

complement(sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
           sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
           on="id,email")

Result
===
{ "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 0 } ] } }

On Thu, Jun 22, 2017 at 11:55 AM, Susheel Kumar wrote:
> Sorry for typo
>
> Facing a weird behavior when using hashJoin / innerJoin etc. The below
> expression displays tuples from variable a, shown below.
>
> let(a=fetch(SMS,having(rollup(over=email,
>                               count(email),
>                               select(search(SMS,
>                                             q=*:*,
>                                             fl="id,dv_sv_business_email",
>                                             sort="dv_sv_business_email asc"),
>                                      id,
>                                      dv_sv_business_email as email)),
>                        eq(count(email),1)),
>             fl="id,dv_sv_business_email as email",
>             on="email=dv_sv_business_email"),
>     b=fetch(SMS,having(rollup(over=email,
>                               count(email),
>                               select(search(SMS,
>                                             q=*:*,
>                                             fl="id,dv_sv_personal_email",
>                                             sort="dv_sv_personal_email asc"),
>                                      id,
>                                      dv_sv_personal_email as email)),
>                        eq(count(email),1)),
>             fl="id,dv_sv_personal_email as email",
>             on="email=dv_sv_personal_email"),
>     c=innerJoin(sort(get(a),by="email asc"),sort(get(b),by="email asc"),on="email"),
>     #d=select(get(c),id,email),
>     get(a)
> )
>
> var a result
> ==
> {
>   "result-set": {
>     "docs": [
>       { "count(email)": 1, "id": "1", "email": "A" },
>       { "count(email)": 1, "id": "2", "email": "C" },
>       { "EOF": true, "RESPONSE_TIME": 18 }
>     ]
>   }
> }
>
> And after uncommenting var d above, even though we are displaying a, we get
> the results shown below. I understand that the join in my test data didn't
> find any match, but then it should not skew the results of var a. When data
> matches during the join it is fine, but otherwise I am running into this
> issue and the whole next expression doesn't get evaluated because of it...
>
> after uncommenting var d
> ===
> {
>   "result-set": {
>     "docs": [
>       { "EXCEPTION": "Index: 0, Size: 0", "EOF": true, "RESPONSE_TIME": 44 }
>     ]
>   }
> }
Re: Error after moving index
"They're just files, man". If you can afford a bit of down-time, you can
shut your Solr down and recursively copy the data directory from your
source to destination. SCP, rsync, whatever, then restart Solr.

Do take some care when copying between Windows and *nix that you do a
_binary_ transfer. If you continually have a problem with transferring
between Windows and *nix we'll have to investigate further.

And I'm assuming this is stand-alone and there's only a single restore
going on at a time.

Best,
Erick

On Thu, Jun 22, 2017 at 9:13 AM, Moritz Michael wrote:
> BTW, is there a better/recommended way to transfer an index to another
> solr?
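[Editor's note: Erick's checklist (stop Solr, recursive copy, binary-safe transfer) can be sanity-checked by comparing checksums after the copy. The sketch below is a local stand-in with made-up paths and a fake segment file; between real hosts you would use `scp -r` or `rsync -a` instead of `shutil.copytree`, and you would stop Solr first.]

```python
import hashlib
import pathlib
import shutil
import tempfile

# Build a fake data directory with one binary "segment" file.
src = pathlib.Path(tempfile.mkdtemp()) / "data"
(src / "index").mkdir(parents=True)
(src / "index" / "_0.cfs").write_bytes(b"\x00\x01lucene-bytes\xff")  # fake segment

# Recursive copy, bytes preserved (the binary-safe transfer Erick asks for).
dst = pathlib.Path(tempfile.mkdtemp()) / "data"
shutil.copytree(src, dst)

def tree_digest(root):
    """SHA-256 over every file's bytes, walked in a stable order."""
    h = hashlib.sha256()
    for p in sorted(root.rglob("*")):
        if p.is_file():
            h.update(p.read_bytes())
    return h.hexdigest()

print("copy verified" if tree_digest(src) == tree_digest(dst) else "copy corrupted")
# -> copy verified
```

The checksum comparison is the part worth keeping: an ASCII-mode FTP transfer or an interrupted copy shows up immediately as a digest mismatch instead of a cryptic index error at startup.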
Re: Error after moving index
Usually we index directly into Prod Solr rather than copying from
local/lower environments. If that works in your scenario, I would suggest
indexing directly into Prod rather than copying/restoring from a local
Windows env to Linux.

On Thu, Jun 22, 2017 at 12:13 PM, Moritz Michael wrote:
> BTW, is there a better/recommended way to transfer an index to another
> solr?
Re: Error after moving index
BTW, is there a better/recommended way to transfer an index to another solr?

On Thu, Jun 22, 2017 at 6:09 PM +0200, "Moritz Michael" wrote:

Hello Michael,

I used the backup functionality to create a snapshot and uploaded this
snapshot, so I feel it should be safe. I'll try it again. Maybe the copy
operation wasn't successful.

Best,
Moritz

_
From: Michael Kuhlmann
Sent: Thursday, June 22, 2017 2:50 PM
Subject: Re: Error after moving index

Hi Moritz,

did you stop your local Solr server before? Copying data from a running
instance may cause headaches.

If yes, what happens if you copy everything again? It seems that your
copy operation wasn't successful.

Best,
Michael

On 22.06.2017 at 14:37, Moritz Munte wrote:
> Hello,
>
> I created an index on my local machine (Windows 10) and it works fine
> there.
>
> After uploading the index to the production server (Linux), the server
> shows an error.