Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Toke Eskildsen
On Wed, 2017-10-04 at 21:42 -0700, S G wrote:
> The bit-vectors in filterCache are as long as the maximum number of
> documents in a core. If there are a billion docs per core, every bit
> vector will have a billion bits, making its size 10^9 / 8 bytes ≈ 128 MB.

The tricky part here is that there are sparse (aka few hits) entries
that take up less space. The full bitmap (1 bit per document) is the
worst case.

This is both good and bad. The good part is of course that it saves
memory. The bad part is that it often means that people set the
filterCache size to a high number and it works well, right until a
series of filters with many hits comes along.

It seems that the memory limit option maxSizeMB was added in Solr 5.2:
https://issues.apache.org/jira/browse/SOLR-7372
I am not sure if it works with all caches in Solr, but in my world it
is way better to define the caches by memory instead of count.
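
As a sketch (SOLR-7372 adds a RAM limit, maxRamMB, to LRUCache; whether it
applies to your filterCache depends on the cache class and Solr version),
the solrconfig.xml entry could look something like this:

<!-- bound the filterCache by RAM rather than by entry count -->
<filterCache class="solr.LRUCache"
             maxRamMB="1024"
             autowarmCount="0"/>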

> With such a big cache value per entry, the default size of 128
> entries will become 128 x 128 MB = 16 GB, which would not be very
> good for a system running below 32 GB of memory.

Sure. The default values are just that. For an index with 1M documents
and a lot of different filters, 128 would probably be too low.

If someone were to create a well-researched set of config files for
different scenarios, it would be a welcome addition to our shared
knowledge pool.

> If such a use-case is anticipated, either the JVM's max memory should
> be increased beyond 40 GB or the filterCache size should be reduced to 32.

Best solution: Use maxSizeMB (if it works)
Second best solution: Reduce to 32 or less
Third best, but often used, solution: Hope that most of the entries are
sparse and will remain so

- Toke Eskildsen, Royal Danish Library



FilterCache size should reduce as index grows?

2017-10-04 Thread S G
Hi,

Here is a discussion we had recently with a fellow Solr user.
It seems reasonable to me and I wanted to see if this is an accepted theory.

The bit-vectors in filterCache are as long as the maximum number of
documents in a core. If there are a billion docs per core, every bit vector
will have a billion bits, making its size 10^9 / 8 bytes ≈ 128 MB.
With such a big cache value per entry, the default size of 128 entries
will become 128 x 128 MB = 16 GB, which would not be very good for a system
running below 32 GB of memory.

If such a use-case is anticipated, either the JVM's max memory should be
increased beyond 40 GB or the filterCache size should be reduced to 32.

Thanks
SG


Re: How to Index JSON field Solr 5.3.2

2017-10-03 Thread Deeksha Sharma
Thanks Emir!


Deeksha Sharma
Software Engineer

215 2nd St #2,
San Francisco, CA 94105. United States
Desk:   6316817418
Mobile: +64 21 084 54203

dsha...@flexera.com
www.flexera.com


From: Emir Arnautović 
Sent: Tuesday, October 3, 2017 12:58:57 AM
To: solr-user@lucene.apache.org
Subject: Re: How to Index JSON field Solr 5.3.2

Hi Sharma,
I guess you are looking for nested documents: 
https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments

It seems DIH supports it since version 5.1:
https://issues.apache.org/jira/browse/SOLR-5147

Regards,
Emir

> On 2 Oct 2017, at 10:50, Deeksha Sharma  wrote:
>
> Hi everyone,
>
>
> I have created a core and index data in Solr using dataImportHandler.
>
>
> The schema for the core looks like this:
>
>  required="true"/>
>
> required="true"/>
>
>
>
> This is my data in mysql database:
>
>
> md5:"376463475574058bba96395bfb87"
>
>
> rules: 
> {"fileRules":[{"file_id":1321241,"md5":"376463475574058bba96395bfb87","group_id":69253,"filecdata1":{"versionId":3382858,"version":"1.2.1","detectionNotes":"Generated
>  from Ibiblio Maven2, see URL 
> (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate).","texts":[{"shortText":null,"header":"Sample
>  from URL 
> (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-os-parent/1.2.1/acris-os-parent-1.2.1.pom)","text":"
>The Apache Software License, Version 2.0
>http://www.apache.org/licenses/LICENSE-2.0.txt
>repo
> "}],"notes":[],"forge":"Ibiblio 
> Maven2"}}],"groupRules":[{"group_id":69253,"parent":-1,"component":"sk.seges.acris/acris-security-hibernate
>  - AcrIS Security with Hibernate metadata","license":"Apache 
> 2.0","groupcdata1":{"componentId":583560,"title":"sk.seges.acris/acris-security-hibernate
>  - Ibiblio 
> Maven2","licenseIds":[20],"priority":3,"url":"http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate","displayName":"AcrIS
>  Security with Hibernate 
> metadata","description":null,"texts":[],"notes":[],"forge":"Ibiblio 
> Maven2"}}]}
>
> Query results from Solr:
>
> { "responseHeader":{ "status":0, "QTime":0, "params":{ 
> "q":"md5:03bb576a6b6e001cd94e91ad4c29", "indent":"on", "wt":"json", 
> "_":"1506933082656"}}, "response":{"numFound":1,"start":0,"docs":[ { 
> "rules":"{\"fileRules\":[{\"file_id\":7328190,\"md5\":\"03bb576a6b6e001cd94e91ad4c29\",\"group_id\":241307,\"filecdata1\":{\"versionId\":15761972,\"version\":\"1.0.2\",\"detectionNotes\":null,\"texts\":[{\"shortText\":null,\"header\":\"The
>  following text is found at URL 
> (https://www.nuget.org/packages/HangFire.Redis/1.0.2)\",\"text\":\"License 
> details:\nLGPL-3.0\"}],\"notes\":[],\"forge\":\"NuGet 
> Gallery\"}}],\"groupRules\":[{\"group_id\":241307,\"parent\":-1,\"component\":\"HangFire.Redis\",\"license\":\"LGPL
>  
> 3.0\",\"groupcdata1\":{\"componentId\":3524318,\"title\":null,\"licenseIds\":[216],\"priority\":1,\"url\":\"https://www.nuget.org/p

Re: Keeping the index naturally ordered by some field

2017-10-02 Thread Erick Erickson
Have you looked at Streaming and Streaming Expressions? This is pretty
much what they were built for. Since you're talking about a billion
documents, you're probably sharding anyway, in which case I'd guess
you're using SolrCloud.

That's what I'd be using first if at all possible.

Best,
Erick

On Mon, Oct 2, 2017 at 3:15 PM, alexpusch  wrote:
> The reason I'm interested in this is kind of unique. I'm writing a custom
> query parser and search component. These components go over the search
> results and perform some calculation over it. This calculation depends on
> input sorted by a certain value. In this scenario, regular solr sorting is
> insufficient as it's performed in post-search, and only collects needed rows
> to satisfy the query. The alternative for naturally sorted  index is to sort
> all the docs myself, and I wish to avoid this. I use docValues extensively,
> it really is a great help.
>
> Erick, I've tried using SortingMergePolicyFactory. It brings me close to my
> goal, but it's not quite there. The problem with this approach is that while
> each segment is sorted by itself there might be overlapping in ranges
> between the segments. For example, lets say that some query results lay in
> segments A, B, and C. Each one of the segments is sorted, so the docs coming
> from segment A will be sorted in the range 0-50, docs coming from segment B
> will be sorted in the range 20-70, and segment C will hold values in the
> 50-90 range. The query result will be 0-50,20-70, 50-90. Almost sorted, but
> not quite there.
>
> A helpful detail about my data is that the fields I'm interested in sorting
> the index by is a timestamp. Docs are indexed more or less in the correct
> order. As a result, if the merge policy I'm using will merge only
> consecutive segments, it should satisfy my need. TieredMergePolicy does
> merge non-consecutive segments so it's clearly a bad fit. I'm hoping to get
> some insight about some additional steps I may take so that
> SortingMergePolicyFactory could achieve perfection.
>
> Thanks!
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Keeping the index naturally ordered by some field

2017-10-02 Thread alexpusch
The reason I'm interested in this is kind of unique. I'm writing a custom
query parser and search component. These components go over the search
results and perform some calculation over them. This calculation depends on
input sorted by a certain value. In this scenario, regular Solr sorting is
insufficient as it's performed post-search and only collects the rows needed
to satisfy the query. The alternative to a naturally sorted index is to sort
all the docs myself, which I wish to avoid. I use docValues extensively;
it really is a great help.

Erick, I've tried using SortingMergePolicyFactory. It brings me close to my
goal, but it's not quite there. The problem with this approach is that while
each segment is sorted by itself, there might be overlapping ranges
between the segments. For example, let's say that some query results lie in
segments A, B, and C. Each one of the segments is sorted, so the docs coming
from segment A will be sorted in the range 0-50, docs coming from segment B
will be sorted in the range 20-70, and segment C will hold values in the
50-90 range. The query result will be 0-50, 20-70, 50-90. Almost sorted, but
not quite there.

A helpful detail about my data is that the field I'm interested in sorting
the index by is a timestamp. Docs are indexed more or less in the correct
order. As a result, if the merge policy I'm using merges only
consecutive segments, it should satisfy my need. TieredMergePolicy does
merge non-consecutive segments so it's clearly a bad fit. I'm hoping to get
some insight about some additional steps I may take so that 
SortingMergePolicyFactory could achieve perfection. 

Thanks!



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to Index JSON field Solr 5.3.2

2017-10-02 Thread Emir Arnautović
Hi Sharma,
I guess you are looking for nested documents: 
https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments

It seems DIH supports it since version 5.1:
https://issues.apache.org/jira/browse/SOLR-5147
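
For illustration, a parent with nested child documents in the XML update
format could look roughly like this (the id values are placeholders; the
other field names are taken from your example data):

<add>
  <doc>
    <field name="id">parent-1</field>
    <field name="md5">376463475574058bba96395bfb87</field>
    <!-- each rule becomes a child document instead of one escaped JSON string -->
    <doc>
      <field name="id">parent-1-rule-1</field>
      <field name="file_id">1321241</field>
      <field name="license">Apache 2.0</field>
    </doc>
  </doc>
</add>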

Regards,
Emir

> On 2 Oct 2017, at 10:50, Deeksha Sharma  wrote:
> 
> Hi everyone,
> 
> 
> I have created a core and index data in Solr using dataImportHandler.
> 
> 
> The schema for the core looks like this:
> 
>  required="true"/>
> 
> required="true"/>
> 
> 
> 
> This is my data in mysql database:
> 
> 
> md5:"376463475574058bba96395bfb87"
> 
> 
> rules: 
> {"fileRules":[{"file_id":1321241,"md5":"376463475574058bba96395bfb87","group_id":69253,"filecdata1":{"versionId":3382858,"version":"1.2.1","detectionNotes":"Generated
>  from Ibiblio Maven2, see URL 
> (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate).","texts":[{"shortText":null,"header":"Sample
>  from URL 
> (http://maven.ibiblio.org/maven2/sk/seges/acris/acris-os-parent/1.2.1/acris-os-parent-1.2.1.pom)","text":"
>The Apache Software License, Version 2.0
>http://www.apache.org/licenses/LICENSE-2.0.txt
>repo
> "}],"notes":[],"forge":"Ibiblio 
> Maven2"}}],"groupRules":[{"group_id":69253,"parent":-1,"component":"sk.seges.acris/acris-security-hibernate
>  - AcrIS Security with Hibernate metadata","license":"Apache 
> 2.0","groupcdata1":{"componentId":583560,"title":"sk.seges.acris/acris-security-hibernate
>  - Ibiblio 
> Maven2","licenseIds":[20],"priority":3,"url":"http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate","displayName":"AcrIS
>  Security with Hibernate 
> metadata","description":null,"texts":[],"notes":[],"forge":"Ibiblio 
> Maven2"}}]}
> 
> Query results from Solr:
> 
> { "responseHeader":{ "status":0, "QTime":0, "params":{ 
> "q":"md5:03bb576a6b6e001cd94e91ad4c29", "indent":"on", "wt":"json", 
> "_":"1506933082656"}}, "response":{"numFound":1,"start":0,"docs":[ { 
> "rules":"{\"fileRules\":[{\"file_id\":7328190,\"md5\":\"03bb576a6b6e001cd94e91ad4c29\",\"group_id\":241307,\"filecdata1\":{\"versionId\":15761972,\"version\":\"1.0.2\",\"detectionNotes\":null,\"texts\":[{\"shortText\":null,\"header\":\"The
>  following text is found at URL 
> (https://www.nuget.org/packages/HangFire.Redis/1.0.2)\",\"text\":\"License 
> details:\nLGPL-3.0\"}],\"notes\":[],\"forge\":\"NuGet 
> Gallery\"}}],\"groupRules\":[{\"group_id\":241307,\"parent\":-1,\"component\":\"HangFire.Redis\",\"license\":\"LGPL
>  
> 3.0\",\"groupcdata1\":{\"componentId\":3524318,\"title\":null,\"licenseIds\":[216],\"priority\":1,\"url\":\"https://www.nuget.org/packages/HangFire.Redis\",\"displayName\":\"Hangfire
>  Redis Storage [DEPRECATED]\",\"description\":\"DEPRECATED -- DO NOT INSTALL 
> OR UPDATE. Now shipped with Hangfire Pro, please read the \"Project site\" 
> (http://odinserj.net/2014/11/15/hangfire-pro/) for more 
> information.\",\"texts\":[{\"shortText\":null,\"header\":\"License details 
> history:\n(Refer to https://www.nuget.org/packages/HangFire.Redis and select 
> the desired version for more information)\",\"text\":\"LGPL-3.0 - (for 
> HangFire.Redis versions 0.7.0, 0.7.1, 0.7.3, 0.7.4, 0.7.5, 0.8.0, 0.8.1, 
> 0.8.2, 0.8.3, 0.9.0, 0.9.1, 1.0.1, 1.0.0, 1.0.2)\nNo information - (for 
> HangFire.Redis versions 1.1.1, 2.0.1, 
> 2.0.0)\"}],\"notes\":[{\"header\":null,\"text\":\"Project Site: 
> http://odinserj.net/2014/11/15/h

How to Index JSON field Solr 5.3.2

2017-10-02 Thread Deeksha Sharma
Hi everyone,


I have created a core and index data in Solr using dataImportHandler.


The schema for the core looks like this:







This is my data in mysql database:


md5:"376463475574058bba96395bfb87"


rules: 
{"fileRules":[{"file_id":1321241,"md5":"376463475574058bba96395bfb87","group_id":69253,"filecdata1":{"versionId":3382858,"version":"1.2.1","detectionNotes":"Generated
 from Ibiblio Maven2, see URL 
(http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate).","texts":[{"shortText":null,"header":"Sample
 from URL 
(http://maven.ibiblio.org/maven2/sk/seges/acris/acris-os-parent/1.2.1/acris-os-parent-1.2.1.pom)","text":"
The Apache Software License, Version 2.0
http://www.apache.org/licenses/LICENSE-2.0.txt
repo
"}],"notes":[],"forge":"Ibiblio 
Maven2"}}],"groupRules":[{"group_id":69253,"parent":-1,"component":"sk.seges.acris/acris-security-hibernate
 - AcrIS Security with Hibernate metadata","license":"Apache 
2.0","groupcdata1":{"componentId":583560,"title":"sk.seges.acris/acris-security-hibernate
 - Ibiblio 
Maven2","licenseIds":[20],"priority":3,"url":"http://maven.ibiblio.org/maven2/sk/seges/acris/acris-security-hibernate","displayName":"AcrIS
 Security with Hibernate 
metadata","description":null,"texts":[],"notes":[],"forge":"Ibiblio Maven2"}}]}

Query results from Solr:

{ "responseHeader":{ "status":0, "QTime":0, "params":{ 
"q":"md5:03bb576a6b6e001cd94e91ad4c29", "indent":"on", "wt":"json", 
"_":"1506933082656"}}, "response":{"numFound":1,"start":0,"docs":[ { 
"rules":"{\"fileRules\":[{\"file_id\":7328190,\"md5\":\"03bb576a6b6e001cd94e91ad4c29\",\"group_id\":241307,\"filecdata1\":{\"versionId\":15761972,\"version\":\"1.0.2\",\"detectionNotes\":null,\"texts\":[{\"shortText\":null,\"header\":\"The
 following text is found at URL 
(https://www.nuget.org/packages/HangFire.Redis/1.0.2)\",\"text\":\"License 
details:\nLGPL-3.0\"}],\"notes\":[],\"forge\":\"NuGet 
Gallery\"}}],\"groupRules\":[{\"group_id\":241307,\"parent\":-1,\"component\":\"HangFire.Redis\",\"license\":\"LGPL
 
3.0\",\"groupcdata1\":{\"componentId\":3524318,\"title\":null,\"licenseIds\":[216],\"priority\":1,\"url\":\"https://www.nuget.org/packages/HangFire.Redis\",\"displayName\":\"Hangfire
 Redis Storage [DEPRECATED]\",\"description\":\"DEPRECATED -- DO NOT INSTALL OR 
UPDATE. Now shipped with Hangfire Pro, please read the \"Project site\" 
(http://odinserj.net/2014/11/15/hangfire-pro/) for more 
information.\",\"texts\":[{\"shortText\":null,\"header\":\"License details 
history:\n(Refer to https://www.nuget.org/packages/HangFire.Redis and select 
the desired version for more information)\",\"text\":\"LGPL-3.0 - (for 
HangFire.Redis versions 0.7.0, 0.7.1, 0.7.3, 0.7.4, 0.7.5, 0.8.0, 0.8.1, 0.8.2, 
0.8.3, 0.9.0, 0.9.1, 1.0.1, 1.0.0, 1.0.2)\nNo information - (for HangFire.Redis 
versions 1.1.1, 2.0.1, 
2.0.0)\"}],\"notes\":[{\"header\":null,\"text\":\"Project Site: 
http://odinserj.net/2014/11/15/hangfire-pro\"},{\"header\":\"Previous Project 
Sites\",\"text\":\"https://github.com/odinserj/HangFire - (for Hangfire Redis 
Storage [DEPRECATED] version 0.7.0)\nhttp://hangfire.io - (for Hangfire Redis 
Storage [DEPRECATED] versions 0.7.1, 0.7.3, 0.7.4, 0.7.5, 0.8.0, 0.8.1, 0.8.2, 
0.8.3, 0.9.0, 0.9.1, 1.0.1, 1.0.0, 1.0.2, 1.1.1)\nNo information - (for 
Hangfire Redis Storage [DEPRECATED] versions 2.0.1, 
2.0.0)\"},{\"header\":\"License 
links\",\"text\":\"https://raw.github.com/odinserj/HangFire/master/COPYING.LESSER
 - (for HangFire.Redis version 
0.7.0)\nhttps://raw.github.com/odinserj/HangFire/master/LICENSE.md - (for 
HangFire.Redis versions 0.7.1, 0.7.3, 0.7.4, 0.7.5, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 
0.9.0, 0.9.1)\nhttps://raw.github.com/odinserj/Hangfire/master/LICENSE.md - 
(for HangFire.Redis versions 1.0.1, 1.0.0, 
1.0.2)\nhttps://raw.github.com/HangfireIO/Hangfire/master/LICENSE.md - (for 
HangFire.Redis version 1.1.1)\nNo information - (for HangFire.Redis versions 
2.0.1, 2.0.0)\"}],\"forge\":\"NuGet Gallery\"}}]}", 
"md5":"03bb576a6b6e001cd94e91ad4c29", "_version_":1579807444777828352}] }}



Now when I receive the results from a Solr query, it returns the rules field
as a String. How can I tell Solr to index rules as JSON and return valid JSON
instead of an escaped String?

Any help is greatly appreciated.
Thanks!



Re: Keeping the index naturally ordered by some field

2017-10-02 Thread alessandro.benedetti
Hi Alex,
just to explore your question a bit, why do you need that?
Do you need to reduce query time?
Have you tried enabling docValues for the fields of interest?
Doc Values seem to me a pretty useful data structure when sorting is a
requirement.
I am curious to understand why that was not an option.

Regards



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Keeping the index naturally ordered by some field

2017-10-01 Thread Erick Erickson
I think you're looking for SortingMergePolicyFactory, see:
https://issues.apache.org/jira/browse/SOLR-5730

The JIRA has some extensive discussion and the reference guide has an
example. It might take a little digging.
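
To save some of that digging, the ref guide example is roughly along these
lines (a sketch for solrconfig.xml; the sort field must have docValues, and
"timestamp" is just a placeholder):

<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">timestamp desc</str>
  <str name="wrapped.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
</mergePolicyFactory>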

Best,
Erick

On Sun, Oct 1, 2017 at 4:36 AM, Ahmet Arslan  wrote:
>
>
> Hi Alex,
>
> Lucene has this capability (borrowed from Nutch) under 
> org.apache.lucene.index.sorter package.I think it has been integrated into 
> Solr, but could not find the Jira issue.
>
> Ahmet
>
>
>  On Sunday, October 1, 2017, 10:22:45 AM GMT+3, alexpusch  
> wrote:
>
>
>
>
>
> Hello,
> We've got a pretty big index (~1B small docs). I'm interested in managing
> the index so that the search results would be naturally sorted by a certain
> numeric field, without specifying the actual sort field in query time.
>
> My first attempt was using SortingMergePolicyFactory. I've found that this
> provides only partial success. The results were occasionally sorted, but
> overall there where 'jumps' in the ordering.
>
> After some research I've found this excellent  blog post
> <http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html>
> that taught me that TieredMergePolicy merges non consecutive segments, and
> thus creating several segments with interlacing ordering. I've tried
> replacing the merge policy to LogByteSizeMergePolicy, but results are still
> inconsistent.
>
> The post is from 2011, and it's not clear to me whether today
> LogByteSizeMergePolicy merges only consecutive segments, or it can merge non
> consecutive segments as well.
>
> Is there an approach that will allow me achieve this goal?
>
> Solr version: 6.0
>
> Thanks, Alex.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Keeping the index naturally ordered by some field

2017-10-01 Thread Ahmet Arslan


Hi Alex,

Lucene has this capability (borrowed from Nutch) under the
org.apache.lucene.index.sorter package. I think it has been integrated into
Solr, but I could not find the Jira issue.

Ahmet
 
 
 On Sunday, October 1, 2017, 10:22:45 AM GMT+3, alexpusch  
wrote: 





Hello,
We've got a pretty big index (~1B small docs). I'm interested in managing
the index so that the search results would be naturally sorted by a certain
numeric field, without specifying the actual sort field in query time.

My first attempt was using SortingMergePolicyFactory. I've found that this
provides only partial success. The results were occasionally sorted, but
overall there where 'jumps' in the ordering.

After some research I've found this excellent  blog post
<http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html>
  
that taught me that TieredMergePolicy merges non consecutive segments, and
thus creating several segments with interlacing ordering. I've tried
replacing the merge policy to LogByteSizeMergePolicy, but results are still
inconsistent.

The post is from 2011, and it's not clear to me whether today
LogByteSizeMergePolicy merges only consecutive segments, or it can merge non
consecutive segments as well.

Is there an approach that will allow me achieve this goal?

Solr version: 6.0

Thanks, Alex.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Keeping the index naturally ordered by some field

2017-10-01 Thread alexpusch
Hello,
We've got a pretty big index (~1B small docs). I'm interested in managing
the index so that the search results would be naturally sorted by a certain
numeric field, without specifying the actual sort field at query time.

My first attempt was using SortingMergePolicyFactory. I've found that this
provides only partial success. The results were occasionally sorted, but
overall there were 'jumps' in the ordering.

After some research I've found this excellent blog post
<http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html>
that taught me that TieredMergePolicy merges non-consecutive segments,
thus creating several segments with interlacing ordering. I've tried
replacing the merge policy with LogByteSizeMergePolicy, but results are still
inconsistent.

The post is from 2011, and it's not clear to me whether today
LogByteSizeMergePolicy merges only consecutive segments, or whether it can
merge non-consecutive segments as well.

Is there an approach that will allow me to achieve this goal?

Solr version: 6.0

Thanks, Alex.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)

2017-09-27 Thread Cassandra Targett
Regarding not finding the issue, JIRA has a problem with queries when
the user is not logged in (see also
https://jira.atlassian.com/browse/JRASERVER-38511 if you're interested
in the details). There's unfortunately not much we can do about it
besides manually edit issues to remove a security setting which gets
automatically added to issues when they are created (which I've now
done for SOLR-11406).

Your best bet in the future would be to log into JIRA before
initiating a search to be sure you aren't missing one that's "hidden"
inadvertently.

Cassandra

On Wed, Sep 27, 2017 at 1:39 PM, Wayne L. Johnson
 wrote:
> First, thanks for the quick response.  Yes, it sounds like the same problem!!
>
> I did a bunch of searching before repoting the issue, I didn't come across 
> that JIRA or I wouldn't have reported it.  My apologies for the duplication 
> (although it is a new JIRA).
>
> Is there a good place to start searching in the future?  I'm a fairly 
> experiences Solr user, and I don't mind slogging through Java code.
>
> Meanwhile I'll follow the JIRA so I know when it gets fixed.
>
> Thanks!!
>
> -Original Message-
> From: Stefan Matheis [mailto:matheis.ste...@gmail.com]
> Sent: Wednesday, September 27, 2017 12:32 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
>
> That sounds like https://issues.apache.org/jira/browse/SOLR-11406 if i'm
> not mistaken?
>
> -Stefan
>
> On Sep 27, 2017 8:20 PM, "Wayne L. Johnson" 
> wrote:
>
>> I’m testing Solr 7.0.0.  When I start with an empty index, Solr comes
>> up just fine, I can add documents and query documents.  However when I
>> start with an already-populated set of documents (from 6.5.0), Solr
>> will not start.  The relevant portion of the traceback seems to be:
>>
>> Caused by: java.lang.NullPointerException
>>
>> at java.util.Objects.requireNonNull(Objects.java:203)
>>
>> …
>>
>> at java.util.stream.ReferencePipeline.reduce(
>> ReferencePipeline.java:479)
>>
>> at org.apache.solr.index.SlowCompositeReaderWrapper.(
>> SlowCompositeReaderWrapper.java:76)
>>
>> at org.apache.solr.index.SlowCompositeReaderWrapper.wrap(
>> SlowCompositeReaderWrapper.java:57)
>>
>> at org.apache.solr.search.SolrIndexSearcher.(
>> SolrIndexSearcher.java:252)
>>
>> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:
>> 2034)
>>
>> ... 12 more
>>
>>
>>
>> In looking at the de-compiled code (SlowCompositeReaderWrapper), lines
>> 72-77, and it appears that one or more “leaf” files doesn’t have a
>> “min-version” set.  That’s a guess.  If so, does this mean Solr 7.0.0
>> can’t read a 6.5.0 index?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Wayne Johnson
>>
>> 801-240-4024
>>
>> wjohnson...@ldschurch.org
>>
>> [image: familysearch2.JPG]
>>
>>
>>


RE: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)

2017-09-27 Thread Wayne L. Johnson
First, thanks for the quick response.  Yes, it sounds like the same problem!!

I did a bunch of searching before reporting the issue; I didn't come across that
JIRA or I wouldn't have reported it.  My apologies for the duplication
(although it is a new JIRA).

Is there a good place to start searching in the future?  I'm a fairly
experienced Solr user, and I don't mind slogging through Java code.

Meanwhile I'll follow the JIRA so I know when it gets fixed.

Thanks!!

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@gmail.com] 
Sent: Wednesday, September 27, 2017 12:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)

That sounds like https://issues.apache.org/jira/browse/SOLR-11406 if i'm
not mistaken?

-Stefan

On Sep 27, 2017 8:20 PM, "Wayne L. Johnson" 
wrote:

> I’m testing Solr 7.0.0.  When I start with an empty index, Solr comes 
> up just fine, I can add documents and query documents.  However when I 
> start with an already-populated set of documents (from 6.5.0), Solr 
> will not start.  The relevant portion of the traceback seems to be:
>
> Caused by: java.lang.NullPointerException
>
> at java.util.Objects.requireNonNull(Objects.java:203)
>
> …
>
> at java.util.stream.ReferencePipeline.reduce(
> ReferencePipeline.java:479)
>
> at org.apache.solr.index.SlowCompositeReaderWrapper.(
> SlowCompositeReaderWrapper.java:76)
>
> at org.apache.solr.index.SlowCompositeReaderWrapper.wrap(
> SlowCompositeReaderWrapper.java:57)
>
> at org.apache.solr.search.SolrIndexSearcher.(
> SolrIndexSearcher.java:252)
>
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:
> 2034)
>
> ... 12 more
>
>
>
> In looking at the de-compiled code (SlowCompositeReaderWrapper), lines 
> 72-77, and it appears that one or more “leaf” files doesn’t have a 
> “min-version” set.  That’s a guess.  If so, does this mean Solr 7.0.0 
> can’t read a 6.5.0 index?
>
>
>
> Thanks
>
>
>
> Wayne Johnson
>
> 801-240-4024
>
> wjohnson...@ldschurch.org
>
> [image: familysearch2.JPG]
>
>
>


Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)

2017-09-27 Thread Stefan Matheis
That sounds like https://issues.apache.org/jira/browse/SOLR-11406 if i'm
not mistaken?

-Stefan

On Sep 27, 2017 8:20 PM, "Wayne L. Johnson" 
wrote:

> I’m testing Solr 7.0.0.  When I start with an empty index, Solr comes up
> just fine, I can add documents and query documents.  However when I start
> with an already-populated set of documents (from 6.5.0), Solr will not
> start.  The relevant portion of the traceback seems to be:
>
> Caused by: java.lang.NullPointerException
>
> at java.util.Objects.requireNonNull(Objects.java:203)
>
> …
>
> at java.util.stream.ReferencePipeline.reduce(
> ReferencePipeline.java:479)
>
> at org.apache.solr.index.SlowCompositeReaderWrapper.(
> SlowCompositeReaderWrapper.java:76)
>
> at org.apache.solr.index.SlowCompositeReaderWrapper.wrap(
> SlowCompositeReaderWrapper.java:57)
>
> at org.apache.solr.search.SolrIndexSearcher.(
> SolrIndexSearcher.java:252)
>
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:
> 2034)
>
> ... 12 more
>
>
>
> In looking at the de-compiled code (SlowCompositeReaderWrapper), lines
> 72-77, and it appears that one or more “leaf” files doesn’t have a
> “min-version” set.  That’s a guess.  If so, does this mean Solr 7.0.0 can’t
> read a 6.5.0 index?
>
>
>
> Thanks
>
>
>
> Wayne Johnson
>
> 801-240-4024
>
> wjohnson...@ldschurch.org
>
> [image: familysearch2.JPG]
>
>
>


Solr 7.0.0 -- can it use a 6.5.0 data repository (index)

2017-09-27 Thread Wayne L. Johnson

I'm testing Solr 7.0.0.  When I start with an empty index, Solr comes up just 
fine, I can add documents and query documents.  However when I start with an 
already-populated set of documents (from 6.5.0), Solr will not start.  The 
relevant portion of the traceback seems to be:
Caused by: java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
...
at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479)
at
org.apache.solr.index.SlowCompositeReaderWrapper.<init>(SlowCompositeReaderWrapper.java:76)
at
org.apache.solr.index.SlowCompositeReaderWrapper.wrap(SlowCompositeReaderWrapper.java:57)
at
org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:252)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2034)
... 12 more

In looking at the de-compiled code (SlowCompositeReaderWrapper), lines 72-77,
it appears that one or more "leaf" files doesn't have a "min-version" set.
That's a guess.  If so, does this mean Solr 7.0.0 can't read a 6.5.0 index?

Thanks

Wayne Johnson
801-240-4024
wjohnson...@ldschurch.org<mailto:wjohnson...@ldschurch.org>
[familysearch2.JPG]



Re: Solr Spatial Index and Data

2017-09-17 Thread Furkan KAMACI
Hi Can,

For your first question: you should share more information with us, as Rick
indicated. Do you have any errors? Do you have unique ids or not, etc.?

For the second one: you should read here:
https://cwiki.apache.org/confluence/display/solr/Spatial+Search and ask
your questions if you have any.
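
As a rough starting point (attribute values here are illustrative, polygon
support requires the JTS jar on the classpath, and the exact
spatialContextFactory value depends on your Solr version), the field type
usually looks something like this:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="JTS"
           geo="true" distErrPct="0.025" maxDistErr="0.001"
           distanceUnits="kilometers"/>
<field name="geom" type="location_rpt" indexed="true" stored="true"/>
<!-- query sketch: fq={!field f=geom}Intersects(POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))) -->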

Kind Regards,
Furkan KAMACI

On Thu, Sep 14, 2017 at 1:34 PM, Rick Leir  wrote:

> hi Can Ezgi
> > First of all, i want to use spatial index for my data include polyghons
> and points. But solr indexed first 18 rows, other rows not indexed.
>
> Do all rows have a unique id field?
>
> Are there errors in the logfile?
> cheers -- Rick
>
>
> .
>


Re: Adding UniqueKey to an existing Solr 6.4 Index

2017-09-15 Thread Erick Erickson
Not really. Do note that atomic updates require
1> all _original_ fields (i.e. fields that are _not_ destinations for
copyFields) have stored=true
2> no destination of a copyField has stored=true
3> compose the original document from stored fields and re-index the
doc. This latter just means that atomic updates are actually slightly
more work than just re-indexing the doc from the system-of-record (as
far as Solr is concerned).

The decision to use atomic updates is up to you of course; the slight
extra work may be better than getting the docs from the original
source...
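
For reference, an atomic update that only sets the new field would look
roughly like this in the XML update format (the id and signature values are
placeholders):

<add>
  <doc>
    <field name="id">existing-doc-id</field>
    <!-- "set" replaces this field; the rest of the doc is rebuilt from stored fields -->
    <field name="SignatureField" update="set">3a7bd3e2360a3d29eea436fcfb7e44c7</field>
  </doc>
</add>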

Best,
Erick

On Fri, Sep 15, 2017 at 10:38 AM, Pankaj Gurumukhi
 wrote:
> Hello,
>
> I have a single node Solr 6.4 server, with a Index of 100 Million documents. 
> The default "id" is the primary key of this index. Now, I would like to setup 
> an update process to insert new documents, and update existing documents 
> based on availability of value in another field (say ProductId), that is 
> different from the default "id". Now, to ensure that I use the Solr provided 
> De-Duplication method by having a new field SignatureField using the 
> ProductId as UniqueKey. Considering the millions of documents I have, I would 
> like to ask if its possible to setup a De-Duplication mechanism in an 
> existing solr index with the following steps:
>
> a. Add new field SignatureField, and configure it as UniqueKey in Solr 
> schema.
>
> b.Run an Atomic Update process on all documents, to update the value of 
> this new field SignatureField.
>
> Is there an easier/better way to add a SignatureField to an existing large 
> index?
>
> Thx,
> Pankaj
>


Adding UniqueKey to an existing Solr 6.4 Index

2017-09-15 Thread Pankaj Gurumukhi
Hello,

I have a single node Solr 6.4 server, with an index of 100 million documents.
The default "id" is the primary key of this index. Now, I would like to set up
an update process to insert new documents, and update existing documents based
on the availability of a value in another field (say ProductId), that is
different from the default "id". To ensure that, I plan to use the Solr-provided
De-Duplication method by adding a new field SignatureField that uses the
ProductId as UniqueKey. Considering the millions of documents I have, I would
like to ask if it's possible to set up a De-Duplication mechanism in an existing
Solr index with the following steps:

a. Add a new field SignatureField, and configure it as UniqueKey in the Solr
schema.

b. Run an Atomic Update process on all documents, to update the value of
this new field SignatureField.

Is there an easier/better way to add a SignatureField to an existing large 
index?
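
For context, the de-duplication setup I have in mind is roughly the standard
chain from the documentation (a sketch; the chain name is arbitrary and the
field names are the ones mentioned above):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">SignatureField</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">ProductId</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>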

Thx,
Pankaj



Re: Solr Spatial Index and Data

2017-09-14 Thread Rick Leir

hi Can Ezgi
> First of all, i want to use spatial index for my data include 
polyghons and points. But solr indexed first 18 rows, other rows not 
indexed.


Do all rows have a unique id field?

Are there errors in the logfile?
cheers -- Rick


.


Solr Spatial Index and Data

2017-09-14 Thread Can Ezgi Aydemir
Hi everyone,



First of all, I want to use a spatial index for my data, which includes polygons
and points. But Solr indexed only the first 18 rows; the other rows were not
indexed. I need sample data that includes polygons and points.


The other problem: I will write spatial queries on this data. These spatial
queries include intersects, neighborhood, within, etc. Could you please help me
prepare these queries?



Thx for interest.



Best regards.






Can Ezgi AYDEMİR
Oracle Database Administrator

İşlem Coğrafi Bilgi Sistemleri Müh. & Eğitim AŞ.
2024.Cadde No:14, Beysukent 06800, Ankara, Türkiye
T : 0 312 233 50 00 .:. F : 0312 235 56 82
E : cayde...@islem.com.tr



Re: Freeze Index

2017-09-14 Thread Toke Eskildsen
On Wed, 2017-09-13 at 11:56 -0700, fabigol wrote:
> my problem is that my index freeze several time and i don't know why.
> So i lost all the data of my index.
> I have 14 million of documents from postgresql database. I have an
> only node with 31 GO for my JVM and my server has 64GO. My index make
> 6 GO on the HDD.
>
> Is it a good configuration?

If you look in the admin GUI, you can see how much memory is actually
used by the JVM. My guess is that it is _way_ lower than 31GB. A 6GB
index is quite small and unless you do special processing, you should
be fine with a 2GB JVM or something like that.

One of the symptoms of having too large a memory allocation for the
JVM is occasional long pauses due to garbage collection. However, you
should not lose anything - it is just a pause. Can you describe in more
detail what you mean by freeze and losing data?

- Toke Eskildsen, Royal Danish Library



Re: Freeze Index

2017-09-13 Thread Rick Leir
Fabien,
What do you see in the logfile at the time of the freeze?
Cheers -- Rick

On September 13, 2017 3:01:17 PM EDT, fabigol  
wrote:
>hi,
>my problem is that my index freeze several time and i don't know why.
>So i
>lost all the data of my index.
>I have 14 million of documents from postgresql database. I have an only
>node
>with 31 GO for my JVM and my server has 64GO. My index make 6 GO on the
>HDD.
>Is it a good configuration?
>
>Someone can help me.
>
>thank for advance 
>
>
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Freeze Index

2017-09-13 Thread fabigol
hi,
my problem is that my index freezes several times and I don't know why. So I
lost all the data of my index.
I have 14 million documents from a PostgreSQL database. I have only one node
with 31 GB for my JVM and my server has 64 GB. My index takes 6 GB on the HDD.
Is it a good configuration?

Can someone help me?

thanks in advance





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: Index relational database

2017-08-31 Thread Erick Erickson
To pile on here: When you denormalize you also get some functionality
that you do not get with Solr joins, they've been called "pseudo
joins" in Solr for a reason.

If you just use the simple approach of indexing the two tables then
joining across them you can't return fields from both tables in a
single document. To do that you need to use parent/child docs which
has its own restrictions.

So rather than worry excessively about which is faster, I'd recommend
you decide on the functionality you need as a starting point.

Best,
Erick

On Thu, Aug 31, 2017 at 7:34 AM, Walter Underwood  wrote:
> There is no way tell which is faster without trying it.
>
> Query speed depends on the size of the data (rows), the complexity of the 
> join, which database, what kind of disk, etc.
>
> Solr speed depends on the size of the documents, the complexity of your 
> analysis chains, what kind of disk, how much CPU is available, etc.
>
> We have one query that extracts 9 million documents from MySQL in about 20 
> minutes. We have another query on a different MySQL database that takes 90 
> minutes to get 7 million documents.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Aug 31, 2017, at 12:54 AM, Renuka Srishti  
>> wrote:
>>
>> Thanks Erick, Walter
>> But I think join query will reduce the performance. Denormalization will be
>> the better way than join query, am I right?
>>
>>
>>
>> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
>> wrote:
>>
>>> Think about making a denormalized view, with all the fields needed in one
>>> table. That view gets sent to Solr. Each row is a Solr document.
>>>
>>> It could be implemented as a view or as SQL, but that is a useful mental
>>> model for people starting from a relational background.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>>> On Aug 30, 2017, at 9:14 AM, Erick Erickson 
>>> wrote:
>>>>
>>>> First, it's often best, by far, to denormalize the data in your solr
>>> index,
>>>> that's what I'd explore first.
>>>>
>>>> If you can't do that, the join query parser might work for you.
>>>>
>>>> On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
>>>> wrote:
>>>>
>>>>> Thanks Susheel for your response.
>>>>> Here is the scenario about which I am talking:
>>>>>
>>>>>  - Let suppose there are two documents doc1 and doc2.
>>>>>  - I want to fetch the data from doc2 on the basis of doc1 fields which
>>>>>  are related to doc2.
>>>>>
>>>>> How to achieve this efficiently.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Renuka Srishti
>>>>>
>>>>>
>>>>> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar 
>>>>> wrote:
>>>>>
>>>>>> Hello Renuka,
>>>>>>
>>>>>> I would suggest to start with your use case(s). May be start with your
>>>>>> first use case with the below questions
>>>>>>
>>>>>> a) What is that you want to search (which fields like name, desc, city
>>>>>> etc.)
>>>>>> b) What is that you want to show part of search result (name, city
>>> etc.)
>>>>>>
>>>>>> Based on above two questions, you would know what data to pull in from
>>>>>> relational database and create solr schema and index the data.
>>>>>>
>>>>>> You may first try to denormalize / flatten the structure so that you
>>> deal
>>>>>> with one collection/schema and query upon it.
>>>>>>
>>>>>> HTH.
>>>>>>
>>>>>> Thanks,
>>>>>> Susheel
>>>>>>
>>>>>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
>>>>>> renuka.srisht...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hii,
>>>>>>>
>>>>>>> What is the best way to index relational database, and how it impacts
>>>>> on
>>>>>>> the performance?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Renuka Srishti
>>>>>>>
>>>>>>
>>>>>
>>>
>>>
>


Re: Index relational database

2017-08-31 Thread Walter Underwood
There is no way to tell which is faster without trying it.

Query speed depends on the size of the data (rows), the complexity of the join, 
which database, what kind of disk, etc.

Solr speed depends on the size of the documents, the complexity of your 
analysis chains, what kind of disk, how much CPU is available, etc.

We have one query that extracts 9 million documents from MySQL in about 20 
minutes. We have another query on a different MySQL database that takes 90 
minutes to get 7 million documents.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 31, 2017, at 12:54 AM, Renuka Srishti  
> wrote:
> 
> Thanks Erick, Walter
> But I think join query will reduce the performance. Denormalization will be
> the better way than join query, am I right?
> 
> 
> 
> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
> wrote:
> 
>> Think about making a denormalized view, with all the fields needed in one
>> table. That view gets sent to Solr. Each row is a Solr document.
>> 
>> It could be implemented as a view or as SQL, but that is a useful mental
>> model for people starting from a relational background.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Aug 30, 2017, at 9:14 AM, Erick Erickson 
>> wrote:
>>> 
>>> First, it's often best, by far, to denormalize the data in your solr
>> index,
>>> that's what I'd explore first.
>>> 
>>> If you can't do that, the join query parser might work for you.
>>> 
>>> On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
>>> wrote:
>>> 
>>>> Thanks Susheel for your response.
>>>> Here is the scenario about which I am talking:
>>>> 
>>>>  - Let suppose there are two documents doc1 and doc2.
>>>>  - I want to fetch the data from doc2 on the basis of doc1 fields which
>>>>  are related to doc2.
>>>> 
>>>> How to achieve this efficiently.
>>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> Renuka Srishti
>>>> 
>>>> 
>>>> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar 
>>>> wrote:
>>>> 
>>>>> Hello Renuka,
>>>>> 
>>>>> I would suggest to start with your use case(s). May be start with your
>>>>> first use case with the below questions
>>>>> 
>>>>> a) What is that you want to search (which fields like name, desc, city
>>>>> etc.)
>>>>> b) What is that you want to show part of search result (name, city
>> etc.)
>>>>> 
>>>>> Based on above two questions, you would know what data to pull in from
>>>>> relational database and create solr schema and index the data.
>>>>> 
>>>>> You may first try to denormalize / flatten the structure so that you
>> deal
>>>>> with one collection/schema and query upon it.
>>>>> 
>>>>> HTH.
>>>>> 
>>>>> Thanks,
>>>>> Susheel
>>>>> 
>>>>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
>>>>> renuka.srisht...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hii,
>>>>>> 
>>>>>> What is the best way to index relational database, and how it impacts
>>>> on
>>>>>> the performance?
>>>>>> 
>>>>>> Thanks
>>>>>> Renuka Srishti
>>>>>> 
>>>>> 
>>>> 
>> 
>> 



Re: Solr index getting replaced instead of merged

2017-08-31 Thread David Hastings
>Can anyone tell is it possible to paginate the data using Solr UI?

use the start/rows input fields using standard array start as 0,  ie
start=0, rows=10
start=10, rows=10
start=20, rows=10


On Thu, Aug 31, 2017 at 8:21 AM, Agrawal, Harshal (GE Digital) <
harshal.agra...@ge.com> wrote:

> Hello All,
>
> If I check out clear option while indexing 2nd table it worked.Thanks
> Gurdeep :)
> Can anyone tell is it possible to paginate the data using Solr UI?
> If yes please tell me the features which I can use?
>
> Regards
> Harshal
>
> From: Agrawal, Harshal (GE Digital)
> Sent: Wednesday, August 30, 2017 4:36 PM
> To: 'solr-user@lucene.apache.org' 
> Cc: Singh, Susnata (GE Digital) 
> Subject: Solr index getting replaced instead of merged
>
> Hello Guys,
>
> I have installed solr in my local system and was able to connect to
> Teradata successfully.
> For single table I am able to index the data and query it also but when I
> am trying for multiple tables in the same schema and doing indexing one by
> one respectively.
> I can see datasets getting replaced instead of merged .
>
> Can anyone help me please:
>
> Regards
> Harshal
>
>
>


RE: Solr index getting replaced instead of merged

2017-08-31 Thread Agrawal, Harshal (GE Digital)
Hello All,

If I check out the clear option while indexing the 2nd table it worked. Thanks Gurdeep :)
Can anyone tell is it possible to paginate the data using Solr UI?
If yes please tell me the features which I can use?

Regards
Harshal

From: Agrawal, Harshal (GE Digital)
Sent: Wednesday, August 30, 2017 4:36 PM
To: 'solr-user@lucene.apache.org' 
Cc: Singh, Susnata (GE Digital) 
Subject: Solr index getting replaced instead of merged

Hello Guys,

I have installed solr in my local system and was able to connect to Teradata 
successfully.
For a single table I am able to index the data and query it, but when I try
multiple tables in the same schema, indexing them one by one, I can see
datasets getting replaced instead of merged.

Can anyone help me please:

Regards
Harshal




Re: Index relational database

2017-08-31 Thread David Hastings
When indexing a relational database it's generally best to denormalize it
in a view or in your indexing code.
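
For example, with the DataImportHandler a single denormalizing entity is often
enough (a sketch for data-config.xml; the table and column names are made up):

<document>
  <entity name="item"
          query="SELECT i.id, i.name, c.category_name
                 FROM item i JOIN category c ON i.category_id = c.id">
    <field column="id" name="id"/>
    <field column="name" name="name"/>
    <field column="category_name" name="category"/>
  </entity>
</document>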

On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti 
wrote:

> Thanks Erick, Walter
> But I think join query will reduce the performance. Denormalization will be
> the better way than join query, am I right?
>
>
>
> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
> wrote:
>
> > Think about making a denormalized view, with all the fields needed in one
> > table. That view gets sent to Solr. Each row is a Solr document.
> >
> > It could be implemented as a view or as SQL, but that is a useful mental
> > model for people starting from a relational background.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Aug 30, 2017, at 9:14 AM, Erick Erickson 
> > wrote:
> > >
> > > First, it's often best, by far, to denormalize the data in your solr
> > index,
> > > that's what I'd explore first.
> > >
> > > If you can't do that, the join query parser might work for you.
> > >
> > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
> > > wrote:
> > >
> > >> Thanks Susheel for your response.
> > >> Here is the scenario about which I am talking:
> > >>
> > >>   - Let suppose there are two documents doc1 and doc2.
> > >>   - I want to fetch the data from doc2 on the basis of doc1 fields
> which
> > >>   are related to doc2.
> > >>
> > >> How to achieve this efficiently.
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Renuka Srishti
> > >>
> > >>
> > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar  >
> > >> wrote:
> > >>
> > >>> Hello Renuka,
> > >>>
> > >>> I would suggest to start with your use case(s). May be start with
> your
> > >>> first use case with the below questions
> > >>>
> > >>> a) What is that you want to search (which fields like name, desc,
> city
> > >>> etc.)
> > >>> b) What is that you want to show part of search result (name, city
> > etc.)
> > >>>
> > >>> Based on above two questions, you would know what data to pull in
> from
> > >>> relational database and create solr schema and index the data.
> > >>>
> > >>> You may first try to denormalize / flatten the structure so that you
> > deal
> > >>> with one collection/schema and query upon it.
> > >>>
> > >>> HTH.
> > >>>
> > >>> Thanks,
> > >>> Susheel
> > >>>
> > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > >>> renuka.srisht...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hii,
> > >>>>
> > >>>> What is the best way to index relational database, and how it
> impacts
> > >> on
> > >>>> the performance?
> > >>>>
> > >>>> Thanks
> > >>>> Renuka Srishti
> > >>>>
> > >>>
> > >>
> >
> >
>


Re: Index relational database

2017-08-31 Thread Renuka Srishti
Thanks all for sharing your thoughts :)

On Thu, Aug 31, 2017 at 5:28 PM, Susheel Kumar 
wrote:

> Yes, if you can avoid join and work with flat/denormalized structure then
> that's the best.
>
> On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti <
> renuka.srisht...@gmail.com>
> wrote:
>
> > Thanks Erick, Walter
> > But I think join query will reduce the performance. Denormalization will
> be
> > the better way than join query, am I right?
> >
> >
> >
> > On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood <
> wun...@wunderwood.org>
> > wrote:
> >
> > > Think about making a denormalized view, with all the fields needed in
> one
> > > table. That view gets sent to Solr. Each row is a Solr document.
> > >
> > > It could be implemented as a view or as SQL, but that is a useful
> mental
> > > model for people starting from a relational background.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > >
> > > > On Aug 30, 2017, at 9:14 AM, Erick Erickson  >
> > > wrote:
> > > >
> > > > First, it's often best, by far, to denormalize the data in your solr
> > > index,
> > > > that's what I'd explore first.
> > > >
> > > > If you can't do that, the join query parser might work for you.
> > > >
> > > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" <
> renuka.srisht...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks Susheel for your response.
> > > >> Here is the scenario about which I am talking:
> > > >>
> > > >>   - Let suppose there are two documents doc1 and doc2.
> > > >>   - I want to fetch the data from doc2 on the basis of doc1 fields
> > which
> > > >>   are related to doc2.
> > > >>
> > > >> How to achieve this efficiently.
> > > >>
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Renuka Srishti
> > > >>
> > > >>
> > > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar <
> susheel2...@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >>> Hello Renuka,
> > > >>>
> > > >>> I would suggest to start with your use case(s). May be start with
> > your
> > > >>> first use case with the below questions
> > > >>>
> > > >>> a) What is that you want to search (which fields like name, desc,
> > city
> > > >>> etc.)
> > > >>> b) What is that you want to show part of search result (name, city
> > > etc.)
> > > >>>
> > > >>> Based on above two questions, you would know what data to pull in
> > from
> > > >>> relational database and create solr schema and index the data.
> > > >>>
> > > >>> You may first try to denormalize / flatten the structure so that
> you
> > > deal
> > > >>> with one collection/schema and query upon it.
> > > >>>
> > > >>> HTH.
> > > >>>
> > > >>> Thanks,
> > > >>> Susheel
> > > >>>
> > > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > > >>> renuka.srisht...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Hii,
> > > >>>>
> > > >>>> What is the best way to index relational database, and how it
> > impacts
> > > >> on
> > > >>>> the performance?
> > > >>>>
> > > >>>> Thanks
> > > >>>> Renuka Srishti
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>


Re: Index relational database

2017-08-31 Thread Susheel Kumar
Yes, if you can avoid joins and work with a flat/denormalized structure then
that's the best.

On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti 
wrote:

> Thanks Erick, Walter
> But I think join query will reduce the performance. Denormalization will be
> the better way than join query, am I right?
>
>
>
> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
> wrote:
>
> > Think about making a denormalized view, with all the fields needed in one
> > table. That view gets sent to Solr. Each row is a Solr document.
> >
> > It could be implemented as a view or as SQL, but that is a useful mental
> > model for people starting from a relational background.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Aug 30, 2017, at 9:14 AM, Erick Erickson 
> > wrote:
> > >
> > > First, it's often best, by far, to denormalize the data in your solr
> > index,
> > > that's what I'd explore first.
> > >
> > > If you can't do that, the join query parser might work for you.
> > >
> > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
> > > wrote:
> > >
> > >> Thanks Susheel for your response.
> > >> Here is the scenario about which I am talking:
> > >>
> > >>   - Let suppose there are two documents doc1 and doc2.
> > >>   - I want to fetch the data from doc2 on the basis of doc1 fields
> which
> > >>   are related to doc2.
> > >>
> > >> How to achieve this efficiently.
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Renuka Srishti
> > >>
> > >>
> > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar  >
> > >> wrote:
> > >>
> > >>> Hello Renuka,
> > >>>
> > >>> I would suggest to start with your use case(s). May be start with
> your
> > >>> first use case with the below questions
> > >>>
> > >>> a) What is that you want to search (which fields like name, desc,
> city
> > >>> etc.)
> > >>> b) What is that you want to show part of search result (name, city
> > etc.)
> > >>>
> > >>> Based on above two questions, you would know what data to pull in
> from
> > >>> relational database and create solr schema and index the data.
> > >>>
> > >>> You may first try to denormalize / flatten the structure so that you
> > deal
> > >>> with one collection/schema and query upon it.
> > >>>
> > >>> HTH.
> > >>>
> > >>> Thanks,
> > >>> Susheel
> > >>>
> > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > >>> renuka.srisht...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hii,
> > >>>>
> > >>>> What is the best way to index relational database, and how it
> impacts
> > >> on
> > >>>> the performance?
> > >>>>
> > >>>> Thanks
> > >>>> Renuka Srishti
> > >>>>
> > >>>
> > >>
> >
> >
>


Re: Index relational database

2017-08-31 Thread Renuka Srishti
Thanks Erick, Walter
But I think a join query will reduce performance. Denormalization would be
a better way than a join query, am I right?



On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
wrote:

> Think about making a denormalized view, with all the fields needed in one
> table. That view gets sent to Solr. Each row is a Solr document.
>
> It could be implemented as a view or as SQL, but that is a useful mental
> model for people starting from a relational background.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Aug 30, 2017, at 9:14 AM, Erick Erickson 
> wrote:
> >
> > First, it's often best, by far, to denormalize the data in your solr
> index,
> > that's what I'd explore first.
> >
> > If you can't do that, the join query parser might work for you.
> >
> > On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
> > wrote:
> >
> >> Thanks Susheel for your response.
> >> Here is the scenario about which I am talking:
> >>
> >>   - Let suppose there are two documents doc1 and doc2.
> >>   - I want to fetch the data from doc2 on the basis of doc1 fields which
> >>   are related to doc2.
> >>
> >> How to achieve this efficiently.
> >>
> >>
> >> Thanks,
> >>
> >> Renuka Srishti
> >>
> >>
> >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar 
> >> wrote:
> >>
> >>> Hello Renuka,
> >>>
> >>> I would suggest to start with your use case(s). May be start with your
> >>> first use case with the below questions
> >>>
> >>> a) What is that you want to search (which fields like name, desc, city
> >>> etc.)
> >>> b) What is that you want to show part of search result (name, city
> etc.)
> >>>
> >>> Based on above two questions, you would know what data to pull in from
> >>> relational database and create solr schema and index the data.
> >>>
> >>> You may first try to denormalize / flatten the structure so that you
> deal
> >>> with one collection/schema and query upon it.
> >>>
> >>> HTH.
> >>>
> >>> Thanks,
> >>> Susheel
> >>>
> >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> >>> renuka.srisht...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hii,
> >>>>
> >>>> What is the best way to index relational database, and how it impacts
> >> on
> >>>> the performance?
> >>>>
> >>>> Thanks
> >>>> Renuka Srishti
> >>>>
> >>>
> >>
>
>


Re: Index relational database

2017-08-30 Thread Walter Underwood
Think about making a denormalized view, with all the fields needed in one 
table. That view gets sent to Solr. Each row is a Solr document.

It could be implemented as a view or as SQL, but that is a useful mental model 
for people starting from a relational background.
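
For example, the SQL view itself is up to the database; on the Solr side, each
row of the view becomes one document. A minimal sketch (collection name, field
names and the dynamic-field suffixes are made up for illustration, not taken
from this thread):

# Each row of the flattened view is posted as one Solr document.
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/orders/update?commit=true' \
  --data-binary '[
    {"id": "order-1001", "customer_name_s": "Acme Corp", "city_s": "Boston",  "product_s": "Widget", "qty_i": 3},
    {"id": "order-1002", "customer_name_s": "Bolt Inc",  "city_s": "Chicago", "product_s": "Gadget", "qty_i": 1}
  ]'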

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 30, 2017, at 9:14 AM, Erick Erickson  wrote:
> 
> First, it's often best, by far, to denormalize the data in your solr index,
> that's what I'd explore first.
> 
> If you can't do that, the join query parser might work for you.
> 
> On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
> wrote:
> 
>> Thanks Susheel for your response.
>> Here is the scenario about which I am talking:
>> 
>>   - Let suppose there are two documents doc1 and doc2.
>>   - I want to fetch the data from doc2 on the basis of doc1 fields which
>>   are related to doc2.
>> 
>> How to achieve this efficiently.
>> 
>> 
>> Thanks,
>> 
>> Renuka Srishti
>> 
>> 
>> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar 
>> wrote:
>> 
>>> Hello Renuka,
>>> 
>>> I would suggest to start with your use case(s). May be start with your
>>> first use case with the below questions
>>> 
>>> a) What is that you want to search (which fields like name, desc, city
>>> etc.)
>>> b) What is that you want to show part of search result (name, city etc.)
>>> 
>>> Based on above two questions, you would know what data to pull in from
>>> relational database and create solr schema and index the data.
>>> 
>>> You may first try to denormalize / flatten the structure so that you deal
>>> with one collection/schema and query upon it.
>>> 
>>> HTH.
>>> 
>>> Thanks,
>>> Susheel
>>> 
>>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
>>> renuka.srisht...@gmail.com>
>>> wrote:
>>> 
>>>> Hii,
>>>> 
>>>> What is the best way to index relational database, and how it impacts
>> on
>>>> the performance?
>>>> 
>>>> Thanks
>>>> Renuka Srishti
>>>> 
>>> 
>> 



Re: Index relational database

2017-08-30 Thread Erick Erickson
First, it's often best, by far, to denormalize the data in your Solr index;
that's what I'd explore first.

If you can't do that, the join query parser might work for you.
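
A minimal sketch of the join query parser, assuming hypothetical documents in
one collection (collection and field names below are made up):

# Hypothetical: customer docs carry id and city_s, order docs carry customer_id_s.
# Returns the order documents whose joined customer matches city_s:Boston.
curl -G 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q={!join from=id to=customer_id_s}city_s:Boston' \
  --data-urlencode 'fl=id,product_s'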

On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
wrote:

> Thanks Susheel for your response.
> Here is the scenario about which I am talking:
>
>- Let suppose there are two documents doc1 and doc2.
>- I want to fetch the data from doc2 on the basis of doc1 fields which
>are related to doc2.
>
> How to achieve this efficiently.
>
>
> Thanks,
>
> Renuka Srishti
>
>
> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar 
> wrote:
>
> > Hello Renuka,
> >
> > I would suggest to start with your use case(s). May be start with your
> > first use case with the below questions
> >
> > a) What is that you want to search (which fields like name, desc, city
> > etc.)
> > b) What is that you want to show part of search result (name, city etc.)
> >
> > Based on above two questions, you would know what data to pull in from
> > relational database and create solr schema and index the data.
> >
> > You may first try to denormalize / flatten the structure so that you deal
> > with one collection/schema and query upon it.
> >
> > HTH.
> >
> > Thanks,
> > Susheel
> >
> > On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > renuka.srisht...@gmail.com>
> > wrote:
> >
> > > Hii,
> > >
> > > What is the best way to index relational database, and how it impacts
> on
> > > the performance?
> > >
> > > Thanks
> > > Renuka Srishti
> > >
> >
>


Re: Solr index getting replaced instead of merged

2017-08-30 Thread Gurdeep Singh
Not sure how you are doing the indexing. Try adding clean=false to your indexing
command/script when you index the second table.
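
For example, if the indexing goes through the DataImportHandler, it would look
something like this (handler path and core name are assumptions, not taken from
your setup):

# clean=false keeps the documents from the first table's import instead of
# wiping the index before the second import starts.
curl 'http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=false&commit=true'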





> On 30 Aug 2017, at 7:06 PM, Agrawal, Harshal (GE Digital) 
>  wrote:
> 
> Hello Guys,
> 
> I have installed solr in my local system and was able to connect to Teradata 
> successfully.
> For single table I am able to index the data and query it also but when I am 
> trying for multiple tables in the same schema and doing indexing one by one 
> respectively.
> I can see datasets getting replaced instead of merged .
> 
> Can anyone help me please:
> 
> Regards
> Harshal
> 
> 


Solr index getting replaced instead of merged

2017-08-30 Thread Agrawal, Harshal (GE Digital)
Hello Guys,

I have installed Solr on my local system and was able to connect to Teradata 
successfully.
For a single table I am able to index the data and also query it, but when I 
try multiple tables in the same schema, indexing them one by one, I can see 
the datasets getting replaced instead of merged.

Can anyone help me please?

Regards
Harshal




Re: Index relational database

2017-08-30 Thread Renuka Srishti
Thanks Susheel for your response.
Here is the scenario about which I am talking:

   - Let's suppose there are two documents, doc1 and doc2.
   - I want to fetch the data from doc2 on the basis of the doc1 fields which
   are related to doc2.

How can this be achieved efficiently?


Thanks,

Renuka Srishti


On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar 
wrote:

> Hello Renuka,
>
> I would suggest to start with your use case(s). May be start with your
> first use case with the below questions
>
> a) What is that you want to search (which fields like name, desc, city
> etc.)
> b) What is that you want to show part of search result (name, city etc.)
>
> Based on above two questions, you would know what data to pull in from
> relational database and create solr schema and index the data.
>
> You may first try to denormalize / flatten the structure so that you deal
> with one collection/schema and query upon it.
>
> HTH.
>
> Thanks,
> Susheel
>
> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> renuka.srisht...@gmail.com>
> wrote:
>
> > Hii,
> >
> > What is the best way to index relational database, and how it impacts on
> > the performance?
> >
> > Thanks
> > Renuka Srishti
> >
>


solr index replace with index from another environment

2017-08-28 Thread Satya Marivada
Hi there,

We are using solr-6.3.0 and need to replace the Solr index in
production with the Solr index from another environment on a periodic
basis. But the JVMs have to be recycled for the updated index to take
effect. Is there any way this can be achieved without restarting the JVMs?

Using aliases as described below is an alternative, but I don't think
it is useful in my case, where I already have the index from the other
environment ready. If I build a new collection and replace the index, the
JVMs again need to be restarted for the new index to take effect.

https://stackoverflow.com/questions/45158394/replacing-old-indexed-data-with-new-data-in-apache-solr-with-zero-downtime

Any other suggestions please.

Thanks,
satya


Re: Index relational database

2017-08-28 Thread Susheel Kumar
Hello Renuka,

I would suggest starting with your use case(s). Maybe begin with your
first use case and the questions below:

a) What is it that you want to search (which fields, e.g. name, desc, city, etc.)?
b) What is it that you want to show as part of the search results (name, city, etc.)?

Based on the above two questions, you will know what data to pull in from
the relational database, then create the Solr schema and index the data.

You may first try to denormalize/flatten the structure so that you deal
with one collection/schema and query against it.

HTH.

Thanks,
Susheel

On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti 
wrote:

> Hii,
>
> What is the best way to index relational database, and how it impacts on
> the performance?
>
> Thanks
> Renuka Srishti
>


Index relational database

2017-08-28 Thread Renuka Srishti
Hi,

What is the best way to index a relational database, and how does it impact
performance?

Thanks
Renuka Srishti


Re: Correct approach to copy index between solr clouds?

2017-08-26 Thread Erick Erickson
write.lock is used whenever a core(replica) wants to, well, write to
the index. Each individual replica is sure to only write to the index
with one thread. If two threads were to write to an index, there's a
very good chance the index will be corrupt, so it's a safeguard
against two or more threads or processes writing to the same index at
the same time.

Since a dataDir can be pointed at an arbitrary directory, not only
could two replicas point to the same index within the same Solr JVM,
but you could have some completely different JVM, possibly even on a
completely different machine point at the _same_ directory (this
latter with any kind of shared filesystem).

In the default case, Java's FileChannel.tryLock(); is used to acquire
an exclusive lock. If two or more threads in the same JVM or two or
more processes point to the same write.lock file one of the replicas
will fail to open.

So I mis-spoke. Just copying the write.lock file from one place to
another along with all the rest of the index files should be OK. Since
it's a new file in a new place, FileChannel.tryLock() can succeed.

You still should be sure that the indexing is stopped on the source
and a hard commit has been performed though. If you just copy from one
to another while indexing is actively happening you might get a
mismatched segments file.
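
For that last step, an explicit hard commit can be sent straight to the source
replica before copying; the host and collection names below are placeholders:

# Make sure everything indexed so far is in closed segments and the
# segments file is up to date before the directory is copied.
curl 'http://source-host:8983/solr/collection1/update?commit=true'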

This last might need a bit of explanation. During normal indexing, new
segment(s) are written to. On hard commit (or when background merging
happens) once all the new segment(s) are successfully closed, the
segments file is updated with a list of all of them. This, by the way,
is how an indexSearcher has a "snapshot" of the directory as of the
last commit; it reads the current segments file and opens all the
segments.

Anyway, theoretically if you just copy the current index directory
while indexing is going on, you could potentially have a mismatch
between the truly closed segments and what has been written to the
segments file. This would be avoided by using fetchIndex since that's
been hardened to handle this case, but being sure indexing is stopped
would serve as well.

Best,
Erick


On Sat, Aug 26, 2017 at 6:36 PM, Wei  wrote:
> Thanks Erick. Can you explain a bit more on the write.lock file? So far I
> have been copying it over from B to A and haven't seen issue starting the
> replica.
>
> On Sat, Aug 26, 2017 at 9:25 AM, Erick Erickson 
> wrote:
>
>> Approach 2 is sufficient. You do have to insure that you don't copy
>> over the write.lock file however as you may not be able to start
>> replicas if that's there.
>>
>> There's a relatively little-known third option. You an (ab)use the
>> replication API "fetchindex" command, see:
>> https://cwiki.apache.org/confluence/display/solr/Index+Replication to
>> pull the index from Cloud B to replicas on Cloud A. That has the
>> advantage of working even if you are actively indexing to Cloud B.
>> NOTE: currently you cannot _query_ CloudA (the target) while the
>> fetchindex is going on, but I doubt you really care since you were
>> talking about having Cloud A offline anyway. So for each replica you
>> fetch to you'll send the fetchindex command directly to the replica on
>> Cloud A and the "masterURL" will be the corresponding replica on Cloud
>> B.
>>
>> Finally, what I'd really do is _only_ have one replica for each shard
>> on Cloud A active and fetch to _that_ replica. I'd also delete the
>> data dir on all the other replicas for the shard on Cloud A. Then as
>> you bring the additional replicas up they'll do a full synch from the
>> leader.
>>
>> FWIW,
>> Erick
>>
>> On Fri, Aug 25, 2017 at 6:53 PM, Wei  wrote:
>> > Hi,
>> >
>> > In our set up there are two solr clouds:
>> >
>> > Cloud A:  production cloud serves both writes and reads
>> >
>> > Cloud B:  back up cloud serves only writes
>> >
>> > Cloud A and B have the same shard configuration.
>> >
>> > Write requests are sent to both cloud A and B. In certain circumstances
>> > when Cloud A's update lags behind,  we want to bulk copy the binary index
>> > from B to A.
>> >
>> > We have tried two approaches:
>> >
>> > Approach 1.
>> >   For cloud A:
>> >   a. delete collection to wipe out everything
>> >   b. create new collection (data is empty now)
>> >   c. shut down solr server
>> >   d. copy binary index from cloud B to corresponding shard replicas
>> in
>> > cloud A
>> >   e. start solr server
>> >
>> > Approach 2.
>> >   For cloud A:
>> >   a.  shut down solr server
>> >   b.  remove the whole 'data' folder under index/  in each replica
>> >   c.  copy binary index from cloud B to corresponding shard replicas
>> in
>> > cloud A
>> >   d.  start solr server
>> >
>> > Is approach 2 sufficient?  I am wondering if delete/recreate collection
>> > each time is necessary to get cloud into a "clean" state for copy binary
>> > index between solr clouds.
>> >
>> > Thanks for your advice!
>>


Re: Correct approach to copy index between solr clouds?

2017-08-26 Thread Wei
Thanks Erick. Can you explain a bit more on the write.lock file? So far I
have been copying it over from B to A and haven't seen an issue starting the
replica.

On Sat, Aug 26, 2017 at 9:25 AM, Erick Erickson 
wrote:

> Approach 2 is sufficient. You do have to insure that you don't copy
> over the write.lock file however as you may not be able to start
> replicas if that's there.
>
> There's a relatively little-known third option. You an (ab)use the
> replication API "fetchindex" command, see:
> https://cwiki.apache.org/confluence/display/solr/Index+Replication to
> pull the index from Cloud B to replicas on Cloud A. That has the
> advantage of working even if you are actively indexing to Cloud B.
> NOTE: currently you cannot _query_ CloudA (the target) while the
> fetchindex is going on, but I doubt you really care since you were
> talking about having Cloud A offline anyway. So for each replica you
> fetch to you'll send the fetchindex command directly to the replica on
> Cloud A and the "masterURL" will be the corresponding replica on Cloud
> B.
>
> Finally, what I'd really do is _only_ have one replica for each shard
> on Cloud A active and fetch to _that_ replica. I'd also delete the
> data dir on all the other replicas for the shard on Cloud A. Then as
> you bring the additional replicas up they'll do a full synch from the
> leader.
>
> FWIW,
> Erick
>
> On Fri, Aug 25, 2017 at 6:53 PM, Wei  wrote:
> > Hi,
> >
> > In our set up there are two solr clouds:
> >
> > Cloud A:  production cloud serves both writes and reads
> >
> > Cloud B:  back up cloud serves only writes
> >
> > Cloud A and B have the same shard configuration.
> >
> > Write requests are sent to both cloud A and B. In certain circumstances
> > when Cloud A's update lags behind,  we want to bulk copy the binary index
> > from B to A.
> >
> > We have tried two approaches:
> >
> > Approach 1.
> >   For cloud A:
> >   a. delete collection to wipe out everything
> >   b. create new collection (data is empty now)
> >   c. shut down solr server
> >   d. copy binary index from cloud B to corresponding shard replicas
> in
> > cloud A
> >   e. start solr server
> >
> > Approach 2.
> >   For cloud A:
> >   a.  shut down solr server
> >   b.  remove the whole 'data' folder under index/  in each replica
> >   c.  copy binary index from cloud B to corresponding shard replicas
> in
> > cloud A
> >   d.  start solr server
> >
> > Is approach 2 sufficient?  I am wondering if delete/recreate collection
> > each time is necessary to get cloud into a "clean" state for copy binary
> > index between solr clouds.
> >
> > Thanks for your advice!
>


Re: Correct approach to copy index between solr clouds?

2017-08-26 Thread Erick Erickson
Approach 2 is sufficient. You do have to insure that you don't copy
over the write.lock file however as you may not be able to start
replicas if that's there.

There's a relatively little-known third option. You can (ab)use the
replication API "fetchindex" command, see:
https://cwiki.apache.org/confluence/display/solr/Index+Replication to
pull the index from Cloud B to replicas on Cloud A. That has the
advantage of working even if you are actively indexing to Cloud B.
NOTE: currently you cannot _query_ CloudA (the target) while the
fetchindex is going on, but I doubt you really care since you were
talking about having Cloud A offline anyway. So for each replica you
fetch to you'll send the fetchindex command directly to the replica on
Cloud A and the "masterURL" will be the corresponding replica on Cloud
B.
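
As a sketch (hosts and core names are placeholders), the call to each Cloud A
replica would look something like this, with the masterUrl parameter pointing
at the matching Cloud B replica's replication handler:

# Tell the Cloud A replica to pull its index from the corresponding Cloud B replica.
curl -G 'http://cloudA-host:8983/solr/collection1_shard1_replica1/replication' \
  --data-urlencode 'command=fetchindex' \
  --data-urlencode 'masterUrl=http://cloudB-host:8983/solr/collection1_shard1_replica1/replication'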

Finally, what I'd really do is _only_ have one replica for each shard
on Cloud A active and fetch to _that_ replica. I'd also delete the
data dir on all the other replicas for the shard on Cloud A. Then as
you bring the additional replicas up they'll do a full synch from the
leader.

FWIW,
Erick

On Fri, Aug 25, 2017 at 6:53 PM, Wei  wrote:
> Hi,
>
> In our set up there are two solr clouds:
>
> Cloud A:  production cloud serves both writes and reads
>
> Cloud B:  back up cloud serves only writes
>
> Cloud A and B have the same shard configuration.
>
> Write requests are sent to both cloud A and B. In certain circumstances
> when Cloud A's update lags behind,  we want to bulk copy the binary index
> from B to A.
>
> We have tried two approaches:
>
> Approach 1.
>   For cloud A:
>   a. delete collection to wipe out everything
>   b. create new collection (data is empty now)
>   c. shut down solr server
>   d. copy binary index from cloud B to corresponding shard replicas in
> cloud A
>   e. start solr server
>
> Approach 2.
>   For cloud A:
>   a.  shut down solr server
>   b.  remove the whole 'data' folder under index/  in each replica
>   c.  copy binary index from cloud B to corresponding shard replicas in
> cloud A
>   d.  start solr server
>
> Is approach 2 sufficient?  I am wondering if delete/recreate collection
> each time is necessary to get cloud into a "clean" state for copy binary
> index between solr clouds.
>
> Thanks for your advice!


Correct approach to copy index between solr clouds?

2017-08-25 Thread Wei
Hi,

In our set up there are two solr clouds:

Cloud A:  production cloud serves both writes and reads

Cloud B:  back up cloud serves only writes

Cloud A and B have the same shard configuration.

Write requests are sent to both cloud A and B. In certain circumstances
when Cloud A's update lags behind,  we want to bulk copy the binary index
from B to A.

We have tried two approaches:

Approach 1.
  For cloud A:
  a. delete collection to wipe out everything
  b. create new collection (data is empty now)
  c. shut down solr server
  d. copy binary index from cloud B to corresponding shard replicas in
cloud A
  e. start solr server

Approach 2.
  For cloud A:
  a.  shut down solr server
  b.  remove the whole 'data' folder under index/  in each replica
  c.  copy binary index from cloud B to corresponding shard replicas in
cloud A
  d.  start solr server

Is approach 2 sufficient?  I am wondering if delete/recreate collection
each time is necessary to get cloud into a "clean" state for copy binary
index between solr clouds.

Thanks for your advice!


Re: Solr caching the index file make server refuse serving

2017-08-24 Thread Erick Erickson
10 billion documents on 12 cores is over 800M documents/shard at best.
This is _very_ aggressive for a shard. Could you give more information
about your setup?

I've seen 250M docs fit in 12G memory. I've also seen 10M documents
strain 32G of memory. Details matter a lot. The only way I've been
able to determine what a reasonable number of docs with my queries on
my data is to do "the sizing exercise", which I've outlined here:

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

While this was written over 5 years ago, it's still accurate.

Best,
Erick

On Thu, Aug 24, 2017 at 6:10 PM, 陈永龙  wrote:
> Hello,
>
> ENV:  solrcloud 6.3
>
> 3*dell server
>
> 128G 12cores 4.3T /server
>
> 3 solr node /server
>
> 20G /node (with parameter –m 20G)
>
> 10 billlion documents totle
>
> Problem:
>
>  When we start solrcloud ,the cached index will make memory 98% or
> more used . And if we continue to index document (batch commit 10 000
> documents),one or more server will refuse serving.Cannot login wia ssh,even
> refuse the monitor.
>
> So,how can I limit the solr’s caching index to memory behavior?
>
> Anyone thanks!
>


Solr caching the index file make server refuse serving

2017-08-24 Thread 陈永龙
Hello,

ENV:  solrcloud 6.3  

3*dell server

128G 12cores 4.3T /server

3 solr node /server

20G /node (with parameter -m 20G)

10 billion documents in total

Problem:

 When we start SolrCloud, the cached index pushes memory usage to 98% or
more. And if we continue to index documents (batch commits of 10,000
documents), one or more servers will refuse to serve. We cannot log in via
SSH, and even the monitor refuses to respond.

So, how can I limit Solr's behavior of caching the index in memory?

Anyone thanks!



Re: Move index directory to another partition

2017-08-10 Thread Mahmoud Almokadem
Thanks all for your comments.

I followed Shawn's steps (rsync), since everything is on that volume (ZooKeeper,
Solr home and data), and everything went great.

Thanks again,
Mahmoud


On Sun, Aug 6, 2017 at 12:47 AM, Erick Erickson 
wrote:

> bq: I was envisioning a scenario where the entire solr home is on the old
> volume that's going away.  If I were setting up a Solr install where the
> large/fast storage was a separate filesystem, I would put the solr home
> (or possibly even the entire install) under that mount point.  It would
> be a lot easier than setting dataDir in core.properties for every core,
> especially in a cloud install.
>
> Agreed. Nothing in what I said precludes this. If you don't specify
> dataDir,
> then the index for a new replica goes in the default place, i.e. under
> your install
> directory usually. In your case under your new mount point. I usually don't
> recommend trying to take control of where dataDir points, just let it
> default.
> I only mentioned it so you'd be aware it exists. So if your new install
> is associated with a bigger/better/larger EBS it's all automatic.
>
> bq: If the dataDir property is already in use to relocate index data, then
> ADDREPLICA and DELETEREPLICA would be a great way to go.  I would not
> expect most SolrCloud users to use that method.
>
> I really don't understand this. Each Solr replica has an associated
> dataDir whether you specified it or not (the default is relative to
> the core.properties file). ADDREPLICA creates a new replica in a new
> place, initially the data directory and index are empty. The new
> replica goes into recovery and uses the standard replication process
> to copy the index via HTTP from a healthy replica and write it to its
> data directory. Once that's done, the replica becomes live. There's
> nothing about dataDir already being in use here at all.
>
> When you start Solr there's the default place Solr expects to find the
> replicas. This is not necessarily where Solr is executing from, see
> the "-s" option in bin/solr start -s.
>
> If you're talking about using dataDir to point to an existing index,
> yes that would be a problem and not something I meant to imply at all.
>
> Why wouldn't most SolrCloud users use ADDREPLICA/DELTEREPLICA? It's
> commonly used to more replicas around a cluster.
>
> Best,
> Erick
>
> On Fri, Aug 4, 2017 at 11:15 AM, Shawn Heisey  wrote:
> > On 8/2/2017 9:17 AM, Erick Erickson wrote:
> >> Not entirely sure about AWS intricacies, but getting a new replica to
> >> use a particular index directory in the general case is just
> >> specifying dataDir=some_directory on the ADDREPLICA command. The index
> >> just needs an HTTP connection (uses the old replication process) so
> >> nothing huge there. Then DELETEREPLICA for the old one. There's
> >> nothing that ZK has to know about to make this work, it's all local to
> >> the Solr instance.
> >
> > I was envisioning a scenario where the entire solr home is on the old
> > volume that's going away.  If I were setting up a Solr install where the
> > large/fast storage was a separate filesystem, I would put the solr home
> > (or possibly even the entire install) under that mount point.  It would
> > be a lot easier than setting dataDir in core.properties for every core,
> > especially in a cloud install.
> >
> > If the dataDir property is already in use to relocate index data, then
> > ADDREPLICA and DELETEREPLICA would be a great way to go.  I would not
> > expect most SolrCloud users to use that method.
> >
> > Thanks,
> > Shawn
> >
>


Building Solr index from AEM using an ELB

2017-08-09 Thread Wahlgren Peter
I am looking for lessons learned or problems seen when building a Solr index 
from AEM using a Solr cluster with content passing through an ELB.

Our configuration is AEM 6.1 indexing to a cluster of Solr servers running 
version 4.7.1. When building an index with a smaller data set - 4 million 
items, AEM sends the content in about 3 minutes and the index is built without 
issue. When building an index for 14 million items, AEM sends the content in 
about 9 minutes. The Solr server error log records errors of EofException. When 
a single Solr server is used and the ELB bypassed, the index is built in about 
1.75 hours with no errors.

Thanks for your comments and suggestions.

Pete


Re: Move index directory to another partition

2017-08-05 Thread Erick Erickson
bq: I was envisioning a scenario where the entire solr home is on the old
volume that's going away.  If I were setting up a Solr install where the
large/fast storage was a separate filesystem, I would put the solr home
(or possibly even the entire install) under that mount point.  It would
be a lot easier than setting dataDir in core.properties for every core,
especially in a cloud install.

Agreed. Nothing in what I said precludes this. If you don't specify dataDir,
then the index for a new replica goes in the default place, i.e. under
your install
directory usually. In your case under your new mount point. I usually don't
recommend trying to take control of where dataDir points, just let it default.
I only mentioned it so you'd be aware it exists. So if your new install
is associated with a bigger/better/larger EBS it's all automatic.

bq: If the dataDir property is already in use to relocate index data, then
ADDREPLICA and DELETEREPLICA would be a great way to go.  I would not
expect most SolrCloud users to use that method.

I really don't understand this. Each Solr replica has an associated
dataDir whether you specified it or not (the default is relative to
the core.properties file). ADDREPLICA creates a new replica in a new
place, initially the data directory and index are empty. The new
replica goes into recovery and uses the standard replication process
to copy the index via HTTP from a healthy replica and write it to its
data directory. Once that's done, the replica becomes live. There's
nothing about dataDir already being in use here at all.

When you start Solr there's the default place Solr expects to find the
replicas. This is not necessarily where Solr is executing from, see
the "-s" option in bin/solr start -s.

If you're talking about using dataDir to point to an existing index,
yes that would be a problem and not something I meant to imply at all.

Why wouldn't most SolrCloud users use ADDREPLICA/DELETEREPLICA? It's
commonly used to move replicas around a cluster.

Best,
Erick

On Fri, Aug 4, 2017 at 11:15 AM, Shawn Heisey  wrote:
> On 8/2/2017 9:17 AM, Erick Erickson wrote:
>> Not entirely sure about AWS intricacies, but getting a new replica to
>> use a particular index directory in the general case is just
>> specifying dataDir=some_directory on the ADDREPLICA command. The index
>> just needs an HTTP connection (uses the old replication process) so
>> nothing huge there. Then DELETEREPLICA for the old one. There's
>> nothing that ZK has to know about to make this work, it's all local to
>> the Solr instance.
>
> I was envisioning a scenario where the entire solr home is on the old
> volume that's going away.  If I were setting up a Solr install where the
> large/fast storage was a separate filesystem, I would put the solr home
> (or possibly even the entire install) under that mount point.  It would
> be a lot easier than setting dataDir in core.properties for every core,
> especially in a cloud install.
>
> If the dataDir property is already in use to relocate index data, then
> ADDREPLICA and DELETEREPLICA would be a great way to go.  I would not
> expect most SolrCloud users to use that method.
>
> Thanks,
> Shawn
>


Re: Move index directory to another partition

2017-08-04 Thread Shawn Heisey
On 8/2/2017 9:17 AM, Erick Erickson wrote:
> Not entirely sure about AWS intricacies, but getting a new replica to
> use a particular index directory in the general case is just
> specifying dataDir=some_directory on the ADDREPLICA command. The index
> just needs an HTTP connection (uses the old replication process) so
> nothing huge there. Then DELETEREPLICA for the old one. There's
> nothing that ZK has to know about to make this work, it's all local to
> the Solr instance.

I was envisioning a scenario where the entire solr home is on the old
volume that's going away.  If I were setting up a Solr install where the
large/fast storage was a separate filesystem, I would put the solr home
(or possibly even the entire install) under that mount point.  It would
be a lot easier than setting dataDir in core.properties for every core,
especially in a cloud install.

If the dataDir property is already in use to relocate index data, then
ADDREPLICA and DELETEREPLICA would be a great way to go.  I would not
expect most SolrCloud users to use that method.

Thanks,
Shawn



Re: mixed index with commongrams

2017-08-03 Thread David Hastings
Haven't really looked much into that; here is a snippet from today's GC log,
if you wouldn't mind shedding any light on it:

2017-08-03T11:46:16.265-0400: 3200938.383: [GC (Allocation Failure)
2017-08-03T11:46:16.265-0400: 3200938.383: [ParNew
Desired survivor size 1966060336 bytes, new threshold 8 (max 8)
- age   1:  128529184 bytes,  128529184 total
- age   2:   43075632 bytes,  171604816 total
- age   3:   64402592 bytes,  236007408 total
- age   4:   35621704 bytes,  271629112 total
- age   5:   44285584 bytes,  315914696 total
- age   6:   45372512 bytes,  361287208 total
- age   7:   41975368 bytes,  403262576 total
- age   8:   72959688 bytes,  47664 total
: 9133992K->577219K(1088K), 0.2730329 secs]
23200886K->14693007K(49066688K), 0.2732690 secs] [Times: user=2.01
sys=0.01, real=0.28 secs]
Heap after GC invocations=12835 (full 109):
 par new generation   total 1088K, used 577219K [0x7f802300,
0x7f833040, 0x7f833040)
  eden space 8533376K,   0% used [0x7f802300, 0x7f802300,
0x7f822bd6)
  from space 2133312K,  27% used [0x7f82ae0b, 0x7f82d1460d98,
0x7f833040)
  to   space 2133312K,   0% used [0x7f822bd6, 0x7f822bd6,
0x7f82ae0b)
 concurrent mark-sweep generation total 3840K, used 14115788K
[0x7f833040, 0x7f8c5800, 0x7f8c5800)
 Metaspace   used 36698K, capacity 37169K, committed 37512K, reserved
38912K
}





On Thu, Aug 3, 2017 at 11:58 AM, Walter Underwood 
wrote:

> How long are your GC pauses? Those affect all queries, so they make the
> 99th percentile slow with queries that should be fast.
>
> The G1 collector has helped our 99th percentile.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Aug 3, 2017, at 8:48 AM, David Hastings 
> wrote:
> >
> > Thanks, thats what i kind of expected.  still debating whether the space
> > increase is worth it, right now Im at .7% of searches taking longer than
> 10
> > seconds, and 6% taking longer than 1, so when i see things like this in
> the
> > morning it bugs me a bit:
> >
> > 2017-08-02 11:50:48 : 58979/1000 secs : ("Rules of Practice for the
> Courts
> > of Equity of the United States")
> > 2017-08-02 02:16:36 : 54749/1000 secs : ("The American Cause")
> > 2017-08-02 19:27:58 : 54561/1000 secs : ("register of the department of
> > justice")
> >
> > which could all be annihilated with CG's, at the expense, according to
> HT,
> > of a 40% increase in index size.
> >
> >
> >
> > On Thu, Aug 3, 2017 at 11:21 AM, Erick Erickson  >
> > wrote:
> >
> >> bq: will that search still return results form the earlier documents
> >> as well as the new ones
> >>
> >> In a word, "no". By definition the analysis chain applied at index
> >> time puts tokens in the index and that's all you have to search
> >> against for the doc unless and until you re-index the document.
> >>
> >> You really have two choices here:
> >> 1> live with the differing results until you get done re-indexing
> >> 2> index to an offline collection and then use, say, collection
> >> aliasing to make the switch atomically.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Aug 3, 2017 at 8:07 AM, David Hastings
> >>  wrote:
> >>> Hey all, I have yet to run an experiment to test this but was wondering
> >> if
> >>> anyone knows the answer ahead of time.
> >>> If i have an index built with documents before implementing the
> >> commongrams
> >>> filter, then enable it, and start adding documents that have the
> >>> filter/tokenizer applied, will searches that fit the criteria, for
> >> example:
> >>> "to be or not to be"
> >>> will that search still return results form the earlier documents as
> well
> >> as
> >>> the new ones?  The idea is that a full re-index is going to be
> difficult,
> >>> so would rather do it over time by replacing large numbers of documents
> >>> incrementally.  Thanks,
> >>> Dave
> >>
>
>


Re: mixed index with commongrams

2017-08-03 Thread Walter Underwood
How long are your GC pauses? Those affect all queries, so they make the 99th 
percentile slow with queries that should be fast.

The G1 collector has helped our 99th percentile.
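
As a sketch, switching Solr to G1 is usually a matter of overriding GC_TUNE in
solr.in.sh; the flags below are only an illustrative starting point, not a
tested recommendation for your heap or workload:

# In solr.in.sh (values are illustrative and need tuning per installation)
GC_TUNE="-XX:+UseG1GC \
  -XX:MaxGCPauseMillis=250 \
  -XX:+ParallelRefProcEnabled"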

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 3, 2017, at 8:48 AM, David Hastings  
> wrote:
> 
> Thanks, thats what i kind of expected.  still debating whether the space
> increase is worth it, right now Im at .7% of searches taking longer than 10
> seconds, and 6% taking longer than 1, so when i see things like this in the
> morning it bugs me a bit:
> 
> 2017-08-02 11:50:48 : 58979/1000 secs : ("Rules of Practice for the Courts
> of Equity of the United States")
> 2017-08-02 02:16:36 : 54749/1000 secs : ("The American Cause")
> 2017-08-02 19:27:58 : 54561/1000 secs : ("register of the department of
> justice")
> 
> which could all be annihilated with CG's, at the expense, according to HT,
> of a 40% increase in index size.
> 
> 
> 
> On Thu, Aug 3, 2017 at 11:21 AM, Erick Erickson 
> wrote:
> 
>> bq: will that search still return results form the earlier documents
>> as well as the new ones
>> 
>> In a word, "no". By definition the analysis chain applied at index
>> time puts tokens in the index and that's all you have to search
>> against for the doc unless and until you re-index the document.
>> 
>> You really have two choices here:
>> 1> live with the differing results until you get done re-indexing
>> 2> index to an offline collection and then use, say, collection
>> aliasing to make the switch atomically.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Aug 3, 2017 at 8:07 AM, David Hastings
>>  wrote:
>>> Hey all, I have yet to run an experiment to test this but was wondering
>> if
>>> anyone knows the answer ahead of time.
>>> If i have an index built with documents before implementing the
>> commongrams
>>> filter, then enable it, and start adding documents that have the
>>> filter/tokenizer applied, will searches that fit the criteria, for
>> example:
>>> "to be or not to be"
>>> will that search still return results form the earlier documents as well
>> as
>>> the new ones?  The idea is that a full re-index is going to be difficult,
>>> so would rather do it over time by replacing large numbers of documents
>>> incrementally.  Thanks,
>>> Dave
>> 



Re: mixed index with commongrams

2017-08-03 Thread David Hastings
Thanks, that's what I kind of expected.  Still debating whether the space
increase is worth it; right now I'm at 0.7% of searches taking longer than 10
seconds, and 6% taking longer than 1, so when I see things like this in the
morning it bugs me a bit:

2017-08-02 11:50:48 : 58979/1000 secs : ("Rules of Practice for the Courts
of Equity of the United States")
2017-08-02 02:16:36 : 54749/1000 secs : ("The American Cause")
2017-08-02 19:27:58 : 54561/1000 secs : ("register of the department of
justice")

which could all be annihilated with CG's, at the expense, according to HT,
of a 40% increase in index size.



On Thu, Aug 3, 2017 at 11:21 AM, Erick Erickson 
wrote:

> bq: will that search still return results form the earlier documents
> as well as the new ones
>
> In a word, "no". By definition the analysis chain applied at index
> time puts tokens in the index and that's all you have to search
> against for the doc unless and until you re-index the document.
>
> You really have two choices here:
> 1> live with the differing results until you get done re-indexing
> 2> index to an offline collection and then use, say, collection
> aliasing to make the switch atomically.
>
> Best,
> Erick
>
> On Thu, Aug 3, 2017 at 8:07 AM, David Hastings
>  wrote:
> > Hey all, I have yet to run an experiment to test this but was wondering
> if
> > anyone knows the answer ahead of time.
> > If i have an index built with documents before implementing the
> commongrams
> > filter, then enable it, and start adding documents that have the
> > filter/tokenizer applied, will searches that fit the criteria, for
> example:
> > "to be or not to be"
> > will that search still return results form the earlier documents as well
> as
> > the new ones?  The idea is that a full re-index is going to be difficult,
> > so would rather do it over time by replacing large numbers of documents
> > incrementally.  Thanks,
> > Dave
>


Re: mixed index with commongrams

2017-08-03 Thread Erick Erickson
bq: will that search still return results form the earlier documents
as well as the new ones

In a word, "no". By definition the analysis chain applied at index
time puts tokens in the index and that's all you have to search
against for the doc unless and until you re-index the document.

You really have two choices here:
1> live with the differing results until you get done re-indexing
2> index to an offline collection and then use, say, collection
aliasing to make the switch atomically.
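
For option 2>, the final switch is a single Collections API call once the
offline collection is fully built; the alias and collection names here are
made up:

# Applications query the alias "books"; repoint it from the old collection
# to the freshly built one in a single atomic step.
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=books&collections=books_v2'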

Best,
Erick

On Thu, Aug 3, 2017 at 8:07 AM, David Hastings
 wrote:
> Hey all, I have yet to run an experiment to test this but was wondering if
> anyone knows the answer ahead of time.
> If i have an index built with documents before implementing the commongrams
> filter, then enable it, and start adding documents that have the
> filter/tokenizer applied, will searches that fit the criteria, for example:
> "to be or not to be"
> will that search still return results form the earlier documents as well as
> the new ones?  The idea is that a full re-index is going to be difficult,
> so would rather do it over time by replacing large numbers of documents
> incrementally.  Thanks,
> Dave


mixed index with commongrams

2017-08-03 Thread David Hastings
Hey all, I have yet to run an experiment to test this but was wondering if
anyone knows the answer ahead of time.
If I have an index built with documents added before implementing the CommonGrams
filter, then enable it and start adding documents that have the
filter/tokenizer applied, will searches that fit the criteria, for example:
"to be or not to be"
still return results from the earlier documents as well as
the new ones?  The idea is that a full re-index is going to be difficult,
so I would rather do it over time by replacing large numbers of documents
incrementally.  Thanks,
Dave


Re: Custom Sort option to apply at SOLR index

2017-08-02 Thread Erick Erickson
I guess I don't see the problem, just store it as a string and sort on
the field.

# sorts before numbers which sort before characters. Or I'm reading
the ASCII chart wrong.
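
A minimal sketch, assuming the value is kept in a plain string field (the
collection and field names are hypothetical):

# String sort is byte order, so '#' < '0'-'9' < 'A'-'Z', which matches the
# requested ascending order; use desc for the reverse.
curl -G 'http://localhost:8983/solr/products/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'sort=name_s asc'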

Best,
Erick

On Wed, Aug 2, 2017 at 6:55 AM, padmanabhan
 wrote:
> Hello Solr Geeks,
>
> Am newbie to SOLR. I have a requirement as given below, Could any one please
> provide some insights on how to go about on this.
>
> "Ascending by name" (#, 0 - 9, A - Z)
>
> "Descending by name" (Z - A, 9 - 0, #)
>
> Sample name value can be
>
> ABCD5678
> 1234ABCD
> #2345ABCD
> #1234ABCD
> 5678ABCD
> #2345ACBD
> 5678EFGH
> #2345DBCA
> ABCD1234
> 1234#ABCD
>
> *Expected Ascending order*
>
> #2345ABCD
> #2345ACBD
> #2345DBCA
> 1234#ABCD
> 1234ABCD
> 5678ABCD
> ABCD1234
> ABCD5678
>
> *Expected Descending order*
>
> ABCD5678
> ABCD1234
> 5678ABCD
> 1234ABCD
> 1234#ABCD
> #2345DBCA
> #2345ACBD
> #2345ABCD
>
> Thanks & Regards,
> Paddy
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Custom-Sort-option-to-apply-at-SOLR-index-tp4348787.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Move index directory to another partition

2017-08-02 Thread Erick Erickson
Shawn:

Not entirely sure about AWS intricacies, but getting a new replica to
use a particular index directory in the general case is just
specifying dataDir=some_directory on the ADDREPLICA command. The index
just needs an HTTP connection (uses the old replication process) so
nothing huge there. Then DELETEREPLICA for the old one. There's
nothing that ZK has to know about to make this work, it's all local to
the Solr instance.

Or I'm completely out in the weeds.
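
A rough sketch of that sequence (collection, shard, node and path names are
placeholders, not from this thread):

# New replica whose index data lives on the new EBS volume.
curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=host1:8983_solr&dataDir=/mnt/new-ebs/mycoll_shard1'

# Once the new replica reports active, drop the old one.
curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node3'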

Best,
Erick

On Tue, Aug 1, 2017 at 7:52 PM, Dave  wrote:
> To add to this, not sure of solr cloud uses it, but you're going to want to 
> destroy the wrote.lock file as well
>
>> On Aug 1, 2017, at 9:31 PM, Shawn Heisey  wrote:
>>
>>> On 8/1/2017 7:09 PM, Erick Erickson wrote:
>>> WARNING: what I currently understand about the limitations of AWS
>>> could fill volumes so I might be completely out to lunch.
>>>
>>> If you ADDREPLICA with the new replica's  data residing on the new EBS
>>> volume, then wait for it to sync (which it'll do all by itself) then
>>> DELETEREPLICA on the original you'll be all set.
>>>
>>> In recent Solr's, theres also the MOVENODE collections API call.
>>
>> I did consider mentioning that as a possible way forward, but I hate to
>> rely on special configurations with core.properties, particularly if the
>> newly built replica core instanceDirs aren't in the solr home (or
>> coreRootDirectory) at all.  I didn't want to try and explain the precise
>> steps required to get that plan to work.  I would expect to need some
>> arcane Collections API work or manual ZK modification to reach a correct
>> state -- steps that would be prone to error.
>>
>> The idea I mentioned seemed to me to be the way forward that would
>> require the least specialized knowledge.  Here's a simplified stating of
>> the steps:
>>
>> * Mount the new volume somewhere.
>> * Use multiple rsync passes to get the data copied.
>> * Stop Solr.
>> * Do a final rsync pass.
>> * Unmount the original volume.
>> * Remount the new volume in the original location.
>> * Start Solr.
>>
>> Thanks,
>> Shawn
>>


RE: Solr Index issue on string type while querying

2017-08-02 Thread padmanabhan
Thank you Matt for the reply. My apologies for the lack of clarity in the problem
statement.

The problem was with the source attribute value defined at the source
system.

The source system has the value with the HTML entity:

heightSquareTube_string_mv: &gt; 90 - 100 mm

The Solr index converts the XML/HTML entity code to its symbol equivalent:

heightSquareTube_string_mv: > 90 - 100 mm



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Index-issue-on-string-type-while-querying-tp4335340p4348788.html
Sent from the Solr - User mailing list archive at Nabble.com.


Custom Sort option to apply at SOLR index

2017-08-02 Thread padmanabhan
Hello Solr Geeks,

I am a newbie to Solr. I have a requirement as given below; could anyone please
provide some insights on how to go about it?

"Ascending by name" (#, 0 - 9, A - Z)

"Descending by name" (Z - A, 9 - 0, #)

Sample name value can be 

ABCD5678
1234ABCD
#2345ABCD
#1234ABCD
5678ABCD
#2345ACBD
5678EFGH
#2345DBCA
ABCD1234
1234#ABCD

*Expected Ascending order*

#2345ABCD
#2345ACBD
#2345DBCA
1234#ABCD
1234ABCD
5678ABCD
ABCD1234
ABCD5678

*Expected Descending order*

ABCD5678
ABCD1234
5678ABCD
1234ABCD
1234#ABCD
#2345DBCA
#2345ACBD
#2345ABCD

Thanks & Regards,
Paddy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Sort-option-to-apply-at-SOLR-index-tp4348787.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Move index directory to another partition

2017-08-01 Thread Dave
To add to this, I'm not sure if SolrCloud uses it, but you're going to want to
destroy the write.lock file as well.

> On Aug 1, 2017, at 9:31 PM, Shawn Heisey  wrote:
> 
>> On 8/1/2017 7:09 PM, Erick Erickson wrote:
>> WARNING: what I currently understand about the limitations of AWS
>> could fill volumes so I might be completely out to lunch.
>> 
>> If you ADDREPLICA with the new replica's  data residing on the new EBS
>> volume, then wait for it to sync (which it'll do all by itself) then
>> DELETEREPLICA on the original you'll be all set.
>> 
>> In recent Solr's, theres also the MOVENODE collections API call.
> 
> I did consider mentioning that as a possible way forward, but I hate to
> rely on special configurations with core.properties, particularly if the
> newly built replica core instanceDirs aren't in the solr home (or
> coreRootDirectory) at all.  I didn't want to try and explain the precise
> steps required to get that plan to work.  I would expect to need some
> arcane Collections API work or manual ZK modification to reach a correct
> state -- steps that would be prone to error.
> 
> The idea I mentioned seemed to me to be the way forward that would
> require the least specialized knowledge.  Here's a simplified stating of
> the steps:
> 
> * Mount the new volume somewhere.
> * Use multiple rsync passes to get the data copied.
> * Stop Solr.
> * Do a final rsync pass.
> * Unmount the original volume.
> * Remount the new volume in the original location.
> * Start Solr.
> 
> Thanks,
> Shawn
> 


Re: Move index directory to another partition

2017-08-01 Thread Shawn Heisey
On 8/1/2017 7:09 PM, Erick Erickson wrote:
> WARNING: what I currently understand about the limitations of AWS
> could fill volumes so I might be completely out to lunch.
>
> If you ADDREPLICA with the new replica's  data residing on the new EBS
> volume, then wait for it to sync (which it'll do all by itself) then
> DELETEREPLICA on the original you'll be all set.
>
> In recent Solr's, theres also the MOVENODE collections API call.

I did consider mentioning that as a possible way forward, but I hate to
rely on special configurations with core.properties, particularly if the
newly built replica core instanceDirs aren't in the solr home (or
coreRootDirectory) at all.  I didn't want to try and explain the precise
steps required to get that plan to work.  I would expect to need some
arcane Collections API work or manual ZK modification to reach a correct
state -- steps that would be prone to error.

The idea I mentioned seemed to me to be the way forward that would
require the least specialized knowledge.  Here's a simplified stating of
the steps:

* Mount the new volume somewhere.
* Use multiple rsync passes to get the data copied.
* Stop Solr.
* Do a final rsync pass.
* Unmount the original volume.
* Remount the new volume in the original location.
* Start Solr.

Thanks,
Shawn



Re: Move index directory to another partition

2017-08-01 Thread Erick Erickson
WARNING: what I currently understand about the limitations of AWS
could fill volumes so I might be completely out to lunch.

If you ADDREPLICA with the new replica's  data residing on the new EBS
volume, then wait for it to sync (which it'll do all by itself) then
DELETEREPLICA on the original you'll be all set.

In recent Solr versions, there's also the MOVENODE collections API call.

Best,
Erick

On Tue, Aug 1, 2017 at 6:03 PM, Shawn Heisey  wrote:
> On 8/1/2017 4:00 PM, Mahmoud Almokadem wrote:
>> I'm using ubuntu and I'll try rsync command. Unfortunately I'm using one
>> replication factor but I think the downtime will be less than five minutes 
>> after following your steps.
>>
>> But how can I start Solr backup or why should I run it although I copied
>> the index and changed theo path?
>>
>> And what do you mean with "Using multiple passes with rsync"?
>
> The first time you copy the data, which you could do with cp if you
> want, the time required will be limited by the size of the data and the
> speed of the disks.  Depending on the size, it could take several hours
> like you estimated.  I would suggest using rsync for the first copy just
> because you're going to need the same command again for the later passes.
>
> Doing a second pass with rsync should go very quickly.  How fast would
> depend on the rate that the index data is changing.  You might need to
> do this step more than once just so that it gets faster each time, in
> preparation for the final pass.
>
> A final pass with rsync might only take a few seconds, and if Solr is
> stopped before that final copy is started, then there's no way the index
> data can change.
>
> Thanks,
> Shawn
>


Re: Move index directory to another partition

2017-08-01 Thread Shawn Heisey
On 8/1/2017 4:00 PM, Mahmoud Almokadem wrote:
> I'm using ubuntu and I'll try rsync command. Unfortunately I'm using one
> replication factor but I think the downtime will be less than five minutes 
> after following your steps.
>
> But how can I start Solr backup or why should I run it although I copied
> the index and changed theo path?
>
> And what do you mean with "Using multiple passes with rsync"?

The first time you copy the data, which you could do with cp if you
want, the time required will be limited by the size of the data and the
speed of the disks.  Depending on the size, it could take several hours
like you estimated.  I would suggest using rsync for the first copy just
because you're going to need the same command again for the later passes.

Doing a second pass with rsync should go very quickly.  How fast would
depend on the rate that the index data is changing.  You might need to
do this step more than once just so that it gets faster each time, in
preparation for the final pass.

A final pass with rsync might only take a few seconds, and if Solr is
stopped before that final copy is started, then there's no way the index
data can change.

Thanks,
Shawn



Re: Move index directory to another partition

2017-08-01 Thread Mahmoud Almokadem
Thanks Shawn,

I'm using Ubuntu and I'll try the rsync command. Unfortunately I'm using a
replication factor of one, but I think the downtime will be less than five
minutes if I follow your steps.

But how can I start Solr back up, and why do I need to restart it at all when
I have already copied the index and changed the path?

And what do you mean by "using multiple passes with rsync"?

Thanks,
Mahmoud


On Tuesday, August 1, 2017, Shawn Heisey  wrote:

> On 7/31/2017 12:28 PM, Mahmoud Almokadem wrote:
> > I've a SolrCloud of four instances on Amazon and the EBS volumes that
> > contain the data on everynode is going to be full, unfortunately Amazon
> > doesn't support expanding the EBS. So, I'll attach larger EBS volumes to
> > move the index to.
> >
> > I can stop the updates on the index, but I'm afraid to use "cp" command
> to
> > copy the files that are "on merge" operation.
> >
> > The copy operation may take several  hours.
> >
> > How can I move the data directory without stopping the instance?
>
> Use rsync to do the copy.  Do an initial copy while Solr is running,
> then do a second copy, which should be pretty fast because rsync will
> see the data from the first copy.  Then shut Solr down and do a third
> rsync which will only copy a VERY small changeset.  Reconfigure Solr
> and/or the OS to use the new location, and start Solr back up.  Because
> you mentioned "cp" I am assuming that you're NOT on Windows, and that
> the OS will most likely allow you to do anything you need with index
> files while Solr has them open.
>
> If you have set up your replicas with SolrCloud properly, then your
> collections will not go offline when one Solr instance is shut down, and
> that instance will be brought back into sync with the rest of the
> cluster when it starts back up.  Using multiple passes with rsync should
> mean that Solr will not need to be shutdown for very long.
>
> The options I typically use for this kind of copy with rsync are "-avH
> --delete".  I would recommend that you research rsync options so that
> you fully understand what I have suggested.
>
> Thanks,
> Shawn
>
>


Re: Move index directory to another partition

2017-08-01 Thread Walter Underwood
Way back in the 1.x days, replication was done with shell scripts and rsync, 
right?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 1, 2017, at 2:45 PM, Shawn Heisey  wrote:
> 
> On 7/31/2017 12:28 PM, Mahmoud Almokadem wrote:
>> I've a SolrCloud of four instances on Amazon and the EBS volumes that
>> contain the data on everynode is going to be full, unfortunately Amazon
>> doesn't support expanding the EBS. So, I'll attach larger EBS volumes to
>> move the index to.
>> 
>> I can stop the updates on the index, but I'm afraid to use "cp" command to
>> copy the files that are "on merge" operation.
>> 
>> The copy operation may take several  hours.
>> 
>> How can I move the data directory without stopping the instance?
> 
> Use rsync to do the copy.  Do an initial copy while Solr is running,
> then do a second copy, which should be pretty fast because rsync will
> see the data from the first copy.  Then shut Solr down and do a third
> rsync which will only copy a VERY small changeset.  Reconfigure Solr
> and/or the OS to use the new location, and start Solr back up.  Because
> you mentioned "cp" I am assuming that you're NOT on Windows, and that
> the OS will most likely allow you to do anything you need with index
> files while Solr has them open.
> 
> If you have set up your replicas with SolrCloud properly, then your
> collections will not go offline when one Solr instance is shut down, and
> that instance will be brought back into sync with the rest of the
> cluster when it starts back up.  Using multiple passes with rsync should
> mean that Solr will not need to be shutdown for very long.
> 
> The options I typically use for this kind of copy with rsync are "-avH
> --delete".  I would recommend that you research rsync options so that
> you fully understand what I have suggested.
> 
> Thanks,
> Shawn
> 



Re: Move index directory to another partition

2017-08-01 Thread Shawn Heisey
On 7/31/2017 12:28 PM, Mahmoud Almokadem wrote:
> I've a SolrCloud of four instances on Amazon and the EBS volumes that
> contain the data on everynode is going to be full, unfortunately Amazon
> doesn't support expanding the EBS. So, I'll attach larger EBS volumes to
> move the index to.
>
> I can stop the updates on the index, but I'm afraid to use "cp" command to
> copy the files that are "on merge" operation.
>
> The copy operation may take several  hours.
>
> How can I move the data directory without stopping the instance?

Use rsync to do the copy.  Do an initial copy while Solr is running,
then do a second copy, which should be pretty fast because rsync will
see the data from the first copy.  Then shut Solr down and do a third
rsync which will only copy a VERY small changeset.  Reconfigure Solr
and/or the OS to use the new location, and start Solr back up.  Because
you mentioned "cp" I am assuming that you're NOT on Windows, and that
the OS will most likely allow you to do anything you need with index
files while Solr has them open.

If you have set up your replicas with SolrCloud properly, then your
collections will not go offline when one Solr instance is shut down, and
that instance will be brought back into sync with the rest of the
cluster when it starts back up.  Using multiple passes with rsync should
mean that Solr will not need to be shutdown for very long.

The options I typically use for this kind of copy with rsync are "-avH
--delete".  I would recommend that you research rsync options so that
you fully understand what I have suggested.

Thanks,
Shawn



Move index directory to another partition

2017-07-31 Thread Mahmoud Almokadem
Hello,

I've a SolrCloud of four instances on Amazon and the EBS volumes that
contain the data on everynode is going to be full, unfortunately Amazon
doesn't support expanding the EBS. So, I'll attach larger EBS volumes to
move the index to.

I can stop the updates on the index, but I'm afraid to use "cp" command to
copy the files that are "on merge" operation.

The copy operation may take several  hours.

How can I move the data directory without stopping the instance?

Thanks,
Mahmoud


Re: index version - replicable versus searching

2017-07-25 Thread Erick Erickson
Ronald:

Actually, people generally don't search on master ;). The idea is that
master is configured for heavy indexing and then people search on the
slaves which are configured for heavy query loads (e.g. memory,
autowarming, whatever may be different). Which is its own problem
since the time the slaves poll won't necessarily be the exact same
wall-clock time.

SolrCloud doesn't use replication except in certain recovery
scenarios. In normal operations, documents are forwarded to each
replica and indexed separately on all nodes. That's about the only way
to support Near Real Time.

Best,
Erick

On Tue, Jul 25, 2017 at 9:39 AM, Stanonik, Ronald  wrote:
> Bingo!  Right on both counts!  opensearcher was false.  When I changed it to 
> true, then I could see that master(searching) and master(replicable) both 
> changed.  And autocommit.maxtime is causing a commit on the master.
>
> Who uses master(replicable)?  It seems for my simple master/slave 
> configuration master(searching) is the relevant version.  Maybe solr cloud 
> uses master(replicable)?
>
> Thanks,
>
> Ron
>
>


RE: index version - replicable versus searching

2017-07-25 Thread Stanonik, Ronald
Bingo!  Right on both counts!  opensearcher was false.  When I changed it to 
true, then I could see that master(searching) and master(replicable) both 
changed.  And autocommit.maxtime is causing a commit on the master.

Who uses master(replicable)?  It seems for my simple master/slave configuration 
master(searching) is the relevant version.  Maybe solr cloud uses 
master(replicable)?

Thanks,

Ron




Re: Lucene index corruption and recovery

2017-07-25 Thread sputul
Another sanity check: with the deletion, the only option would be to reindex
those documents. Could someone please let me know if I am missing anything or
if I am on track here. Thanks.





Lucene index corruption and recovery

2017-07-24 Thread Putul S
While trying to upgrade a 100GB index from Solr 4 to 5, CheckIndex (actually
the index upgrader) indicated that the index was corrupted. Hence, I ran
CheckIndex to fix the index, which showed a broken-segment warning and then
deleted those documents. I then ran the index upgrader on the fixed index,
which upgraded fine without any error (I still need to set up Solr/ZK to test
it, though).

WARNING: 2 broken segments (containing 5 documents) detected

Is there an easy way to figure out which documents (by ID) got deleted, or do
I need to compare document IDs in the old and new index?

Also, what do broken segments mean with respect to querying documents? Are
those documents still searchable in the corrupted index as long as the
segments are not deleted?

Note that a few small test indexes had no issues with corruption or upgrade.
The large-index problem could be related to memory or network issues.

Thanks in advance.
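
For reference, a hedged sketch of the CheckIndex and IndexUpgrader invocations
being described; the classpath and index path are examples, and the repair flag
is -exorcise in Lucene 5.x (it was -fix in older releases):

  # read-only integrity check
  java -cp "server/solr-webapp/webapp/WEB-INF/lib/*" \
       org.apache.lucene.index.CheckIndex /var/solr/data/core1/data/index

  # drop unreadable segments; the documents in them are lost and must be reindexed
  java -cp "server/solr-webapp/webapp/WEB-INF/lib/*" \
       org.apache.lucene.index.CheckIndex /var/solr/data/core1/data/index -exorcise

  # rewrite the remaining segments in the current index format
  java -cp "server/solr-webapp/webapp/WEB-INF/lib/*" \
       org.apache.lucene.index.IndexUpgrader /var/solr/data/core1/data/index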


Re: index version - replicable versus searching

2017-07-24 Thread Erick Erickson
 Actually, I'm surprised that the slave returns the new document and I
suspect that there's actually a commit on the master, but no new
searcher is being opened.

On replication, the slave copies all _closed_ segments from the master
whether or not they have been opened for searching. Hmmm, a little
arcane.

Here's a long blog on the subject:
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

But...
Whenever you hard commit (often configured in solrconfig.xml) you have
a choice whether opensearcher=true|false. _IF_ opensearcher=false, the
current segment is closed but the docs are not searchable yet.

When the slave does a replication, it copies all closed segments and
opens a new searcher on them. So here's one possibility:

1> you added some docs on the master but your solrconfig has an
autocommit setting that tripped in and has openSearcher=false. This
closed all open segments (i.e. the segments with the new docs)

2> the slave replicated the closed segments and opened a new searcher
on the index, so it shows the new docs

3> the master still hasn't opened a new searcher so continues to not
be able to see the new documents.

Is that possible?
Erick
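
A minimal illustration of the solrconfig.xml setting being described here (the
values are examples only):

  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60 seconds -->
    <openSearcher>false</openSearcher>  <!-- close segments, but do not open a new searcher -->
  </autoCommit>

With openSearcher=false the master's own searches do not see the new documents,
yet the freshly closed segments are exactly what the slave copies and opens,
which would produce the behaviour described above.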

On Mon, Jul 24, 2017 at 3:04 PM, Stanonik, Ronald  wrote:
> I'm testing replication on solr 5.5.0.
>
> I set up one master and one slave.
>
> The index versions match; that is, master(replicable), master(searching), and 
> slave(searching) are the same.
>
> I make a change to the index on the master, but do not commit yet.
>
> As expected, the version master(replicable) changes, but not 
> master(searching).
>
> If I "replicate now" on the slave, then slave(searching) matches 
> master(replicable), which seems wrong because the slave now returns answers 
> from master(replicable), while the master returns answers from 
> master(searching).
>
> Shouldn't the slave continue to return answers from master(searching), so 
> that master and slave return the same answers?
>
> What do I not understand?  The documentation I found about replication 
> doesn't seem to explain in depth how the versions are affected by changes and 
> commit.
>
> Thanks,
>
> Ron


index version - replicable versus searching

2017-07-24 Thread Stanonik, Ronald
I'm testing replication on solr 5.5.0.

I set up one master and one slave.

The index versions match; that is, master(replicable), master(searching), and 
slave(searching) are the same.

I make a change to the index on the master, but do not commit yet.

As expected, the version master(replicable) changes, but not master(searching).

If I "replicate now" on the slave, then slave(searching) matches 
master(replicable), which seems wrong because the slave now returns answers 
from master(replicable), while the master returns answers from 
master(searching).

Shouldn't the slave continue to return answers from master(searching), so that 
master and slave return the same answers?

What do I not understand?  The documentation I found about replication doesn't 
seem to explain in depth how the versions are affected by changes and commit.

Thanks,

Ron
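
The two versions being compared can be read directly from the ReplicationHandler
on either node; a small sketch, with a hypothetical core name:

  # the generation/version a slave would fetch next
  curl 'http://localhost:8983/solr/core1/replication?command=indexversion&wt=json'

  # full details, including the replicable and searching versions shown in the admin UI
  curl 'http://localhost:8983/solr/core1/replication?command=details&wt=json'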


Re: index newly discovered fields of different types

2017-07-10 Thread Jan Høydahl
I think Thaer’s answer clarifies how they do it.
So at the time they assemble the full Solr doc to index, there may be a new
field name not known in advance, but to my understanding the RDF source
contains information on the type (else they could not do the mapping to a
dynamic field either), so adding a field to the managed schema on the fly
once an unknown field is detected should work just fine!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 10. jul. 2017 kl. 02.08 skrev Rick Leir :
> 
> Jan
> 
> I hope this is not off-topic, but I am curious: if you do not use the three 
> fields, subject, predicate, and object for indexing RDF
> then what is your algorithm? Maybe document nesting is appropriate for this? 
> cheers -- Rick
> 
> 
> On 2017-07-09 05:52 PM, Jan Høydahl wrote:
>> Hi,
>> 
>> I have personally written a Python script to parse RDF files into an 
>> in-memory graph structure and then pull data from that structure to index to 
>> Solr.
>> I.e. you may perfectly well have RDF (nt, turtle, whatever) as source but 
>> index sub structures in very specific ways.
>> Anyway, as Erick points out, that’s probably where in your code that you 
>> should use Managed Schema REST API in order to
>> 1. Query Solr for what fields are defined
>> 2. If you need to index a field that is not yet in Solr, add it, using the 
>> correct field type (your app should know)
>> 3. Push the data
>> 4. Repeat
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> 8. jul. 2017 kl. 02.36 skrev Rick Leir :
>>> 
>>> Thaer
>>> Whoa, hold everything! You said RDF, meaning resource description 
>>> framework? If so, you have exactly​ three fields: subject, predicate, and 
>>> object. Maybe they are text type, or for exact matches you might want 
>>> string fields. Add an ID field, which could be automatically generated by 
>>> Solr, so now you have four fields. Or am I on a tangent again? Cheers -- 
>>> Rick
>>> 
>>> On July 7, 2017 6:01:00 AM EDT, Thaer Sammar  wrote:
>>>> Hi Jan,
>>>> 
>>>> Thanks!, I am exploring the schemaless option based on Furkan
>>>> suggestion. I
>>>> need the the flexibility because not all fields are known. We get the
>>>> data
>>>> from RDF database (which changes continuously). To be more specific, we
>>>> have a database and all changes on it are sent to a kafka queue. and we
>>>> have a consumer which listen to the queue and update the Solr index.
>>>> 
>>>> regards,
>>>> Thaer
>>>> 
>>>> On 7 July 2017 at 10:53, Jan Høydahl  wrote:
>>>> 
>>>>> If you do not need the flexibility of dynamic fields, don’t use them.
>>>>> Sounds to me that you really want a field “price” to be float and a
>>>> field
>>>>> “birthdate” to be of type date etc.
>>>>> If so, simply create your schema (either manually, through Schema API
>>>> or
>>>>> using schemaless) up front and index each field as correct type
>>>> without
>>>>> messing with field name prefixes.
>>>>> 
>>>>> --
>>>>> Jan Høydahl, search solution architect
>>>>> Cominvent AS - www.cominvent.com
>>>>> 
>>>>>> 5. jul. 2017 kl. 15.23 skrev Thaer Sammar :
>>>>>> 
>>>>>> Hi,
>>>>>> We are trying to index documents of different types. Document have
>>>>> different fields. fields are known at indexing time. We run a query
>>>> on a
>>>>> database and we index what comes using query variables as field names
>>>> in
>>>>> solr. Our current solution: we use dynamic fields with prefix, for
>>>> example
>>>>> feature_i_*, the issue with that
>>>>>> 1) we need to define the type of the dynamic field and to be able
>>>> to
>>>>> cover the type of discovered fields we define the following
>>>>>> feature_i_* for integers, feature_t_* for string, feature_d_* for
>>>>> double, 
>>>>>> 1.a) this means we need to check the type of the discovered field
>>>> and
>>>>> then put in the corresponding dynamic field
>>>>>> 2) at search time, we need to know the right prefix
>>>>>> We are looking for help to find away to ignore the prefix and check
>>>> of
>>>>> the type
>>>>>> regards,
>>>>>> Thaer
>>>>> 
>>> -- 
>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
> 



Re: index newly discovered fields of different types

2017-07-10 Thread Thaer Sammar
Hi Rick,

Yes, the RDF structure has subject, predicate and object. The object data
type is not only text; it can be integer or double as well, or other data
types. The structure of our Solr document doesn't only contain these three
fields. We compose one document per subject and we use all found objects as
fields. Currently, in the schema we define two static fields: uri (the
subject) and a geo field which contains the geographic point. When we find a
message in the Kafka queue, which means something changed in the DB, we query
the DB to get all subject,predicate,object triples of the affected subjects,
and based on that we create the document. For example, for subjects s1 and s2,
we might get the following from the DB:

s1,geo,(latitude, longitude)
s1,area,200.0
s1,type,office
s2,geo,(latitude, longitude)

For s1 there is more information available and we would like to include it in
the Solr doc, therefore we use the dynamic fields feature_double_* and
feature_text_*. Based on the object data type we add the value to the
appropriate dynamic field, giving a document like:


s1
(latitude,longitude)
200.0
office

We prefixed the predicate name with the dynamic field prefix, and we used the
RDF data type to decide which dynamic field to use.

regards,
Thaer
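
The matching schema entries are presumably something like the following sketch
(the field type names depend on the schema in use):

  <dynamicField name="feature_double_*" type="tdouble"      indexed="true" stored="true"/>
  <dynamicField name="feature_text_*"   type="text_general" indexed="true" stored="true"/>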

On 8 July 2017 at 02:36, Rick Leir  wrote:

> Thaer
> Whoa, hold everything! You said RDF, meaning resource description
> framework? If so, you have exactly​ three fields: subject, predicate, and
> object. Maybe they are text type, or for exact matches you might want
> string fields. Add an ID field, which could be automatically generated by
> Solr, so now you have four fields. Or am I on a tangent again? Cheers --
> Rick
>
> On July 7, 2017 6:01:00 AM EDT, Thaer Sammar  wrote:
> >Hi Jan,
> >
> >Thanks!, I am exploring the schemaless option based on Furkan
> >suggestion. I
> >need the the flexibility because not all fields are known. We get the
> >data
> >from RDF database (which changes continuously). To be more specific, we
> >have a database and all changes on it are sent to a kafka queue. and we
> >have a consumer which listen to the queue and update the Solr index.
> >
> >regards,
> >Thaer
> >
> >On 7 July 2017 at 10:53, Jan Høydahl  wrote:
> >
> >> If you do not need the flexibility of dynamic fields, don’t use them.
> >> Sounds to me that you really want a field “price” to be float and a
> >field
> >> “birthdate” to be of type date etc.
> >> If so, simply create your schema (either manually, through Schema API
> >or
> >> using schemaless) up front and index each field as correct type
> >without
> >> messing with field name prefixes.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >> > 5. jul. 2017 kl. 15.23 skrev Thaer Sammar :
> >> >
> >> > Hi,
> >> > We are trying to index documents of different types. Document have
> >> different fields. fields are known at indexing time. We run a query
> >on a
> >> database and we index what comes using query variables as field names
> >in
> >> solr. Our current solution: we use dynamic fields with prefix, for
> >example
> >> feature_i_*, the issue with that
> >> > 1) we need to define the type of the dynamic field and to be able
> >to
> >> cover the type of discovered fields we define the following
> >> > feature_i_* for integers, feature_t_* for string, feature_d_* for
> >> double, 
> >> > 1.a) this means we need to check the type of the discovered field
> >and
> >> then put in the corresponding dynamic field
> >> > 2) at search time, we need to know the right prefix
> >> > We are looking for help to find away to ignore the prefix and check
> >of
> >> the type
> >> >
> >> > regards,
> >> > Thaer
> >>
> >>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: index newly discovered fields of different types

2017-07-09 Thread Rick Leir

Jan

I hope this is not off-topic, but I am curious: if you do not use the 
three fields, subject, predicate, and object for indexing RDF
then what is your algorithm? Maybe document nesting is appropriate for 
this? cheers -- Rick



On 2017-07-09 05:52 PM, Jan Høydahl wrote:

Hi,

I have personally written a Python script to parse RDF files into an in-memory 
graph structure and then pull data from that structure to index to Solr.
I.e. you may perfectly well have RDF (nt, turtle, whatever) as source but index 
sub structures in very specific ways.
Anyway, as Erick points out, that’s probably where in your code that you should 
use Managed Schema REST API in order to
1. Query Solr for what fields are defined
2. If you need to index a field that is not yet in Solr, add it, using the 
correct field type (your app should know)
3. Push the data
4. Repeat

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


8. jul. 2017 kl. 02.36 skrev Rick Leir :

Thaer
Whoa, hold everything! You said RDF, meaning resource description framework? If 
so, you have exactly​ three fields: subject, predicate, and object. Maybe they 
are text type, or for exact matches you might want string fields. Add an ID 
field, which could be automatically generated by Solr, so now you have four 
fields. Or am I on a tangent again? Cheers -- Rick

On July 7, 2017 6:01:00 AM EDT, Thaer Sammar  wrote:

Hi Jan,

Thanks!, I am exploring the schemaless option based on Furkan
suggestion. I
need the the flexibility because not all fields are known. We get the
data
from RDF database (which changes continuously). To be more specific, we
have a database and all changes on it are sent to a kafka queue. and we
have a consumer which listen to the queue and update the Solr index.

regards,
Thaer

On 7 July 2017 at 10:53, Jan Høydahl  wrote:


If you do not need the flexibility of dynamic fields, don’t use them.
Sounds to me that you really want a field “price” to be float and a

field

“birthdate” to be of type date etc.
If so, simply create your schema (either manually, through Schema API

or

using schemaless) up front and index each field as correct type

without

messing with field name prefixes.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


5. jul. 2017 kl. 15.23 skrev Thaer Sammar :

Hi,
We are trying to index documents of different types. Document have

different fields. fields are known at indexing time. We run a query

on a

database and we index what comes using query variables as field names

in

solr. Our current solution: we use dynamic fields with prefix, for

example

feature_i_*, the issue with that

1) we need to define the type of the dynamic field and to be able

to

cover the type of discovered fields we define the following

feature_i_* for integers, feature_t_* for string, feature_d_* for

double, 

1.a) this means we need to check the type of the discovered field

and

then put in the corresponding dynamic field

2) at search time, we need to know the right prefix
We are looking for help to find away to ignore the prefix and check

of

the type

regards,
Thaer



--
Sorry for being brief. Alternate email is rickleir at yahoo dot com




Re: index newly discovered fields of different types

2017-07-09 Thread Jan Høydahl
Hi,

I have personally written a Python script to parse RDF files into an in-memory 
graph structure and then pull data from that structure to index to Solr.
I.e. you may perfectly well have RDF (nt, turtle, whatever) as source but index 
sub structures in very specific ways.
Anyway, as Erick points out, that’s probably where in your code that you should 
use Managed Schema REST API in order to
1. Query Solr for what fields are defined
2. If you need to index a field that is not yet in Solr, add it, using the 
correct field type (your app should know)
3. Push the data
4. Repeat

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 8. jul. 2017 kl. 02.36 skrev Rick Leir :
> 
> Thaer
> Whoa, hold everything! You said RDF, meaning resource description framework? 
> If so, you have exactly​ three fields: subject, predicate, and object. Maybe 
> they are text type, or for exact matches you might want string fields. Add an 
> ID field, which could be automatically generated by Solr, so now you have 
> four fields. Or am I on a tangent again? Cheers -- Rick
> 
> On July 7, 2017 6:01:00 AM EDT, Thaer Sammar  wrote:
>> Hi Jan,
>> 
>> Thanks!, I am exploring the schemaless option based on Furkan
>> suggestion. I
>> need the the flexibility because not all fields are known. We get the
>> data
>> from RDF database (which changes continuously). To be more specific, we
>> have a database and all changes on it are sent to a kafka queue. and we
>> have a consumer which listen to the queue and update the Solr index.
>> 
>> regards,
>> Thaer
>> 
>> On 7 July 2017 at 10:53, Jan Høydahl  wrote:
>> 
>>> If you do not need the flexibility of dynamic fields, don’t use them.
>>> Sounds to me that you really want a field “price” to be float and a
>> field
>>> “birthdate” to be of type date etc.
>>> If so, simply create your schema (either manually, through Schema API
>> or
>>> using schemaless) up front and index each field as correct type
>> without
>>> messing with field name prefixes.
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
>>>> 5. jul. 2017 kl. 15.23 skrev Thaer Sammar :
>>>> 
>>>> Hi,
>>>> We are trying to index documents of different types. Document have
>>> different fields. fields are known at indexing time. We run a query
>> on a
>>> database and we index what comes using query variables as field names
>> in
>>> solr. Our current solution: we use dynamic fields with prefix, for
>> example
>>> feature_i_*, the issue with that
>>>> 1) we need to define the type of the dynamic field and to be able
>> to
>>> cover the type of discovered fields we define the following
>>>> feature_i_* for integers, feature_t_* for string, feature_d_* for
>>> double, 
>>>> 1.a) this means we need to check the type of the discovered field
>> and
>>> then put in the corresponding dynamic field
>>>> 2) at search time, we need to know the right prefix
>>>> We are looking for help to find away to ignore the prefix and check
>> of
>>> the type
>>>> 
>>>> regards,
>>>> Thaer
>>> 
>>> 
> 
> -- 
> Sorry for being brief. Alternate email is rickleir at yahoo dot com



Re: index newly discovered fields of different types

2017-07-07 Thread Rick Leir
Thaer
Whoa, hold everything! You said RDF, meaning resource description framework? If 
so, you have exactly​ three fields: subject, predicate, and object. Maybe they 
are text type, or for exact matches you might want string fields. Add an ID 
field, which could be automatically generated by Solr, so now you have four 
fields. Or am I on a tangent again? Cheers -- Rick

On July 7, 2017 6:01:00 AM EDT, Thaer Sammar  wrote:
>Hi Jan,
>
>Thanks!, I am exploring the schemaless option based on Furkan
>suggestion. I
>need the the flexibility because not all fields are known. We get the
>data
>from RDF database (which changes continuously). To be more specific, we
>have a database and all changes on it are sent to a kafka queue. and we
>have a consumer which listen to the queue and update the Solr index.
>
>regards,
>Thaer
>
>On 7 July 2017 at 10:53, Jan Høydahl  wrote:
>
>> If you do not need the flexibility of dynamic fields, don’t use them.
>> Sounds to me that you really want a field “price” to be float and a
>field
>> “birthdate” to be of type date etc.
>> If so, simply create your schema (either manually, through Schema API
>or
>> using schemaless) up front and index each field as correct type
>without
>> messing with field name prefixes.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> > 5. jul. 2017 kl. 15.23 skrev Thaer Sammar :
>> >
>> > Hi,
>> > We are trying to index documents of different types. Document have
>> different fields. fields are known at indexing time. We run a query
>on a
>> database and we index what comes using query variables as field names
>in
>> solr. Our current solution: we use dynamic fields with prefix, for
>example
>> feature_i_*, the issue with that
>> > 1) we need to define the type of the dynamic field and to be able
>to
>> cover the type of discovered fields we define the following
>> > feature_i_* for integers, feature_t_* for string, feature_d_* for
>> double, 
>> > 1.a) this means we need to check the type of the discovered field
>and
>> then put in the corresponding dynamic field
>> > 2) at search time, we need to know the right prefix
>> > We are looking for help to find away to ignore the prefix and check
>of
>> the type
>> >
>> > regards,
>> > Thaer
>>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: index newly discovered fields of different types

2017-07-07 Thread Erick Erickson
I'd recommend "managed schema" rather than schemaless. They're related
but distinct.

The problem is that schemaless makes assumptions based on the first
field it finds. So if it finds a field with a "1" in it, it guesses
"int". That'll break if the next doc has a 1.0 since it doesn't parse
to an int.

Managed schema uses the same underlying mechanism to change the
schema, it just lets you control exactly what gets changed.

Best,
Erick
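
As an illustration, a field can be added to a managed schema on the fly through
the Schema API before the document that needs it is pushed; the collection,
field name and type below are hypothetical:

  # see which fields are already defined
  curl http://localhost:8983/solr/mycollection/schema/fields

  # add a typed field that was just discovered
  curl -X POST -H 'Content-type:application/json' \
    http://localhost:8983/solr/mycollection/schema \
    -d '{"add-field": {"name": "feature_area", "type": "tdouble", "indexed": true, "stored": true}}'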

On Fri, Jul 7, 2017 at 3:01 AM, Thaer Sammar  wrote:
> Hi Jan,
>
> Thanks!, I am exploring the schemaless option based on Furkan suggestion. I
> need the the flexibility because not all fields are known. We get the data
> from RDF database (which changes continuously). To be more specific, we
> have a database and all changes on it are sent to a kafka queue. and we
> have a consumer which listen to the queue and update the Solr index.
>
> regards,
> Thaer
>
> On 7 July 2017 at 10:53, Jan Høydahl  wrote:
>
>> If you do not need the flexibility of dynamic fields, don’t use them.
>> Sounds to me that you really want a field “price” to be float and a field
>> “birthdate” to be of type date etc.
>> If so, simply create your schema (either manually, through Schema API or
>> using schemaless) up front and index each field as correct type without
>> messing with field name prefixes.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> > 5. jul. 2017 kl. 15.23 skrev Thaer Sammar :
>> >
>> > Hi,
>> > We are trying to index documents of different types. Document have
>> different fields. fields are known at indexing time. We run a query on a
>> database and we index what comes using query variables as field names in
>> solr. Our current solution: we use dynamic fields with prefix, for example
>> feature_i_*, the issue with that
>> > 1) we need to define the type of the dynamic field and to be able to
>> cover the type of discovered fields we define the following
>> > feature_i_* for integers, feature_t_* for string, feature_d_* for
>> double, 
>> > 1.a) this means we need to check the type of the discovered field and
>> then put in the corresponding dynamic field
>> > 2) at search time, we need to know the right prefix
>> > We are looking for help to find away to ignore the prefix and check of
>> the type
>> >
>> > regards,
>> > Thaer
>>
>>


Re: index newly discovered fields of different types

2017-07-07 Thread Thaer Sammar
Hi Jan,

Thanks! I am exploring the schemaless option based on Furkan's suggestion. I
need the flexibility because not all fields are known. We get the data from an
RDF database (which changes continuously). To be more specific, we have a
database and all changes on it are sent to a Kafka queue, and we have a
consumer which listens to the queue and updates the Solr index.

regards,
Thaer

On 7 July 2017 at 10:53, Jan Høydahl  wrote:

> If you do not need the flexibility of dynamic fields, don’t use them.
> Sounds to me that you really want a field “price” to be float and a field
> “birthdate” to be of type date etc.
> If so, simply create your schema (either manually, through Schema API or
> using schemaless) up front and index each field as correct type without
> messing with field name prefixes.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 5. jul. 2017 kl. 15.23 skrev Thaer Sammar :
> >
> > Hi,
> > We are trying to index documents of different types. Document have
> different fields. fields are known at indexing time. We run a query on a
> database and we index what comes using query variables as field names in
> solr. Our current solution: we use dynamic fields with prefix, for example
> feature_i_*, the issue with that
> > 1) we need to define the type of the dynamic field and to be able to
> cover the type of discovered fields we define the following
> > feature_i_* for integers, feature_t_* for string, feature_d_* for
> double, 
> > 1.a) this means we need to check the type of the discovered field and
> then put in the corresponding dynamic field
> > 2) at search time, we need to know the right prefix
> > We are looking for help to find away to ignore the prefix and check of
> the type
> >
> > regards,
> > Thaer
>
>


Re: index newly discovered fields of different types

2017-07-07 Thread Jan Høydahl
If you do not need the flexibility of dynamic fields, don’t use them.
Sounds to me that you really want a field “price” to be float and a field 
“birthdate” to be of type date etc.
If so, simply create your schema (either manually, through Schema API or using 
schemaless) up front and index each field as correct type without messing with 
field name prefixes.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 5. jul. 2017 kl. 15.23 skrev Thaer Sammar :
> 
> Hi,
> We are trying to index documents of different types. Document have different 
> fields. fields are known at indexing time. We run a query on a database and 
> we index what comes using query variables as field names in solr. Our current 
> solution: we use dynamic fields with prefix, for example feature_i_*, the 
> issue with that
> 1) we need to define the type of the dynamic field and to be able to cover 
> the type of discovered fields we define the following
> feature_i_* for integers, feature_t_* for string, feature_d_* for double, 
> 1.a) this means we need to check the type of the discovered field and then 
> put in the corresponding dynamic field
> 2) at search time, we need to know the right prefix
> We are looking for help to find away to ignore the prefix and check of the 
> type
> 
> regards,
> Thaer



Re: index newly discovered fields of different types

2017-07-05 Thread Erick Erickson
I really have no idea what "to ignore the prefix and check of the type" means.

When? How? Can you give an example of inputs and outputs? You might
want to review:
https://wiki.apache.org/solr/UsingMailingLists

And to add to what Furkan mentioned, in addition to schemaless you can
use "managed schema"
which will allow you to add fields and types on the fly.

Best,
Erick

On Wed, Jul 5, 2017 at 8:12 AM, Thaer Sammar  wrote:
> Hi Furkan,
>
> No, In the schema we also defined some static fields such as uri and geo
> field.
>
> On 5 July 2017 at 17:07, Furkan KAMACI  wrote:
>
>> Hi Thaer,
>>
>> Do you use schemeless mode [1] ?
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>> [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
>>
>> On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar  wrote:
>>
>> > Hi,
>> > We are trying to index documents of different types. Document have
>> > different fields. fields are known at indexing time. We run a query on a
>> > database and we index what comes using query variables as field names in
>> > solr. Our current solution: we use dynamic fields with prefix, for
>> example
>> > feature_i_*, the issue with that
>> > 1) we need to define the type of the dynamic field and to be able to
>> cover
>> > the type of discovered fields we define the following
>> > feature_i_* for integers, feature_t_* for string, feature_d_* for double,
>> > 
>> > 1.a) this means we need to check the type of the discovered field and
>> then
>> > put in the corresponding dynamic field
>> > 2) at search time, we need to know the right prefix
>> > We are looking for help to find away to ignore the prefix and check of
>> the
>> > type
>> >
>> > regards,
>> > Thaer
>>


Re: index newly discovered fields of different types

2017-07-05 Thread Thaer Sammar
Hi Furkan,

No, in the schema we have also defined some static fields such as uri and a
geo field.

On 5 July 2017 at 17:07, Furkan KAMACI  wrote:

> Hi Thaer,
>
> Do you use schemeless mode [1] ?
>
> Kind Regards,
> Furkan KAMACI
>
> [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
>
> On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar  wrote:
>
> > Hi,
> > We are trying to index documents of different types. Document have
> > different fields. fields are known at indexing time. We run a query on a
> > database and we index what comes using query variables as field names in
> > solr. Our current solution: we use dynamic fields with prefix, for
> example
> > feature_i_*, the issue with that
> > 1) we need to define the type of the dynamic field and to be able to
> cover
> > the type of discovered fields we define the following
> > feature_i_* for integers, feature_t_* for string, feature_d_* for double,
> > 
> > 1.a) this means we need to check the type of the discovered field and
> then
> > put in the corresponding dynamic field
> > 2) at search time, we need to know the right prefix
> > We are looking for help to find away to ignore the prefix and check of
> the
> > type
> >
> > regards,
> > Thaer
>


Re: index newly discovered fields of different types

2017-07-05 Thread Furkan KAMACI
Hi Thaer,

Do you use schemaless mode [1]?

Kind Regards,
Furkan KAMACI

[1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode

On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar  wrote:

> Hi,
> We are trying to index documents of different types. Document have
> different fields. fields are known at indexing time. We run a query on a
> database and we index what comes using query variables as field names in
> solr. Our current solution: we use dynamic fields with prefix, for example
> feature_i_*, the issue with that
> 1) we need to define the type of the dynamic field and to be able to cover
> the type of discovered fields we define the following
> feature_i_* for integers, feature_t_* for string, feature_d_* for double,
> 
> 1.a) this means we need to check the type of the discovered field and then
> put in the corresponding dynamic field
> 2) at search time, we need to know the right prefix
> We are looking for help to find away to ignore the prefix and check of the
> type
>
> regards,
> Thaer


index newly discovered fields of different types

2017-07-05 Thread Thaer Sammar
Hi,
We are trying to index documents of different types. Documents have different
fields, and the fields are only known at indexing time. We run a query on a
database and we index what comes back, using the query variables as field
names in Solr. Our current solution: we use dynamic fields with a prefix, for
example feature_i_*. The issues with that:
1) we need to define the type of the dynamic field, and to be able to cover the
types of discovered fields we define the following:
feature_i_* for integers, feature_t_* for strings, feature_d_* for doubles,
1.a) this means we need to check the type of the discovered field and then put
it in the corresponding dynamic field
2) at search time, we need to know the right prefix
We are looking for help to find a way to ignore the prefix and check of the type

regards,
Thaer

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-07-04 Thread Charlie Hull

On 11/04/2017 20:48, Allison, Timothy B. wrote:

It depends.  We've been trying to make parsers more, erm, flexible, but there 
are some problems from which we cannot recover.

Tl;dr there isn't a short answer.  :(

My sense is that DIH/ExtractingDocumentHandler is intended to get people up and 
running with Solr easily but it is not really a great idea for production.  See 
Erick's gem: https://lucidworks.com/2012/02/14/indexing-with-solrj/


+1. Tika extraction should happen *outside* Solr in production. A 
colleague even wrote a simple wrapper for Tika to help build this sort 
of thing: https://github.com/mattflax/dropwizard-tika-server


Charlie




As for the Tika portion... at the very least, Tika _shouldn't_ cause the 
ingesting process to crash.  At most, it should fail at the file level and not 
cause greater havoc.  In practice, if you're processing millions of files from 
the wild, you'll run into bad behavior and need to defend against permanent 
hangs, oom, memory leaks.

Also, at the least, if there's an exception with an embedded file, Tika should 
catch it and keep going with the rest of the file.  If this doesn't happen let 
us know!  We are aware that some types of embedded file stream problems were 
causing parse failures on the entire file, and we now catch those in Tika 
1.15-SNAPSHOT and don't let them percolate up through the parent file (they're 
reported in the metadata though).

Specifically for your stack traces:

For your initial problem with the missing class exceptions -- I thought we used 
to catch those in docx and log them.  I haven't been able to track this down, 
though.  I can look more if you have a need.

For "Caused by: org.apache.poi.POIXMLException: Invalid 'Row_Type' name 'PolylineTo' 
", this problem might go away if we implemented a pure SAX parser for vsdx.  We just 
did this for docx and pptx (coming in 1.15) and these are more robust to variation 
because they aren't requiring a match with the ooxml schema.  I haven't looked much at 
vsdx, but that _might_ help.

For "TODO Support v5 Pointers", this isn't supported and would require 
contributions.  However, I agree that POI shouldn't throw a Runtime exception.  Perhaps 
open an issue in POI, or maybe we should catch this special example at the Tika level?

For "Caused by: java.lang.ArrayIndexOutOfBoundsException:", the POI team 
_might_ be able to modify the parser to ignore a stream if there's an exception, but 
that's often a sign that something needs to be fixed with the parser.  In short, the 
solution will come from POI.

Best,

 Tim

-Original Message-
From: Gytis Mikuciunas [mailto:gyt...@gmail.com]
Sent: Tuesday, April 11, 2017 1:56 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 6.4. Can't index MS Visio vsdx files

Thanks for your responses.
Are there any posibilities to ignore parsing errors and continue indexing?
because now solr/tika stops parsing whole document if it finds any exception

On Apr 11, 2017 19:51, "Allison, Timothy B."  wrote:


You might want to drop a note to the dev or user's list on Apache POI.

I'm not extremely familiar with the vsd(x) portion of our code base.

The first item ("PolylineTo") may be caused by a mismatch btwn your
doc and the ooxml spec.

The second item appears to be an unsupported feature.

The third item may be an area for improvement within our codebase...I
can't tell just from the stacktrace.

You'll probably get more helpful answers over on POI.  Sorry, I can't
help with this...

Best,

   Tim

P.S.

 3.1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar


You shouldn't need both. Ooxml-schemas-1.3.jar should be a super set
of poi-ooxml-schemas-3.15.jar










--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
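
A hedged sketch of what "extraction outside Solr" can look like, using the
standalone Tika app or Tika server; the version numbers and file names are
examples:

  # one-off extraction with the Tika CLI
  java -jar tika-app-1.15.jar --text drawing.vsdx > drawing.txt

  # or run Tika as a separate service and extract over HTTP
  java -jar tika-server-1.15.jar &
  curl -T drawing.vsdx -H 'Accept: text/plain' http://localhost:9998/tika > drawing.txt

The extracted text is then indexed as an ordinary field via SolrJ or a plain
JSON update, so a parser crash, hang or OOM never takes the Solr node down with
it.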


RE: Solr 6.4. Can't index MS Visio vsdx files

2017-07-03 Thread Allison, Timothy B.
Sorry. Yes, you'll have to update commons-compress to 1.14.

-Original Message-
From: Gytis Mikuciunas [mailto:gyt...@gmail.com] 
Sent: Monday, July 3, 2017 9:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 6.4. Can't index MS Visio vsdx files

hi,

So I'm back from my long vacations :)

I'm trying to bring-up a fresh solr 6.6 standalone instance on windows
2012R2 server.

Replaced:

poi-*3.15-beta1 ---> poi-*3.16
tika-*1.13 ---> tika-*1.15


Tried to index one txt file and got (with poi and tika files that come out of 
the box, it indexes this txt file without errors):


SimplePostTool: WARNING: Response:   
Error 500 Server Error

HTTP ERROR 500
Problem accessing /solr/v20170703xxx/update/extract. Reason:
Server Error
Caused by: java.lang.NoClassDefFoundError:
org/apache/commons/compress/archivers/ArchiveStreamProvider
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.access$100(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at
org.apache.tika.parser.pkg.ZipContainerDetector.detectArchiveFormat(ZipContainerDetector.java:112)
at
org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:83)
at
org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceCo

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-07-03 Thread Gytis Mikuciunas
hi,

So I'm back from my long vacations :)

I'm trying to bring up a fresh Solr 6.6 standalone instance on a Windows
2012R2 server.

Replaced:

poi-*3.15-beta1 ---> poi-*3.16
tika-*1.13 ---> tika-*1.15


I tried to index one txt file and got the error below (with the POI and Tika
jars that come out of the box, it indexes this txt file without errors):


SimplePostTool: WARNING: Response: 


Error 500 Server Error

HTTP ERROR 500
Problem accessing /solr/v20170703xxx/update/extract. Reason:
Server Error
Caused by: java.lang.NoClassDefFoundError:
org/apache/commons/compress/archivers/ArchiveStreamProvider
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.access$100(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at
org.apache.tika.parser.pkg.ZipContainerDetector.detectArchiveFormat(ZipContainerDetector.java:112)
at
org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:83)
at
org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassN

How to index binary files from FTP servers using Solr DIH?

2017-06-29 Thread Alejandro Rivas Martinez
I need a way to index binary files from FTP servers, using URLDataSource.
I’m doing this locally but I need to do the same from remote sources (FTP
servers). I have read a lot and I can’t find any example of indexing binary
files from FTP servers. Is it possible to achieve that? How can I use the Data
Import Handler to index binary files from FTP servers? This is what I’m doing
locally, and I need your help to achieve the same requirements from a remote
FTP server:

https://i.stack.imgur.com/biSlL.jpg
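
An untested sketch of what a DIH config for a remote binary file could look
like, assuming BinURLDataSource can read ftp:// URLs through java.net.URL and
using a Tika entity; the URL, credentials and field names are made up:

  <dataConfig>
    <dataSource name="binary" type="BinURLDataSource"/>
    <document>
      <entity name="doc" processor="TikaEntityProcessor" dataSource="binary"
              url="ftp://user:pass@ftp.example.com/docs/report.pdf" format="text">
        <field column="text" name="content"/>
      </entity>
    </document>
  </dataConfig>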


Sharding of index data takes a long time.

2017-06-27 Thread chandrushanmugasundaram
I am trying to shard my index data of size 22GB (1.7M documents) into three
shards.

The total time for splitting is about 7 hours.

I used the same call that is mentioned in the Solr Collections API documentation.

Is there any way to do that quicker?

Can I use the REBALANCE API? Is that safe to use?

Is there any benchmark available for sharding the index?
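
For reference, a sketch of the Collections API call in question, issued
asynchronously so the HTTP request does not time out while the shard is being
rewritten (collection, shard and request id are examples):

  curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1&async=split-1'

  # poll until the background job finishes
  curl 'http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=split-1'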





Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-27 Thread Joel Bernstein
Ok, I'll take a look. Thanks!

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jun 27, 2017 at 10:01 AM, Susheel Kumar 
wrote:

> Hi Joel,
>
> I have submitted a patch to handle this.  Please review.
>
> https://issues.apache.org/jira/secure/attachment/12874681/SOLR-10944.patch
>
> Thanks,
> Susheel
>
> On Fri, Jun 23, 2017 at 12:32 PM, Susheel Kumar 
> wrote:
>
> > Thanks for confirming.  Here is the JIRA
> >
> > https://issues.apache.org/jira/browse/SOLR-10944
> >
> > On Fri, Jun 23, 2017 at 11:20 AM, Joel Bernstein 
> > wrote:
> >
> >> yeah, this looks like a bug in the get expression.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Fri, Jun 23, 2017 at 11:07 AM, Susheel Kumar 
> >> wrote:
> >>
> >> > Hi Joel,
> >> >
> >> > As i am getting deeper, it doesn't look like a problem due to hashJoin
> >> etc.
> >> >
> >> >
> >> > Below is a simple let expr where if search would not find a match and
> >> > return 0 result.  In that case, I would expect get(a) to show a EOF
> >> tuple
> >> > while it is throwing exception. It looks like something wrong/bug in
> the
> >> > code.  Please suggest
> >> >
> >> > ===
> >> > let(a=search(collection1,
> >> > q=id:9,
> >> > fl="id,business_email",
> >> > sort="business_email asc"),
> >> > get(a)
> >> > )
> >> >
> >> >
> >> > {
> >> >   "result-set": {
> >> > "docs": [
> >> >   {
> >> > "EXCEPTION": "Index: 0, Size: 0",
> >> > "EOF": true,
> >> > "RESPONSE_TIME": 8
> >> >   }
> >> > ]
> >> >   }
> >> > }
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Jun 23, 2017 at 7:44 AM, Joel Bernstein 
> >> > wrote:
> >> >
> >> > > Ok, I hadn't anticipated some of the scenarios that you've been
> trying
> >> > out.
> >> > > Particularly reading streams into variables and performing joins
> >> etc...
> >> > >
> >> > > The main idea with variables was to use them with the new
> statistical
> >> > > evaluators. So you perform retrievals (search, random, nodes, knn
> >> etc...)
> >> > > set the results to variables and then perform statistical analysis.
> >> > >
> >> > > The problem with joining variables is that is doesn't scale very
> well
> >> > > because all the records are read into memory. Also the parallel
> stream
> >> > > won't work over variables.
> >> > >
> >> > > Joel Bernstein
> >> > > http://joelsolr.blogspot.com/
> >> > >
> >> > > On Thu, Jun 22, 2017 at 3:50 PM, Susheel Kumar <
> susheel2...@gmail.com
> >> >
> >> > > wrote:
> >> > >
> >> > > > Hi Joel,
> >> > > >
> >> > > > I am able to reproduce this in a simple way.  Looks like Let
> Stream
> >> is
> >> > > > having some issues.  Below complement function works fine if I
> >> execute
> >> > > > outside let and returns an EOF:true tuple but if a tuple with
> >> EOF:true
> >> > > > assigned to let variable, it gets changed to EXCEPTION "Index 0,
> >> Size
> >> > 0"
> >> > > > etc.
> >> > > >
> >> > > > So let stream not able to handle the stream/results which has only
> >> EOF
> >> > > > tuple and breaks the whole let expression block
> >> > > >
> >> > > >
> >> > > > ===Complement inside let
> >> > > > let(
> >> > > > a=echo(Hello),
> >> > > > b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id
> >> > > asc,email
> >> > > > asc"),
> >> > > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
> asc"),
> >> > > > on=&q

Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-27 Thread Susheel Kumar
Hi Joel,

I have submitted a patch to handle this.  Please review.

https://issues.apache.org/jira/secure/attachment/12874681/SOLR-10944.patch

Thanks,
Susheel
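
For anyone reproducing the issue, the failing expression from the quoted
messages below can be posted straight to the /stream handler; the collection
name is from the example, the host is hypothetical:

  curl --data-urlencode 'expr=let(a=search(collection1,
                                           q="id:9",
                                           fl="id,business_email",
                                           sort="business_email asc"),
                                  get(a))' \
       http://localhost:8983/solr/collection1/stream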

On Fri, Jun 23, 2017 at 12:32 PM, Susheel Kumar 
wrote:

> Thanks for confirming.  Here is the JIRA
>
> https://issues.apache.org/jira/browse/SOLR-10944
>
> On Fri, Jun 23, 2017 at 11:20 AM, Joel Bernstein 
> wrote:
>
>> yeah, this looks like a bug in the get expression.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Fri, Jun 23, 2017 at 11:07 AM, Susheel Kumar 
>> wrote:
>>
>> > Hi Joel,
>> >
>> > As i am getting deeper, it doesn't look like a problem due to hashJoin
>> etc.
>> >
>> >
>> > Below is a simple let expr where if search would not find a match and
>> > return 0 result.  In that case, I would expect get(a) to show a EOF
>> tuple
>> > while it is throwing exception. It looks like something wrong/bug in the
>> > code.  Please suggest
>> >
>> > ===
>> > let(a=search(collection1,
>> > q=id:9,
>> > fl="id,business_email",
>> > sort="business_email asc"),
>> > get(a)
>> > )
>> >
>> >
>> > {
>> >   "result-set": {
>> > "docs": [
>> >   {
>> > "EXCEPTION": "Index: 0, Size: 0",
>> > "EOF": true,
>> > "RESPONSE_TIME": 8
>> >   }
>> > ]
>> >   }
>> > }
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Jun 23, 2017 at 7:44 AM, Joel Bernstein 
>> > wrote:
>> >
>> > > Ok, I hadn't anticipated some of the scenarios that you've been trying
>> > out.
>> > > Particularly reading streams into variables and performing joins
>> etc...
>> > >
>> > > The main idea with variables was to use them with the new statistical
>> > > evaluators. So you perform retrievals (search, random, nodes, knn
>> etc...)
>> > > set the results to variables and then perform statistical analysis.
>> > >
>> > > The problem with joining variables is that is doesn't scale very well
>> > > because all the records are read into memory. Also the parallel stream
>> > > won't work over variables.
>> > >
>> > > Joel Bernstein
>> > > http://joelsolr.blogspot.com/
>> > >
>> > > On Thu, Jun 22, 2017 at 3:50 PM, Susheel Kumar > >
>> > > wrote:
>> > >
>> > > > Hi Joel,
>> > > >
>> > > > I am able to reproduce this in a simple way.  Looks like Let Stream
>> is
>> > > > having some issues.  Below complement function works fine if I
>> execute
>> > > > outside let and returns an EOF:true tuple but if a tuple with
>> EOF:true
>> > > > assigned to let variable, it gets changed to EXCEPTION "Index 0,
>> Size
>> > 0"
>> > > > etc.
>> > > >
>> > > > So let stream not able to handle the stream/results which has only
>> EOF
>> > > > tuple and breaks the whole let expression block
>> > > >
>> > > >
>> > > > ===Complement inside let
>> > > > let(
>> > > > a=echo(Hello),
>> > > > b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id
>> > > asc,email
>> > > > asc"),
>> > > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
>> > > > on="id,email"),
>> > > > c=get(b),
>> > > > get(a)
>> > > > )
>> > > >
>> > > > Result
>> > > > ===
>> > > > {
>> > > >   "result-set": {
>> > > > "docs": [
>> > > >   {
>> > > > "EXCEPTION": "Index: 0, Size: 0",
>> > > > "EOF": true,
>> > > > "RESPONSE_TIME": 1
>> > > >   }
>> > > > ]
>> > > >   }
>> > > > }
>> > > >
>> > > > ===Complement outside let
>> > > >
>> &g

Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-23 Thread Susheel Kumar
Thanks for confirming.  Here is the JIRA

https://issues.apache.org/jira/browse/SOLR-10944

On Fri, Jun 23, 2017 at 11:20 AM, Joel Bernstein  wrote:

> yeah, this looks like a bug in the get expression.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Jun 23, 2017 at 11:07 AM, Susheel Kumar 
> wrote:
>
> > Hi Joel,
> >
> > As i am getting deeper, it doesn't look like a problem due to hashJoin
> etc.
> >
> >
> > Below is a simple let expr where if search would not find a match and
> > return 0 result.  In that case, I would expect get(a) to show a EOF tuple
> > while it is throwing exception. It looks like something wrong/bug in the
> > code.  Please suggest
> >
> > ===
> > let(a=search(collection1,
> > q=id:9,
> > fl="id,business_email",
> > sort="business_email asc"),
> > get(a)
> > )
> >
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "EXCEPTION": "Index: 0, Size: 0",
> > "EOF": true,
> > "RESPONSE_TIME": 8
> >   }
> > ]
> >   }
> > }
> >
> >
> >
> >
> >
> >
> >
> > On Fri, Jun 23, 2017 at 7:44 AM, Joel Bernstein 
> > wrote:
> >
> > > Ok, I hadn't anticipated some of the scenarios that you've been trying
> > out.
> > > Particularly reading streams into variables and performing joins etc...
> > >
> > > The main idea with variables was to use them with the new statistical
> > > evaluators. So you perform retrievals (search, random, nodes, knn
> etc...)
> > > set the results to variables and then perform statistical analysis.
> > >
> > > The problem with joining variables is that is doesn't scale very well
> > > because all the records are read into memory. Also the parallel stream
> > > won't work over variables.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Thu, Jun 22, 2017 at 3:50 PM, Susheel Kumar 
> > > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > I am able to reproduce this in a simple way.  Looks like Let Stream
> is
> > > > having some issues.  Below complement function works fine if I
> execute
> > > > outside let and returns an EOF:true tuple but if a tuple with
> EOF:true
> > > > assigned to let variable, it gets changed to EXCEPTION "Index 0, Size
> > 0"
> > > > etc.
> > > >
> > > > So let stream not able to handle the stream/results which has only
> EOF
> > > > tuple and breaks the whole let expression block
> > > >
> > > >
> > > > ===Complement inside let
> > > > let(
> > > > a=echo(Hello),
> > > > b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id
> > > asc,email
> > > > asc"),
> > > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
> > > > on="id,email"),
> > > > c=get(b),
> > > > get(a)
> > > > )
> > > >
> > > > Result
> > > > ===
> > > > {
> > > >   "result-set": {
> > > > "docs": [
> > > >   {
> > > > "EXCEPTION": "Index: 0, Size: 0",
> > > > "EOF": true,
> > > > "RESPONSE_TIME": 1
> > > >   }
> > > > ]
> > > >   }
> > > > }
> > > >
> > > > ===Complement outside let
> > > >
> > > > complement(sort(select(tuple(id=1,email="A"),id,email),by="id
> > asc,email
> > > > asc"),
> > > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
> > > > on="id,email")
> > > >
> > > > Result
> > > > ===
> > > > { "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 0 } ] } }
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Jun 22, 2017 at 11:55 AM, Susheel Kumar <
> 

Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-23 Thread Joel Bernstein
yeah, this looks like a bug in the get expression.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 23, 2017 at 11:07 AM, Susheel Kumar 
wrote:

> Hi Joel,
>
> As i am getting deeper, it doesn't look like a problem due to hashJoin etc.
>
>
> Below is a simple let expr where if search would not find a match and
> return 0 result.  In that case, I would expect get(a) to show a EOF tuple
> while it is throwing exception. It looks like something wrong/bug in the
> code.  Please suggest
>
> ===
> let(a=search(collection1,
> q=id:9,
> fl="id,business_email",
> sort="business_email asc"),
> get(a)
> )
>
>
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Index: 0, Size: 0",
> "EOF": true,
> "RESPONSE_TIME": 8
>   }
> ]
>   }
> }
>
>
>
>
>
>
>
> On Fri, Jun 23, 2017 at 7:44 AM, Joel Bernstein 
> wrote:
>
> > Ok, I hadn't anticipated some of the scenarios that you've been trying
> out.
> > Particularly reading streams into variables and performing joins etc...
> >
> > The main idea with variables was to use them with the new statistical
> > evaluators. So you perform retrievals (search, random, nodes, knn etc...)
> > set the results to variables and then perform statistical analysis.
> >
> > The problem with joining variables is that is doesn't scale very well
> > because all the records are read into memory. Also the parallel stream
> > won't work over variables.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Jun 22, 2017 at 3:50 PM, Susheel Kumar 
> > wrote:
> >
> > > Hi Joel,
> > >
> > > I am able to reproduce this in a simple way.  Looks like Let Stream is
> > > having some issues.  Below complement function works fine if I execute
> > > outside let and returns an EOF:true tuple but if a tuple with EOF:true
> > > assigned to let variable, it gets changed to EXCEPTION "Index 0, Size
> 0"
> > > etc.
> > >
> > > So let stream not able to handle the stream/results which has only EOF
> > > tuple and breaks the whole let expression block
> > >
> > >
> > > ===Complement inside let
> > > let(
> > > a=echo(Hello),
> > > b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id
> > asc,email
> > > asc"),
> > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
> > > on="id,email"),
> > > c=get(b),
> > > get(a)
> > > )
> > >
> > > Result
> > > ===
> > > {
> > >   "result-set": {
> > > "docs": [
> > >   {
> > > "EXCEPTION": "Index: 0, Size: 0",
> > > "EOF": true,
> > > "RESPONSE_TIME": 1
> > >   }
> > > ]
> > >   }
> > > }
> > >
> > > ===Complement outside let
> > >
> > > complement(sort(select(tuple(id=1,email="A"),id,email),by="id
> asc,email
> > > asc"),
> > > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
> > > on="id,email")
> > >
> > > Result
> > > ===
> > > { "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 0 } ] } }
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Jun 22, 2017 at 11:55 AM, Susheel Kumar  >
> > > wrote:
> > >
> > > > Sorry for typo
> > > >
> > > > Facing a weird behavior when using hashJoin / innerJoin etc. The
> below
> > > > expression display tuples from variable a shown below
> > > >
> > > >
> > > > let(a=fetch(SMS,having(rollup(over=email,
> > > >  count(email),
> > > > select(search(SMS,
> > > > q=*:*,
> > > > fl="id,dv_sv_business_email",
> > > > sort="dv_sv_business_email asc"),
> > > >id,
> > > >dv_sv_business_email as email)),
> > > > eq(count(e

Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-23 Thread Susheel Kumar
Hi Joel,

As I dig deeper, it doesn't look like a problem due to hashJoin etc.


Below is a simple let expression where the search finds no match and
returns 0 results.  In that case I would expect get(a) to emit an EOF
tuple, but instead it throws an exception. It looks like a bug in the
code.  Please suggest

===
let(a=search(collection1,
q=id:9,
fl="id,business_email",
sort="business_email asc"),
get(a)
)


{
  "result-set": {
"docs": [
  {
"EXCEPTION": "Index: 0, Size: 0",
"EOF": true,
"RESPONSE_TIME": 8
  }
]
  }
}
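
For comparison, a zero-hit stream evaluated outside of let comes back as
just the EOF tuple, so what I would expect get(a) to return here is
roughly the following (the RESPONSE_TIME will of course differ):

{
  "result-set": {
    "docs": [
      {
        "EOF": true,
        "RESPONSE_TIME": 8
      }
    ]
  }
}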







On Fri, Jun 23, 2017 at 7:44 AM, Joel Bernstein  wrote:

> Ok, I hadn't anticipated some of the scenarios that you've been trying out.
> Particularly reading streams into variables and performing joins etc...
>
> The main idea with variables was to use them with the new statistical
> evaluators. So you perform retrievals (search, random, nodes, knn etc...)
> set the results to variables and then perform statistical analysis.
>
> The problem with joining variables is that is doesn't scale very well
> because all the records are read into memory. Also the parallel stream
> won't work over variables.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 22, 2017 at 3:50 PM, Susheel Kumar 
> wrote:
>
> > Hi Joel,
> >
> > I am able to reproduce this in a simple way.  Looks like Let Stream is
> > having some issues.  Below complement function works fine if I execute
> > outside let and returns an EOF:true tuple but if a tuple with EOF:true
> > assigned to let variable, it gets changed to EXCEPTION "Index 0, Size 0"
> > etc.
> >
> > So let stream not able to handle the stream/results which has only EOF
> > tuple and breaks the whole let expression block
> >
> >
> > ===Complement inside let
> > let(
> > a=echo(Hello),
> > b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id
> asc,email
> > asc"),
> > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
> > on="id,email"),
> > c=get(b),
> > get(a)
> > )
> >
> > Result
> > ===
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "EXCEPTION": "Index: 0, Size: 0",
> > "EOF": true,
> > "RESPONSE_TIME": 1
> >   }
> > ]
> >   }
> > }
> >
> > ===Complement outside let
> >
> > complement(sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
> > asc"),
> > sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
> > on="id,email")
> >
> > Result
> > ===
> > { "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 0 } ] } }
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Jun 22, 2017 at 11:55 AM, Susheel Kumar 
> > wrote:
> >
> > > Sorry for typo
> > >
> > > Facing a weird behavior when using hashJoin / innerJoin etc. The below
> > > expression display tuples from variable a shown below
> > >
> > >
> > > let(a=fetch(SMS,having(rollup(over=email,
> > >  count(email),
> > > select(search(SMS,
> > > q=*:*,
> > > fl="id,dv_sv_business_email",
> > > sort="dv_sv_business_email asc"),
> > >id,
> > >dv_sv_business_email as email)),
> > > eq(count(email),1)),
> > > fl="id,dv_sv_business_email as email",
> > > on="email=dv_sv_business_email"),
> > > b=fetch(SMS,having(rollup(over=email,
> > >  count(email),
> > > select(search(SMS,
> > > q=*:*,
> > > fl="id,dv_sv_personal_email",
> > > sort="dv_sv_personal_email asc"),
> > >id,
> > >dv_sv_personal_email as email)),
> > > eq(count(email),1)),
> > > fl="id,dv_sv_personal_email as email",
> > > on="email=dv_sv_personal_email"),
> > > c=innerJoin(sort(get(a),by="email asc"),sort(get(b),by="email
>

Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-23 Thread Joel Bernstein
Ok, I hadn't anticipated some of the scenarios that you've been trying out.
Particularly reading streams into variables and performing joins etc...

The main idea with variables was to use them with the new statistical
evaluators. So you perform retrievals (search, random, nodes, knn etc...)
set the results to variables and then perform statistical analysis.

The problem with joining variables is that it doesn't scale very well
because all the records are read into memory. Also the parallel stream
won't work over variables.
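
To make that intended flow concrete, here is a minimal sketch (not taken
from this thread's data; it assumes a numeric field, here called
response_time_d, plus the col() and mean() math evaluators, whose exact
availability depends on the Solr release you are on):

let(a=search(collection1,
             q=*:*,
             fl="id,response_time_d",
             sort="response_time_d asc"),
    b=col(a, response_time_d),
    tuple(avg=mean(b)))

The search results land in variable a, col() lifts the numeric column into
an array, and the terminal tuple() emits the computed statistic. Joins are
better expressed directly over the search() streams (hashJoin/innerJoin
without the intermediate variables), which keeps memory bounded and leaves
the door open to parallel().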

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 22, 2017 at 3:50 PM, Susheel Kumar 
wrote:

> Hi Joel,
>
> I am able to reproduce this in a simple way.  Looks like Let Stream is
> having some issues.  Below complement function works fine if I execute
> outside let and returns an EOF:true tuple but if a tuple with EOF:true
> assigned to let variable, it gets changed to EXCEPTION "Index 0, Size 0"
> etc.
>
> So let stream not able to handle the stream/results which has only EOF
> tuple and breaks the whole let expression block
>
>
> ===Complement inside let
> let(
> a=echo(Hello),
> b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
> asc"),
> sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
> on="id,email"),
> c=get(b),
> get(a)
> )
>
> Result
> ===
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Index: 0, Size: 0",
> "EOF": true,
> "RESPONSE_TIME": 1
>   }
> ]
>   }
> }
>
> ===Complement outside let
>
> complement(sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
> asc"),
> sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
> on="id,email")
>
> Result
> ===
> { "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 0 } ] } }
>
>
>
>
>
>
>
>
> On Thu, Jun 22, 2017 at 11:55 AM, Susheel Kumar 
> wrote:
>
> > Sorry for typo
> >
> > Facing a weird behavior when using hashJoin / innerJoin etc. The below
> > expression display tuples from variable a shown below
> >
> >
> > let(a=fetch(SMS,having(rollup(over=email,
> >  count(email),
> > select(search(SMS,
> > q=*:*,
> > fl="id,dv_sv_business_email",
> > sort="dv_sv_business_email asc"),
> >id,
> >dv_sv_business_email as email)),
> > eq(count(email),1)),
> > fl="id,dv_sv_business_email as email",
> > on="email=dv_sv_business_email"),
> > b=fetch(SMS,having(rollup(over=email,
> >  count(email),
> > select(search(SMS,
> > q=*:*,
> > fl="id,dv_sv_personal_email",
> > sort="dv_sv_personal_email asc"),
> >id,
> >dv_sv_personal_email as email)),
> > eq(count(email),1)),
> > fl="id,dv_sv_personal_email as email",
> > on="email=dv_sv_personal_email"),
> > c=innerJoin(sort(get(a),by="email asc"),sort(get(b),by="email
> > asc"),on="email"),
> > #d=select(get(c),id,email),
> > get(a)
> > )
> >
> > var a result
> > ==
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "count(email)": 1,
> > "id": "1",
> > "email": "A"
> >   },
> >   {
> > "count(email)": 1,
> > "id": "2",
> > "email": "C"
> >   },
> >   {
> > "EOF": true,
> > "RESPONSE_TIME": 18
> >   }
> > ]
> >   }
> > }
> >
> > And after uncomment var d above, even though we are displaying a, we get
> > results shown below. I understand that join in my test data didn't find
> any
> > match but then it should not skew up the results of var a.  When data
> > matches during join then its fine but otherwise I am running into this
> > issue and whole next expressions doesn't get evaluated due to this...
> >
> >
> > after uncomment var d
> > ===
> > {
> >   "re

Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-22 Thread Susheel Kumar
Hi Joel,

I am able to reproduce this in a simple way.  It looks like the let stream
has an issue.  The complement function below works fine when I execute it
outside let and returns an EOF:true tuple, but when a stream whose only
tuple is EOF:true is assigned to a let variable, the result gets changed
to the EXCEPTION "Index: 0, Size: 0" shown below.

So the let stream is not able to handle a stream whose result contains
only the EOF tuple, and that breaks the whole let expression block.


===Complement inside let
let(
a=echo(Hello),
b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
asc"),
sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
on="id,email"),
c=get(b),
get(a)
)

Result
===
{
  "result-set": {
"docs": [
  {
"EXCEPTION": "Index: 0, Size: 0",
"EOF": true,
"RESPONSE_TIME": 1
  }
]
  }
}

===Complement outside let

complement(sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
asc"),
sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
on="id,email")

Result
===
{ "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 0 } ] } }








On Thu, Jun 22, 2017 at 11:55 AM, Susheel Kumar 
wrote:

> Sorry for typo
>
> Facing a weird behavior when using hashJoin / innerJoin etc. The below
> expression display tuples from variable a shown below
>
>
> let(a=fetch(SMS,having(rollup(over=email,
>  count(email),
> select(search(SMS,
> q=*:*,
> fl="id,dv_sv_business_email",
> sort="dv_sv_business_email asc"),
>id,
>dv_sv_business_email as email)),
> eq(count(email),1)),
> fl="id,dv_sv_business_email as email",
> on="email=dv_sv_business_email"),
> b=fetch(SMS,having(rollup(over=email,
>  count(email),
> select(search(SMS,
> q=*:*,
> fl="id,dv_sv_personal_email",
> sort="dv_sv_personal_email asc"),
>id,
>dv_sv_personal_email as email)),
> eq(count(email),1)),
> fl="id,dv_sv_personal_email as email",
> on="email=dv_sv_personal_email"),
> c=innerJoin(sort(get(a),by="email asc"),sort(get(b),by="email
> asc"),on="email"),
> #d=select(get(c),id,email),
> get(a)
> )
>
> var a result
> ==
> {
>   "result-set": {
> "docs": [
>   {
> "count(email)": 1,
> "id": "1",
> "email": "A"
>   },
>   {
> "count(email)": 1,
> "id": "2",
> "email": "C"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 18
>   }
> ]
>   }
> }
>
> And after uncomment var d above, even though we are displaying a, we get
> results shown below. I understand that join in my test data didn't find any
> match but then it should not skew up the results of var a.  When data
> matches during join then its fine but otherwise I am running into this
> issue and whole next expressions doesn't get evaluated due to this...
>
>
> after uncomment var d
> ===
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Index: 0, Size: 0",
> "EOF": true,
> "RESPONSE_TIME": 44
>   }
> ]
>   }
> }
>
> On Thu, Jun 22, 2017 at 11:51 AM, Susheel Kumar 
> wrote:
>
>> Hello Joel,
>>
>> Facing a weird behavior when using hashJoin / innerJoin etc. The below
>> expression display tuples from variable a   and the moment I use get on
>> innerJoin / hashJoin expr on variable c
>>
>>
>> let(a=fetch(SMS,having(rollup(over=email,
>>  count(email),
>> select(search(SMS,
>> q=*:*,
>> fl="id,dv_sv_business_email",
>> sort="dv_sv_business_email asc"),
>>id,
>>dv_sv_business_email as email)),
>> eq(count(email),1)),
>> fl="id,dv_sv_business_email as email",
>> on="email=dv_sv_business_email"),
>> b=fetch(SMS,having(rollup(over=email,
>>  count(email),
>> select(search(SMS,
>> q=*:*,
>>  

Re: Error after moving index

2017-06-22 Thread Erick Erickson
"They're just files, man". If you can afford a bit of down-time, you can
shut your Solr down and recursively copy the data directory from your
source to destination with SCP, rsync, whatever, and then restart Solr.

Do take some care when copying between Windows and *nix that you do a
_binary_ transfer.

If you continue to have problems transferring between Windows and
*nix, we'll have to investigate further. And I'm assuming this is
stand-alone and there's only a single restore going on at a time.

Best,
Erick

On Thu, Jun 22, 2017 at 9:13 AM, Moritz Michael 
wrote:

>
>
>
>
>
>
>
>
> BTW, is there a better/recommended way to transfer an
> index to another solr?
>
>
>
>
>
>
>
>
>
> On Thu, Jun 22, 2017 at 6:09 PM +0200, "Moritz Michael" <
> moritz.mu...@gmail.com> wrote:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Hello Michael,
> I used the backup functionality to create a snapshot and uploaded this
> snapshot, so I feel it should be save.
> I'll try it again. Maybe the copy operation wasn't successful.
> BestMoritz
>
>
>
> _
> From: Michael Kuhlmann 
> Sent: Donnerstag, Juni 22, 2017 2:50 PM
> Subject: Re: Error after moving index
> To:  
>
>
> Hi Moritz,
>
> did you stop your local Solr sever before? Copying data from a running
> instance may cause headaches.
>
> If yes, what happens if you copy everything again? It seems that your
> copy operations wasn't successful.
>
> Best,
> Michael
>
> Am 22.06.2017 um 14:37 schrieb Moritz Munte:
> > Hello,
> >
> >
> >
> > I created an index on my local machine (Windows 10) and it works fine
> there.
> >
> > After uploading the index to the production server (Linux), the server
> shows
> > an error:
> .
>
>
>
>
>
>
>
>
>
>


Re: Error after moving index

2017-06-22 Thread Susheel Kumar
Usually we index directly into the prod Solr rather than copying from
local/lower environments.  If that works in your scenario, I would suggest
indexing directly into prod rather than copying/restoring from the local
Windows environment to Linux.

On Thu, Jun 22, 2017 at 12:13 PM, Moritz Michael 
wrote:

>
>
>
>
>
>
>
>
> BTW, is there a better/recommended way to transfer an
> index to another solr?
>
>
>
>
>
>
>
>
>
> On Thu, Jun 22, 2017 at 6:09 PM +0200, "Moritz Michael" <
> moritz.mu...@gmail.com> wrote:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Hello Michael,
> I used the backup functionality to create a snapshot and uploaded this
> snapshot, so I feel it should be save.
> I'll try it again. Maybe the copy operation wasn't successful.
> BestMoritz
>
>
>
> _
> From: Michael Kuhlmann 
> Sent: Donnerstag, Juni 22, 2017 2:50 PM
> Subject: Re: Error after moving index
> To:  
>
>
> Hi Moritz,
>
> did you stop your local Solr sever before? Copying data from a running
> instance may cause headaches.
>
> If yes, what happens if you copy everything again? It seems that your
> copy operations wasn't successful.
>
> Best,
> Michael
>
> Am 22.06.2017 um 14:37 schrieb Moritz Munte:
> > Hello,
> >
> >
> >
> > I created an index on my local machine (Windows 10) and it works fine
> there.
> >
> > After uploading the index to the production server (Linux), the server
> shows
> > an error:
> .
>
>
>
>
>
>
>
>
>
>


Re: Error after moving index

2017-06-22 Thread Moritz Michael

BTW, is there a better/recommended way to transfer an index to another Solr?


On Thu, Jun 22, 2017 at 6:09 PM +0200, "Moritz Michael"  wrote:


Hello Michael,

I used the backup functionality to create a snapshot and uploaded this
snapshot, so I feel it should be safe.
I'll try it again. Maybe the copy operation wasn't successful.

Best
Moritz



_
From: Michael Kuhlmann 
Sent: Donnerstag, Juni 22, 2017 2:50 PM
Subject: Re: Error after moving index
To:  


Hi Moritz,

did you stop your local Solr sever before? Copying data from a running
instance may cause headaches.

If yes, what happens if you copy everything again? It seems that your
copy operations wasn't successful.

Best,
Michael

Am 22.06.2017 um 14:37 schrieb Moritz Munte:
> Hello,
>
>  
>
> I created an index on my local machine (Windows 10) and it works fine there.
>
> After uploading the index to the production server (Linux), the server shows
> an error:
.










