Re: Indexing Approach

2018-06-26 Thread solrnoobie
1. We have 5 nodes and 3 ZooKeepers (will autoscale if needed).
2. We use Java with the help of SolrJ / Spring Data for indexing.
3. We see the exception in our application, so this is probably our fault and not Solr's, so I'm asking what is the best approach for documents with a lot of child docume

Approach for Merge Database and Files

2018-06-26 Thread angeladdati
Hi: I have two sources to index: Database: MetadataDB1, MetadataDB2, File Url... Files: MetadataF1, MetadataF2, File Url, Contain... I index the database and the files. When I search, I need to search and show the merged result: Database + Files (MetadataDb1, MetadataDB2, MetadataF1, MetadataF2,

Re: Approach for Merge Database and Files

2018-06-26 Thread Peter Gylling Jørgensen
Hi, I would create a search alias that contains the latest versions of the different collections. See: https://lucene.apache.org/solr/guide/7_3/collections-api.html#collections-api Then you use this alias to search for results. You get better results if you define the same schema for all colle

Re: Solr Default query parser

2018-06-26 Thread Jason Gerlowski
The "Standard Query Parser" _is_ the lucene query parser. They're the same parser. As Shawn pointed out above, they're also the default, so if you don't specify any defType, they will be used. Though if you want to be explicit and specify it anyway, the value is defType=lucene Jason On Mon, Jun
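A minimal sketch of being explicit about the parser, as described above. The host, port, collection name `mycollection`, and the helper `build_select_url` are assumptions made up for illustration; only `defType=lucene` comes from the message.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, def_type="lucene"):
    """Build a /select URL that names the query parser explicitly."""
    params = {"q": query, "defType": def_type}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local Solr instance and collection name.
url = build_select_url("http://localhost:8983/solr/mycollection", "title:solr")
print(url)
```

Leaving `defType` out should select the same parser, since it is the default; passing it only makes the choice visible in the request.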

Re: Indexing Approach

2018-06-26 Thread Shawn Heisey
On 6/26/2018 12:06 AM, solrnoobie wrote: We are having errors such as heap space error in our indexing so we decided to lower the batch size to 50. The problem with this is that sometimes it really does not help since 1 document can contain 1000 child documents and it will still have the heap err

Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread neotorand
Thanks Erick, Though I saw this article in several places, I never went through it seriously. Don't you think the below method is very expensive? autoParser.parse(input, textHandler, metadata, context); If the document is bigger, then it will need enough memory to hold the document (ie Cont

Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread neotorand
Thanks Shawn, Yes, I agree ERH is never suggested in production. I am writing my custom ones. Any pointers on this? What exactly I am looking for is a custom indexing program to compile precisely the information that you need and send that to Solr. On the other hand I see the below method is very ex

Adding tag to fq makes query return child docs instead of parent docs

2018-06-26 Thread Florian Fankhauser
} acquisition_date_i:20180626 This works as expected. Now for some reason I want to exclude the above filter-query from a facet-query. Therefore I need to add a tag to the filter-query: q={!tag=datefilter}{!parent which=doc_type_s:book} acquisition_date_i:20180626 And now the error occurs: Just by

Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread Shawn Heisey
On 6/26/2018 7:13 AM, neotorand wrote: Don't you think the below method is very expensive? autoParser.parse(input, textHandler, metadata, context); If the document is bigger, then it will need enough memory to hold the document (ie ContentHandler). Any other alternative? I did find this: h

Re: Adding tag to fq makes query return child docs instead of parent docs

2018-06-26 Thread Shawn Heisey
On 6/26/2018 7:22 AM, Florian Fankhauser wrote: Now for some reason I want to exclude the above filter-query from a facet-query. Therefore I need to add a tag to the filter-query: q={!tag=datefilter}{!parent which=doc_type_s:book} acquisition_date_i:20180626 According to the

Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread Erick Erickson
Well, if you were using ERH you'd have the same problem as it uses Tika. At least if you run Tika on some client somewhere, if you do have a document that blows out memory or has some other problem, your client can crash without taking Solr with it. That's one of the reasons, in fact, that we don'

Re: Approach for Merge Database and Files

2018-06-26 Thread Erick Erickson
From your problem description, it looks like you want to gather the data from the DB and filesystem and combine them into a Solr document at index time, then index that document. Put enough information in Solr to fetch the document as necessary; often people don't put the entire file in Solr espe

Re: Total Collection Size in Solr 7

2018-06-26 Thread Erick Erickson
Some work is being done on the admin UI, there are several JIRAs. Perhaps you'd like to join that conversation? We need to have input, especially in terms of what kinds of information would be useful from a practitioner's standpoint. Best, Erick On Mon, Jun 25, 2018 at 11:26 PM, Aroop Ganguly wr

Re: Approach for Merge Database and Files

2018-06-26 Thread Angel Addati
Thank you both. *"From your problem description, it looks like you want to gather the data from the DB and filesystem and combine them into a Solr document at index time, then index that document."* Exactly. I don't know if the best approach is to combine at index time or at query time. But I need sea

Configuring load balancer for Kerberised Solr cluster

2018-06-26 Thread mosheB
We are trying to enable an authentication mechanism in our Solr cluster using the Kerberos authentication plugin. We use Active Directory as our KDC, each Solr node has its own SPN in the form of HTTP/@ and things are working as expected. Things get complicated when trying to configure our load b

Re: Indexing Approach

2018-06-26 Thread solrnoobie
Thanks for the tip. Although we have increased our application's heap to 4g, it is still not enough. I guess here are the things we think we did wrong:
- Each SP call will return 15 result sets.
- Each document can contain 300-1000 child documents.
- If the batch size is 1000, the child docum

Re: Approach for Merge Database and Files

2018-06-26 Thread Erick Erickson
bq. I don't know if the best approach is combine in index time or in query time It Depends (tm). What is your goal? Let's say you have db_f1 and fm_f2 (db == from the database and fm = file data). If you want to form a Solr query like db_f1:something fm_f2:something_else you don't have much ch

AW: Adding tag to fq makes query return child docs instead of parent docs

2018-06-26 Thread Florian Fankhauser
_type_s:book} > acquisition_date_i:20180626 According to the documentation: https://lucene.apache.org/solr/guide/6_6/local-parameters-in-queries.html#LocalParametersinQueries-BasicSyntaxofLocalParameters You can't specify multiple localparams like that - it says "You ma
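A hedged sketch of the point quoted above: instead of two adjacent `{!...}` blocks, the tag and the parser arguments go into a single local-params block. The helper `with_local_params` is hypothetical, and whether the combined form resolves the parent/child result problem in this thread is an assumption that the visible excerpt does not confirm.

```python
def with_local_params(parser, query, **params):
    """Prefix `query` with one {!parser key=value ...} local-params block.

    Values containing spaces would need quoting; this sketch does not handle that.
    """
    parts = [parser] + [f"{key}={value}" for key, value in params.items()]
    return "{!" + " ".join(parts) + "}" + query


fq = with_local_params(
    "parent",
    "acquisition_date_i:20180626",
    which="doc_type_s:book",
    tag="datefilter",
)
print(fq)
```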

Re: Indexing Approach

2018-06-26 Thread Shawn Heisey
On 6/26/2018 8:24 AM, solrnoobie wrote: > - Each SP call will return 15 result sets. > - Each document can contain 300-1000 child documents. > - If the batch size is 1000, the child documents for each can contain > 300-1000 documents so that will eat up the 4g's allocated to the > application. If
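One way to approach the problem described in this thread, sketched in Python rather than SolrJ and not necessarily what the reply goes on to recommend: size each batch by the total number of documents it holds (parents plus children) instead of by parent count. The function name, the cap of 1500, and the sample documents are assumptions; `_childDocuments_` is the key Solr's JSON format uses for nested children.

```python
def batch_by_total_docs(parents, max_docs_per_batch):
    """Yield lists of parents whose combined parent+child count fits the cap.

    A parent that exceeds the cap on its own is still sent, alone in its batch.
    """
    batch, count = [], 0
    for parent in parents:
        size = 1 + len(parent.get("_childDocuments_", []))
        if batch and count + size > max_docs_per_batch:
            yield batch
            batch, count = [], 0
        batch.append(parent)
        count += size
    if batch:
        yield batch


# Hypothetical parents with 300, 1000, and 400 children.
parents = [
    {"id": "p1", "_childDocuments_": [{"id": f"p1-c{i}"} for i in range(300)]},
    {"id": "p2", "_childDocuments_": [{"id": f"p2-c{i}"} for i in range(1000)]},
    {"id": "p3", "_childDocuments_": [{"id": f"p3-c{i}"} for i in range(400)]},
]
batches = list(batch_by_total_docs(parents, max_docs_per_batch=1500))
print([[p["id"] for p in b] for b in batches])
```

With this scheme the memory held per request is bounded by the cap plus at most one oversized parent, whatever the number of children each parent carries.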

Re: Total Collection Size in Solr 7

2018-06-26 Thread Aroop Ganguly
Hi Erick, Sure, I will look those JIRAs up. In the interim, is what Susmit suggested the only way to get the size info? Or is there something else you can recommend? Thanks Aroop > On Jun 26, 2018, at 6:53 AM, Erick Erickson wrote: > > Some work is being done on the admin UI, there are seve

Create an index field of type dictionary

2018-06-26 Thread Ritesh Kumar (Avanade)
Hello, Is it possible to create an index field of type dictionary? I have seen stringarry, datetime, bool etc. but I am looking for a field type like list of objects. Thanks Ritesh Avanade Infrastructure Team +1 (425) 588-7853 v-kur...@micrsoft.com

Re: Total Collection Size in Solr 7

2018-06-26 Thread Erick Erickson
Aroop: Not that I know of. You could do a reasonable approximation by
1> check the index size (manually) with, say, 10M docs
2> check it again with 20M docs
3> use a match all docs query and do the math.
That's clumsy but do-able. The reason I start with 10M and 20M is that index size does not go
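The "do the math" step can be sketched as a two-point linear extrapolation. This is one reading of the advice, not a formula given in the thread: the function name and all sizes below are made-up illustration values, and the target document count would come from a match-all-docs query.

```python
def estimate_index_size(docs_a, size_a, docs_b, size_b, target_docs):
    """Extrapolate index size linearly from two measured (doc count, size) points."""
    growth = (size_b - size_a) * (target_docs - docs_a) / (docs_b - docs_a)
    return size_a + growth


# Hypothetical measurements: 12 GB at 10M docs, 22 GB at 20M docs.
estimate_gb = estimate_index_size(10_000_000, 12.0, 20_000_000, 22.0, 50_000_000)
print(f"Estimated size at 50M docs: {estimate_gb:.1f} GB")
```

Using the difference between two measurements, rather than dividing a single size by its document count, keeps any fixed per-index overhead out of the per-document rate.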

Re: Create an index field of type dictionary

2018-06-26 Thread Erick Erickson
Well, there's a multiValued field that's just a list of whatever (string, date, numeric, etc). What's the use-case? This feels like an "XY" problem. A "dictionary" type is usually some kind of structure that you want to have operate in a specific manner. Solr doesn't really deal at that level, it

Re: Total Collection Size in Solr 7

2018-06-26 Thread Aroop Ganguly
Hi Erick, Thanks for the advice. One open question still, about point 1 below: how to get that magic number of size in GBs :) ? As I am mostly using streaming expressions, most of my fields are DocValues and not stored. I will look at the health endpoint to see what it gives me in connection wit

RE: Create an index field of type dictionary

2018-06-26 Thread Ritesh Kumar (Avanade)
Hey Erick, Thanks for the response; it was a Sitecore-related modification we had to make to get it working. Thanks Ritesh -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, June 26, 2018 10:52 AM To: solr-user Subject: Re: Create an index field of type d

Linux command to print top slow performing query (/get) from solr logs

2018-06-26 Thread Ganesh Sethuraman
Is there a way, using Linux commands, to print the top slow-performing queries from Solr 7 logs (/get handler or /select handler)? Having them in reverse sorted order across log files would be very useful and handy for troubleshooting. Regards Ganesh
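A rough sketch of the idea in Python rather than shell. It assumes the request log lines carry `path=...` and `QTime=...` tokens, as Solr's default request logging does; the function name and the sample lines are invented for illustration, and a customized log pattern may need a different regular expression.

```python
import re

# Matches the handler path and the elapsed time in milliseconds on one log line.
LINE_RE = re.compile(r"path=(?P<path>\S+).*?QTime=(?P<qtime>\d+)")


def slowest_requests(lines, handlers=("/select", "/get"), top_n=10):
    """Return (qtime_ms, line) pairs for the slowest requests, slowest first."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("path") in handlers:
            hits.append((int(match.group("qtime")), line.rstrip("\n")))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top_n]


# Invented sample lines in the assumed format.
sample = [
    "INFO webapp=/solr path=/select params={q=*:*} hits=10 status=0 QTime=120\n",
    "INFO webapp=/solr path=/get params={id=42} status=0 QTime=950\n",
    "INFO webapp=/solr path=/update params={} status=0 QTime=3000\n",
    "INFO webapp=/solr path=/select params={q=a:b} hits=0 status=0 QTime=15\n",
]
for qtime, line in slowest_requests(sample, top_n=2):
    print(qtime, line)
```

To cover several log files at once, `lines` can be the iterator returned by the standard library's `fileinput.input()`, which reads every file named on the command line in sequence.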

Change/Override Solrconfig.xml across collections

2018-06-26 Thread Ganesh Sethuraman
I would like to implement the Slow Query logging feature ( https://lucene.apache.org/solr/guide/6_6/configuring-logging.html#ConfiguringLogging-LoggingSlowQueries) across multiple collections without changing solrconfig.xml in each and every collection. Is that possible? I am using Solr 7.2.1. If th