Re: [EXTERNAL] Re: How to merge child documents using DataImportHandler
Original message
From: Mikhail Khludnev
Date: 5/27/18 3:23 PM (GMT-05:00)
To: solr-user
Subject: [EXTERNAL] Re: How to merge child documents using DataImportHandler

Hello, Abhijit.
Have you tried to drop some of child=true? They usually cause slicing to
separate documents, rather than default "merge to root" mode.

On Sun, May 27, 2018 at 9:48 PM, Abhijit Pawar wrote:

> Hello,
>
> I am using DataImportHandler to index data from mongoDB.
>
> Here's how my data-source-config file looks like:
>
> driver="com.mongodb.jdbc.MongoDriver" url="mongodb://< Address>>:27017/<>"/>
>
> entityA(Root Entity) - *products*
> entityB (child=true,pk=unique field) - *skus*
> entityC - *attributevalues*
> entityD - *attributenames*
> entityE(child=true,pk=unique field) - *skupricelist*
>
> When data is indexed separate *skupricelist* documents are created for
> each attribute (since *skupricelist* is child of *skus* and under
> *attributenames*).How can I merge / join the all those skupricelist
> documents with all attributes in same document?
>
> example :
> Right now the documents created are as follows:
>
> Separate document 1
> {
> 'PRODUCT NAME':'ABC',
> 'SKU NAME':'ABC-1',
> 'Color':'Red',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Separate document 2
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Size':'10',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Separate document 3
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Type':'Leather',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Is there a way I can join them like this?
>
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Color':'Red',
> 'Size':'10',
> 'Type':'Leather',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Thank You.
> Regards,
>
> Abhijit

--
Sincerely yours
Mikhail Khludnev
Re: Slower queries with 7.3.1?
Thanks Deepak. I think I understand the cause of the slowdown. There are some
flamegraphs (from stack sampling) on SOLR-12407. I also captured some traces
using yourkit.

On Sun, May 27, 2018 at 1:21 PM, Deepak Goel wrote:

> Is it possible to profile the code to find the exact points which are
> taking more time comparatively?
>
> On Sun, 27 May 2018, 06:02 Will Currie, wrote:
>
> > I raised https://issues.apache.org/jira/browse/SOLR-12407. In case anybody
> > else sees a similar slowdown with boosts.
> >
> > On Sat, May 26, 2018 at 4:10 PM, Will Currie wrote:
> >
> > > I did some more (micro)benchmarking with a single query. Setting the query
> > > cache size to zero I see 400ms response time on 7.2 and 600ms on 7.3.
> > > Running curl in a loop on my laptop. ~4M docs. ~3G index. 1M total hits
> > > for the query.. Yup. I'm reluctant to post the query. It has multiple 300+
> > > character streams of if,product,map calls in multiple boost parameters.
> > >
> > > I realise my query is likely ridiculous (inefficient, better done another
> > > way, etc) but LUCENE-8099 mentions:
> > > "Re performance: there shouldn't be any reason for things to be slower ...
> > > It might be useful to add some examples of these queries to the benchmark
> > > tests though."
> > >
> > > Maybe I have such a benchmark.. Grasping at straws guess, I noticed 7.2
> > > sticks with floats. 7.3 does a few frames of math with doubles before
> > > returning to floats.
> > >
> > > jstack from 7.2:
> > >
> > > "qtp2136344592-24" #24 prio=5 os_prio=31 tid=0x7f80630e5000 nid=0x7103 runnable [0x749bb000]
> > >    java.lang.Thread.State: RUNNABLE
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > *at org.apache.lucene.queries.function.BoostedQuery$CustomScorer.score(BoostedQuery.java:124)*
> > > at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:64)
> > > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:233)
> > > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:184)
> > > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
> > > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:660)
> > > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:462)
> > > at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:215)
> > >
> > > jstack from 7.3.1:
> > >
> > > "qtp559670971-25" #25 prio=5 os_prio=31 tid=0x7fe23fa0c000 nid=0x7303 runnable [0x7b024000]
> > >    java.lang.Thread.State: RUNNABLE
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > *at org.apache.lucene.queries.function.docvalues.FloatDocValues.doubleVal(FloatDocValues.java:67)*
> > > *at org.apache.lu
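The "curl in a loop" measurement described in this thread could be reproduced with a small timing harness along the following lines. This is only a sketch: the function names (`fetch`, `time_requests`, `summarize`) and the run and warm-up counts are assumptions made for illustration, and the URL you pass in would be your own Solr select URL with the boost parameters under test.

```python
import statistics
import time
import urllib.request


def fetch(url):
    """Issue one GET request and read the whole response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def time_requests(url, runs=20, warmup=3, do_fetch=fetch):
    """Return the wall-clock time of each measured request, in milliseconds."""
    for _ in range(warmup):
        do_fetch(url)  # warm-up requests are issued but not measured
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        do_fetch(url)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(timings):
    """Reduce a list of timings to median, minimum and maximum."""
    return {
        "median_ms": statistics.median(timings),
        "min_ms": min(timings),
        "max_ms": max(timings),
    }
```

Running the same URL against a 7.2 and a 7.3.1 instance and comparing the medians gives a comparison that is less sensitive to one-off outliers than a single request.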
Re: How to merge child documents using DataImportHandler
Hi Mikhail,

Yes, I already tried dropping child=true for skupricelist. However, it then
does not index data from that collection at all. I need it because I am
inheriting some properties from the skus collection and some from the
attributevalues and attributenames collections.

Also, data from the skus, attributevalues and attributenames collections is
already merged into the same document. However, data from skupricelist is
split into separate documents for every attribute.

Regards,
Abhijit

On Sun, May 27, 2018 at 2:24 PM Mikhail Khludnev wrote:

> Hello, Abhijit.
> Have you tried to drop some of child=true? They usually cause slicing to
> separate documents, rather than default "merge to root" mode.
>
> On Sun, May 27, 2018 at 9:48 PM, Abhijit Pawar wrote:
>
> > Hello,
> >
> > I am using DataImportHandler to index data from mongoDB.
> >
> > Here's how my data-source-config file looks like:
> >
> > driver="com.mongodb.jdbc.MongoDriver" url="mongodb://< Address>>:27017/<>"/>
> >
> > entityA(Root Entity) - *products*
> > entityB (child=true,pk=unique field) - *skus*
> > entityC - *attributevalues*
> > entityD - *attributenames*
> > entityE(child=true,pk=unique field) - *skupricelist*
> >
> > When data is indexed separate *skupricelist* documents are created for
> > each attribute (since *skupricelist* is child of *skus* and under
> > *attributenames*).How can I merge / join the all those skupricelist
> > documents with all attributes in same document?
> >
> > example :
> > Right now the documents created are as follows:
> >
> > Separate document 1
> > {
> > 'PRODUCT NAME':'ABC',
> > 'SKU NAME':'ABC-1',
> > 'Color':'Red',
> > 'SKUPricelist':'SKUPricelistA'
> > }
> >
> > Separate document 2
> > {
> > 'PRODUCT NAME':'ABC',
> > 'SKU':'ABC-1',
> > 'Size':'10',
> > 'SKUPricelist':'SKUPricelistA'
> > }
> >
> > Separate document 3
> > {
> > 'PRODUCT NAME':'ABC',
> > 'SKU':'ABC-1',
> > 'Type':'Leather',
> > 'SKUPricelist':'SKUPricelistA'
> > }
> >
> > Is there a way I can join them like this?
> >
> > {
> > 'PRODUCT NAME':'ABC',
> > 'SKU':'ABC-1',
> > 'Color':'Red',
> > 'Size':'10',
> > 'Type':'Leather',
> > 'SKUPricelist':'SKUPricelistA'
> > }
> >
> > Thank You.
> > Regards,
> >
> > Abhijit
>
> --
> Sincerely yours
> Mikhail Khludnev
Re: delta-update alternative on filechanges when using FileListEntityProcessor
The best practice is not to use DIH in production. It is great for several
rounds of prototyping, but then things get messy and uneven, as you found.
The delete logic is always extra messy.

So, this may be a good point to switch to an external client and implement
the monitoring logic there.

Regards,
   Alex

P.S. Or you could reindex everything periodically in a separate collection
and swap it into production. No delete logic.

On Sun, May 27, 2018, 2:48 PM Thomas Lustig, wrote:

> I configured a DataImportHandler using a FileListEntityProcessor to import
> files from a folder.
> This setup works really great, but I do not know how I should handle changes
> on the filesystem (e.g. files added, deleted,...)
> Should I always do a "full-import"? As far as I read, "delta-import" is only
> supported by SqlEntityProcessor.
> Is there a best practice that is recommended?
> Thanks in advance for helping me
>
> Br
> Tom
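The monitoring logic for the external client suggested in the reply above could start from something like the sketch below. It compares two snapshots of a folder and derives which files were added, changed or deleted. The function names are hypothetical, and it assumes that each Solr document uses the file path as its unique id. The delete body follows the JSON update format, in which a list of ids is given under "delete".

```python
import json
import os


def snapshot(folder):
    """Map each file path under folder to [modification time, size]."""
    state = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            info = os.stat(path)
            state[path] = [info.st_mtime, info.st_size]
    return state


def diff_snapshots(old, new):
    """Return (added, changed, deleted) paths between two snapshots."""
    added = sorted(p for p in new if p not in old)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    deleted = sorted(p for p in old if p not in new)
    return added, changed, deleted


def delete_payload(paths):
    """Build a JSON delete-by-id body (assumes document id == file path)."""
    return json.dumps({"delete": list(paths)})
```

The client would persist the last snapshot (for example as a JSON file), re-index the added and changed files, and post the delete body for the removed ones. How the files are parsed and posted is left out here, since it depends on the setup.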
Re: How to merge child documents using DataImportHandler
Hello, Abhijit.
Have you tried to drop some of child=true? They usually cause slicing to
separate documents, rather than default "merge to root" mode.

On Sun, May 27, 2018 at 9:48 PM, Abhijit Pawar wrote:

> Hello,
>
> I am using DataImportHandler to index data from mongoDB.
>
> Here's how my data-source-config file looks like:
>
> driver="com.mongodb.jdbc.MongoDriver" url="mongodb://< Address>>:27017/<>"/>
>
> entityA(Root Entity) - *products*
> entityB (child=true,pk=unique field) - *skus*
> entityC - *attributevalues*
> entityD - *attributenames*
> entityE(child=true,pk=unique field) - *skupricelist*
>
> When data is indexed separate *skupricelist* documents are created for
> each attribute (since *skupricelist* is child of *skus* and under
> *attributenames*).How can I merge / join the all those skupricelist
> documents with all attributes in same document?
>
> example :
> Right now the documents created are as follows:
>
> Separate document 1
> {
> 'PRODUCT NAME':'ABC',
> 'SKU NAME':'ABC-1',
> 'Color':'Red',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Separate document 2
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Size':'10',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Separate document 3
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Type':'Leather',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Is there a way I can join them like this?
>
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Color':'Red',
> 'Size':'10',
> 'Type':'Leather',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Thank You.
> Regards,
>
> Abhijit

--
Sincerely yours
Mikhail Khludnev
How to merge child documents using DataImportHandler
Hello,

I am using DataImportHandler to index data from mongoDB.

Here's what my data-source-config file looks like:

entityA(Root Entity) - *products*
entityB (child=true,pk=unique field) - *skus*
entityC - *attributevalues*
entityD - *attributenames*
entityE(child=true,pk=unique field) - *skupricelist*

When data is indexed, separate *skupricelist* documents are created for
each attribute (since *skupricelist* is a child of *skus* and under
*attributenames*). How can I merge / join all those skupricelist
documents, with all attributes, into the same document?

Example:
Right now the documents created are as follows:

Separate document 1
{
'PRODUCT NAME':'ABC',
'SKU NAME':'ABC-1',
'Color':'Red',
'SKUPricelist':'SKUPricelistA'
}

Separate document 2
{
'PRODUCT NAME':'ABC',
'SKU':'ABC-1',
'Size':'10',
'SKUPricelist':'SKUPricelistA'
}

Separate document 3
{
'PRODUCT NAME':'ABC',
'SKU':'ABC-1',
'Type':'Leather',
'SKUPricelist':'SKUPricelistA'
}

Is there a way I can join them like this?

{
'PRODUCT NAME':'ABC',
'SKU':'ABC-1',
'Color':'Red',
'Size':'10',
'Type':'Leather',
'SKUPricelist':'SKUPricelistA'
}

Thank You.
Regards,
Abhijit
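For reference, the merged shape asked for above can be produced outside DIH by grouping the flat documents on their shared fields before indexing. The sketch below is a client-side illustration only, not a DataImportHandler feature. The function name `merge_documents` is hypothetical, and it assumes all three documents use the same 'SKU' field name (in the example above, document 1 uses 'SKU NAME' instead).

```python
def merge_documents(docs, key_fields):
    """Merge documents that share the same values for key_fields.

    When two documents disagree on a non-key field, the first value
    seen is kept.
    """
    merged = {}
    for doc in docs:
        key = tuple(doc.get(field) for field in key_fields)
        target = merged.setdefault(key, {})
        for field, value in doc.items():
            target.setdefault(field, value)
    return list(merged.values())


docs = [
    {"PRODUCT NAME": "ABC", "SKU": "ABC-1", "Color": "Red",
     "SKUPricelist": "SKUPricelistA"},
    {"PRODUCT NAME": "ABC", "SKU": "ABC-1", "Size": "10",
     "SKUPricelist": "SKUPricelistA"},
    {"PRODUCT NAME": "ABC", "SKU": "ABC-1", "Type": "Leather",
     "SKUPricelist": "SKUPricelistA"},
]

result = merge_documents(docs, ("PRODUCT NAME", "SKU", "SKUPricelist"))
```

With these inputs, `result` holds a single document carrying Color, Size and Type together. If an attribute can take several values per SKU, the "first value wins" rule would need to be replaced by collecting the values into a list.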
delta-update alternative on filechanges when using FileListEntityProcessor
I configured a DataImportHandler using a FileListEntityProcessor to import
files from a folder.
This setup works really great, but I do not know how I should handle changes
on the filesystem (e.g. files added, deleted, ...).
Should I always do a "full-import"? As far as I have read, "delta-import" is
only supported by SqlEntityProcessor.
Is there a recommended best practice?
Thanks in advance for helping me

Br
Tom
Weird behavioural differences between pf in dismax and edismax
Hello,

I experienced a weird behaviour with the dismax and edismax query parsers.
Dismax includes pf boosts when the query is just a single word; edismax, on
the other hand, does not. The result is that a dismax handler and an edismax
handler with the same set of defaults return different results for
single-word queries (e.g. "Hello") but the same results for multi-word
queries (e.g. "Hello World").

Is this expected?

Regards,
Sam
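One way to see the difference directly is to send the same single-word query to both parsers with debugQuery=true and compare the parsed queries in the debug output. The helper below only builds the two request URLs. The base URL, the field names in qf and pf, and the function name are placeholders for illustration; q, defType, qf, pf, debugQuery and rows are standard Solr request parameters.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, def_type, q, qf, pf):
    """Build a select URL that asks Solr to return the parsed query."""
    params = {
        "q": q,
        "defType": def_type,   # "dismax" or "edismax"
        "qf": qf,
        "pf": pf,
        "debugQuery": "true",  # adds the debug section to the response
        "rows": "0",           # only the debug output is of interest here
    }
    return base_url + "?" + urlencode(params)


# Hypothetical core name and field names.
BASE = "http://localhost:8983/solr/mycore/select"
dismax_url = debug_query_url(BASE, "dismax", "Hello", "title", "title^2")
edismax_url = debug_query_url(BASE, "edismax", "Hello", "title", "title^2")
```

Fetching both URLs and comparing the "parsedquery" entries shows whether a phrase clause built from pf is present for each parser.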