Re: DIH: Create Child Documents in ScriptTransformer
Hi,

thanks for all the feedback. The context parameter in the ScriptTransformer is new to me, thanks for this insight. I could not find it in any docs. So, for anyone else who did not know about it: a ScriptTransformer function can take two parameters, e.g.

function mytransformer(row,context){ }

The following Javadoc gives some hints on what you can do with the context:
https://lucene.apache.org/solr/8_2_0/solr-dataimporthandler/org/apache/solr/handler/dataimport/Context.html

Despite all this, I came to the conclusion that adding child docs in a ScriptTransformer in DIH is not supported. One can, however, use a StatelessScriptUpdateProcessorFactory, see
https://lucene.apache.org/solr/8_2_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
and
https://cwiki.apache.org/confluence/display/solr/ScriptUpdateProcessor#ScriptUpdateProcessor-JavaScript

Hint on how to add child documents to a SolrInputDocument:
http://lucene.apache.org/solr/8_2_0/solr-solrj/index.html?org/apache/solr/common/SolrInputDocument.html

Nevertheless, I agree that one should use an external tool, although, depending on the needs, that can also mean some complexity (e.g. supporting individual transformations per collection with configuration/plugins instead of code). While this is not a problem, it might be good to start an open source loader that goes beyond the post tool (https://lucene.apache.org/solr/guide/8_1/post-tool.html).

best regards

On Thu, Sep 19, 2019 at 8:54 AM Mikhail Khludnev wrote:
> Hello, Jörn.
> Have you tried to find a parent doc in the context which is passed as a
> second argument into ScriptTransformer?
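The update-processor route mentioned in this message could look roughly like the sketch below. It assumes a script registered with StatelessScriptUpdateProcessorFactory, whose processAdd(cmd) hook exposes the document as cmd.solrDoc. Everything else is an assumption made for illustration: the field names (id, chapter_text, text), the helper newDoc, and StubDoc, which only imitates the handful of SolrInputDocument methods used here so that the logic can be run outside Solr. It also assumes the already-split chapters arrive as a multi-valued field on the parent, which the thread does not specify.

```javascript
// Stand-in for org.apache.solr.common.SolrInputDocument (hypothetical stub,
// present only so this sketch runs outside Solr).
function StubDoc() {
  this.fields = {};
  this.children = [];
}
StubDoc.prototype.setField = function (name, value) { this.fields[name] = [value]; };
StubDoc.prototype.addField = function (name, value) {
  (this.fields[name] = this.fields[name] || []).push(value);
};
StubDoc.prototype.getFieldValue = function (name) {
  return this.fields[name] ? this.fields[name][0] : null;
};
StubDoc.prototype.getFieldValues = function (name) { return this.fields[name] || null; };
StubDoc.prototype.removeField = function (name) { delete this.fields[name]; };
StubDoc.prototype.addChildDocument = function (child) { this.children.push(child); };
StubDoc.prototype.getChildDocuments = function () { return this.children; };

// Hypothetical factory; inside Solr this would have to create a real
// SolrInputDocument instead of the stub.
function newDoc() { return new StubDoc(); }

// Hook invoked by the script update processor for every added document.
function processAdd(cmd) {
  var parent = cmd.solrDoc;
  var chapters = parent.getFieldValues("chapter_text"); // assumed field name
  if (chapters == null) { return; }                      // nothing to split off
  var n = 0;
  chapters.forEach(function (text) {
    n += 1;
    var child = newDoc();
    child.setField("id", parent.getFieldValue("id") + "_ch" + n);
    child.setField("text", text);
    parent.addChildDocument(child);                      // attach chapter to parent
  });
  parent.removeField("chapter_text");                    // chapters now live in the children
}
```

Deriving the child id from the parent id is a design choice of this sketch, not something the thread prescribes; it just keeps the chapter ids unique and traceable to their parent.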
Re: DIH: Create Child Documents in ScriptTransformer
Hello, Jörn.
Have you tried to find a parent doc in the context which is passed as a second argument into ScriptTransformer?

On Wed, Sep 18, 2019 at 9:56 PM Jörn Franke wrote:
> Hi,
>
> I load a set of documents. Based on these documents, some logic needs to be applied to split them into chapters (this is done). One whole document is loaded as a parent. Chapters of the whole document + metadata should be loaded as child documents of this parent.
>
> I now want to collect information on how this can be done:
>
> * Use a custom loader - this is possible and works.
> * Use DIH and extract the chapters in a ScriptTransformer and add them as child documents there. However, the ScriptTransformer receives only a HashMap as input, and while it works to transform field values etc., it does not seem possible to add child documents within the DIH ScriptTransformer. I tried adding a JavaArray with SolrInputDocuments, but this does not seem to work. I see in debug/verbose mode that the transformer does add them to the HashMap correctly, but they don't end up in the document. Maybe it could be possible here somehow via nested entities?
> * Use DIH + an UpdateProcessor (Script): there I get the SolrInputDocument as a parameter, and it seems feasible to extract chapters and add them as child documents.
>
> thank you.
>
> best regards

--
Sincerely yours
Mikhail Khludnev
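For readers unfamiliar with the second argument, here is a minimal sketch of a two-argument transformer function. It only reads two things from the DIH Context, using getEntityAttribute and isRootEntity as listed in the Context Javadoc; it does not show how to reach a parent document, and the thread does not confirm that this is possible. StubRow, StubContext, and the field names source_entity and is_root_entity are hypothetical stand-ins so the function can be run outside Solr.

```javascript
// Hypothetical stand-ins for the java.util.Map row and the DIH Context,
// present only so this sketch runs outside Solr.
function StubRow() { this.data = {}; }
StubRow.prototype.put = function (key, value) { this.data[key] = value; };
StubRow.prototype.get = function (key) {
  return this.data.hasOwnProperty(key) ? this.data[key] : null;
};

function StubContext(attrs, root) { this.attrs = attrs; this.root = root; }
StubContext.prototype.getEntityAttribute = function (name) {
  return this.attrs.hasOwnProperty(name) ? this.attrs[name] : null;
};
StubContext.prototype.isRootEntity = function () { return this.root; };

// Two-argument transformer: row is the current row, context the DIH Context.
function mytransformer(row, context) {
  // Record which entity produced this row (assumed field names).
  row.put("source_entity", context.getEntityAttribute("name"));
  row.put("is_root_entity", context.isRootEntity());
  return row; // the transformed row is handed back to DIH
}
```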
Re: DIH: Create Child Documents in ScriptTransformer
I fully agree. However, I am just curious to see the limits.

> On 18.09.2019 at 23:33, Erick Erickson wrote:
>
> When it starts getting complex, I usually move to SolrJ. You say you're loading documents, so I assume Tika is in the mix too.
>
> Here's a blog on the topic so you can see how to get started...
>
> https://lucidworks.com/post/indexing-with-solrj/
>
> Best,
> Erick
Re: DIH: Create Child Documents in ScriptTransformer
When it starts getting complex, I usually move to SolrJ. You say you're loading documents, so I assume Tika is in the mix too.

Here's a blog on the topic so you can see how to get started...

https://lucidworks.com/post/indexing-with-solrj/

Best,
Erick

On Wed, Sep 18, 2019 at 2:56 PM Jörn Franke wrote:
> Hi,
>
> I load a set of documents. Based on these documents, some logic needs to be applied to split them into chapters (this is done). One whole document is loaded as a parent. Chapters of the whole document + metadata should be loaded as child documents of this parent.
>
> I now want to collect information on how this can be done:
>
> * Use a custom loader - this is possible and works.
> * Use DIH and extract the chapters in a ScriptTransformer and add them as child documents there. However, the ScriptTransformer receives only a HashMap as input, and while it works to transform field values etc., it does not seem possible to add child documents within the DIH ScriptTransformer. I tried adding a JavaArray with SolrInputDocuments, but this does not seem to work. I see in debug/verbose mode that the transformer does add them to the HashMap correctly, but they don't end up in the document. Maybe it could be possible here somehow via nested entities?
> * Use DIH + an UpdateProcessor (Script): there I get the SolrInputDocument as a parameter, and it seems feasible to extract chapters and add them as child documents.
>
> thank you.
>
> best regards
DIH: Create Child Documents in ScriptTransformer
Hi,

I load a set of documents. Based on these documents, some logic needs to be applied to split them into chapters (this is done). One whole document is loaded as a parent. Chapters of the whole document + metadata should be loaded as child documents of this parent.

I now want to collect information on how this can be done:

* Use a custom loader - this is possible and works.
* Use DIH and extract the chapters in a ScriptTransformer and add them as child documents there. However, the ScriptTransformer receives only a HashMap as input, and while it works to transform field values etc., it does not seem possible to add child documents within the DIH ScriptTransformer. I tried adding a JavaArray with SolrInputDocuments, but this does not seem to work. I see in debug/verbose mode that the transformer does add them to the HashMap correctly, but they don't end up in the document. Maybe it could be possible here somehow via nested entities?
* Use DIH + an UpdateProcessor (Script): there I get the SolrInputDocument as a parameter, and it seems feasible to extract chapters and add them as child documents.

thank you.

best regards