input XSLT
Just as there is an XSLT response writer that converts the Solr XML response into a form compatible with any application, is there an XSLT module on the input side that will transform XML documents into Solr's format before they are posted to the Solr indexer? I have gone through DataImportHandler, but it works in 'pull' mode, i.e. Solr pulls data from the given location. I would still like to work with applications 'posting' documents to the Solr indexer as and when they want.

Regards,
CI
Re: input XSLT
This might be possible with the Solr Cell contrib (i.e. ExtractingRequestHandler), since it can parse XML and extract content from it, but I think that is slightly different from what you are asking for. See http://wiki.apache.org/solr/ExtractingRequestHandler

You might also want to check out Tika.

-Grant

On Mar 10, 2009, at 2:47 AM, CIF Search wrote:
> [...] on the input side do you have an xslt module that will parse xml documents to solr format before posting them to solr indexer. [...]

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
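As a rough illustration of the Solr Cell route suggested above, the sketch below builds (but does not send) an HTTP POST that hands a raw XML document to the ExtractingRequestHandler. The base URL, the `/update/extract` path, and the `literal.id` parameter are assumptions: they follow the handler's commonly documented setup, but the handler must be registered in solrconfig.xml and parameter names have varied between releases. `build_extract_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL of a local Solr instance; adjust for your deployment.
SOLR_URL = "http://localhost:8983/solr"


def build_extract_request(xml_bytes, doc_id, solr_url=SOLR_URL):
    """Build (without sending) a POST that passes raw XML to Solr Cell.

    Assumes the ExtractingRequestHandler is mapped to /update/extract and
    accepts literal.<field> parameters for values set by the client.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        url=f"{solr_url}/update/extract?{params}",
        data=xml_bytes,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = build_extract_request(b"<note><title>hello</title></note>", "note-1")
# urllib.request.urlopen(req) would send it to a running Solr instance.
```

Note that Solr Cell extracts text content rather than mapping individual XML elements to fields, which is why it is only a partial answer to the original question.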
Re: input XSLT
: > Just as you have an xslt response writer to convert Solr xml response to
: > make it compatible with any application, on the input side do you have an
: > xslt module that will parse xml documents to solr format before posting them
: > to solr indexer. I have gone through dataimporthandler, but it works in data

Some proof-of-concept work was done in the past, but it never really took off:

https://issues.apache.org/jira/browse/SOLR-285
https://issues.apache.org/jira/browse/SOLR-370

Now that we have DIH, I think another approach (one that would fit better with how things currently are) would be a "ContentStreamDataSource" for DIH, analogous to the HttpDataSource (except without any explicit knowledge of URLs), that respected the standard ContentStream params and could then work with the XPathEntityProcessor.

-Hoss
Re: input XSLT
On Tue, Mar 10, 2009 at 12:17 PM, CIF Search wrote:
> [...] I have gone through dataimporthandler, but it works in data 'pull' mode i.e. solr pulls data from the given location. I would still want to work with applications 'posting' documents to solr indexer as and when they want.

It is a limitation of DIH, but if you can put your XML in a file behind an HTTP server, then you can quite easily fire a command to DIH to pull the data from that URL.

--
--Noble Paul
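The "fire a command to DIH" workaround could look roughly like the sketch below, which only constructs the command URL. `command=full-import` and `clean=false` are standard DIH request parameters; everything else is an assumption: the handler path `/dataimport`, the host names, and in particular the `url` request parameter, which only has an effect if the entity in data-config.xml is written to read it (for example via `${dataimporter.request.url}`). `build_dih_import_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_dih_import_url(source_url,
                         solr_url="http://localhost:8983/solr",
                         handler="/dataimport"):
    """Build the URL an application could request to make DIH pull a file.

    Assumes data-config.xml reads the custom 'url' request parameter,
    e.g. <entity ... url="${dataimporter.request.url}">.
    """
    params = urlencode({
        "command": "full-import",  # standard DIH command
        "clean": "false",          # keep documents already in the index
        "url": source_url,         # hypothetical custom parameter
    })
    return f"{solr_url}{handler}?{params}"


command_url = build_dih_import_url("http://apps.example.com/export/records.xml")
# An application would issue a GET on command_url after publishing its file.
```

This keeps DIH in pull mode: the application still has to expose its XML over HTTP and then trigger the import itself.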
Re: input XSLT
There is a fundamental problem with the 'pull' approach using DIH. Normally people want delta imports, which are driven by a timestamp field. It may not always be possible for application servers to sync their timestamps (given protocol restrictions imposed for security reasons). As a result, the Solr application is likely to miss a few records occasionally. This problem does not arise if the applications themselves identify their records and post them. Should we not have a feature in Solr that allows users to push data into the index in whichever format they wish? This would also make it easier to plug Solr seamlessly into all kinds of applications.

Regards,
CI

On Wed, Mar 11, 2009 at 11:52 PM, Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com> wrote:
> it is a limitation of DIH, but if you can put your xml in a file behind an http server then you can fire a command to DIH to pull data from the url quite easily.
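The timestamp concern above can be made concrete with a small simulation. It does not talk to Solr at all; it only mimics the selection rule of a timestamp-driven delta import (pick records modified after the last import time). The 30-second clock skew, the record ids, and the dates are invented for illustration.

```python
from datetime import datetime, timedelta


def delta_import(records, last_index_time):
    """Select record ids the way a timestamp-driven delta import would."""
    return [r["id"] for r in records if r["modified"] > last_index_time]


# By Solr's clock, the previous import finished at 12:00:00.
last_index_time = datetime(2009, 3, 13, 12, 0, 0)

# Assumed skew: the application server's clock runs 30 seconds behind Solr's.
skew = timedelta(seconds=30)

records = [
    # Written 10 s after the import finished, but stamped by the slow clock.
    {"id": "late-but-stamped-early",
     "modified": last_index_time + timedelta(seconds=10) - skew},
    # Written 5 min after the import; the skew is too small to matter here.
    {"id": "written-much-later",
     "modified": last_index_time + timedelta(minutes=5) - skew},
]

picked_up = delta_import(records, last_index_time)
missed = [r["id"] for r in records if r["id"] not in picked_up]
```

Here `missed` contains the first record: it was written after the last import, yet its timestamp falls before `last_index_time`, so no later delta import will select it either.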
Re: input XSLT
On Fri, Mar 13, 2009 at 11:36 AM, CIF Search wrote:
> [...] Should we not have such a feature in Solr, which will allow users to push data onto the index in whichever format they wish to? This will also facilitate plugging in solr seamlessly with all kinds of applications.

You can of course push your documents to Solr using the XML/CSV update (or using the SolrJ client). It's just that you can't push documents with DIH.

http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3

--
Regards,
Shalin Shekhar Mangar.
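For reference, pushing through the XML update handler means posting an `<add><doc><field name="...">` message to the update URL. The sketch below builds such a message and the corresponding request without sending it. The base URL and the field names `id` and `title` are assumptions (fields must exist in the target schema), and `build_add_message` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

# Assumed base URL of a local Solr instance.
SOLR_URL = "http://localhost:8983/solr"


def build_add_message(docs):
    """Serialize a list of {field: value} dicts as a Solr XML <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    # Serialize to str first, then encode, so no XML declaration is emitted.
    return ET.tostring(add, encoding="unicode").encode("utf-8")


body = build_add_message([{"id": "1", "title": "hello"}])
req = Request(
    url=f"{SOLR_URL}/update",
    data=body,
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; a <commit/> message must follow
# before the document becomes searchable.
```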
Re: input XSLT
But these documents have to be converted to a particular format before being posted. An arbitrary XML document cannot be posted to Solr (with the XSLT handled by Solr internally). DIH handles any XML format, but it operates in pull mode.

On Fri, Mar 13, 2009 at 11:45 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> You can of course push your documents to Solr using the XML/CSV update (or using the solrj client). It's just that you can't push documents with DIH.
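Until Solr does this internally, the conversion step described above has to happen on the client before posting. The sketch below shows one way to do it. It is not XSLT: the Python standard library has no XSLT processor, so a simple path-to-field mapping with ElementTree stands in for the stylesheet. The source document, the mapping in `FIELD_PATHS`, and the helper name `to_solr_add` are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: Solr field name -> ElementTree path inside a record.
FIELD_PATHS = {"id": "isbn", "title": "title", "author": "authors/author"}


def to_solr_add(source_xml, record_path, field_paths):
    """Convert application XML into a Solr <add> message (as a str).

    Each element matching record_path becomes one <doc>; each match of a
    mapped path becomes one <field>, so repeated elements yield
    multi-valued fields.
    """
    source = ET.fromstring(source_xml)
    add = ET.Element("add")
    for record in source.findall(record_path):
        doc = ET.SubElement(add, "doc")
        for field_name, path in field_paths.items():
            for match in record.findall(path):
                if match.text and match.text.strip():
                    field = ET.SubElement(doc, "field", name=field_name)
                    field.text = match.text.strip()
    return ET.tostring(add, encoding="unicode")


SOURCE = """<catalog>
  <book><isbn>111</isbn><title>First</title>
    <authors><author>Ann</author><author>Bob</author></authors></book>
</catalog>"""

solr_xml = to_solr_add(SOURCE, "book", FIELD_PATHS)
```

With a real stylesheet, the same step could be done by an XSLT processor on the client (for example lxml's `etree.XSLT` in Python), and the result posted to the XML update handler.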
Re: input XSLT
Have you tried Solr Cell? http://wiki.apache.org/solr/ExtractingRequestHandler

On Mar 13, 2009, at 2:49 AM, CIF Search wrote:
> But these documents have to be converted to a particular format before being posted. Any XML document cannot be posted to Solr (with XSLT handled by Solr internally). DIH handles any xml format, but it operates in pull mode.

--
Grant Ingersoll
Re: input XSLT
Does this solve your problem?
https://issues.apache.org/jira/browse/SOLR-1065

On Wed, Mar 11, 2009 at 11:52 PM, Noble Paul നോബിള് नोब्ळ् wrote:
> it is a limitation of DIH, but if you can put your xml in a file behind an http server then you can fire a command to DIH to pull data from the url quite easily.

--
--Noble Paul