input XSLT

2009-03-09 Thread CIF Search
Just as you have an XSLT response writer to convert the Solr XML response to
make it compatible with any application, on the input side do you have an
XSLT module that will parse XML documents into Solr format before posting them
to the Solr indexer? I have gone through DataImportHandler, but it works in data
'pull' mode, i.e. Solr pulls data from the given location. I would still want
to work with applications 'posting' documents to the Solr indexer as and when
they want.

Regards,
CI


Re: input XSLT

2009-03-10 Thread Grant Ingersoll
This might be possible with the Solr Cell contrib (i.e.
ExtractingRequestHandler), since it can parse XML and extract from XML,
but that is slightly different from what you are asking for, I
think. See http://wiki.apache.org/solr/ExtractingRequestHandler
You might also want to check out Tika.


-Grant




--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: input XSLT

2009-03-11 Thread Chris Hostetter
: > Just as you have an xslt response writer to convert Solr xml response to
: > make it compatible with any application, on the input side do you have an
: > xslt module that will parse xml documents to solr format before posting them
: > to solr indexer. I have gone through dataimporthandler, but it works in data

Some proof-of-concept work was done in the past, but it never really took
off...
https://issues.apache.org/jira/browse/SOLR-285
https://issues.apache.org/jira/browse/SOLR-370

Now that we have DIH, I think another approach (that would fit better with
how things currently are) would be having a "ContentStreamDataSource" for
DIH analogous to the HttpDataSource (except without any explicit knowledge
of URLs) that respected the standard ContentStream params and could then
work with the XPathEntityProcessor.

-Hoss



Re: input XSLT

2009-03-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Mar 10, 2009 at 12:17 PM, CIF Search  wrote:
> Just as you have an xslt response writer to convert Solr xml response to
> make it compatible with any application, on the input side do you have an
> xslt module that will parse xml documents to solr format before posting them
> to solr indexer. I have gone through dataimporthandler, but it works in data
> 'pull' mode i.e. solr pulls data from the given location. I would still want
> to work with applications 'posting' documents to solr indexer as and when
> they want.
It is a limitation of DIH, but if you can put your XML in a file
behind an HTTP server then you can fire a command to DIH to pull data
from the URL quite easily.
>
> Regards,
> CI
>



-- 
--Noble Paul
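
The pull-based workaround suggested in this reply comes down to one HTTP request against the DIH handler. A minimal sketch in Python, assuming the handler is registered at /dataimport on a Solr instance at http://localhost:8983/solr, and that the DIH configuration reads the source location from a request parameter. The parameter name `xmlurl` and the application server URL are hypothetical:

```python
from urllib.parse import urlencode


def dih_command_url(solr_base, source_url, command="full-import"):
    """Build the URL that asks DIH to pull XML from source_url.

    Assumptions: the handler path is /dataimport, and the DIH config
    reads the custom 'xmlurl' request parameter (a hypothetical name).
    """
    params = urlencode({"command": command, "xmlurl": source_url})
    return f"{solr_base.rstrip('/')}/dataimport?{params}"


if __name__ == "__main__":
    print(dih_command_url("http://localhost:8983/solr",
                          "http://appserver.example.com/export/docs.xml"))
```

Fetching the resulting URL (for instance with urllib.request.urlopen) would trigger the import. In the DIH configuration such a parameter would typically be referenced as ${dataimporter.request.xmlurl}; the application still has to publish its file before firing the command, so this remains a pull.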


Re: input XSLT

2009-03-12 Thread CIF Search
There is a fundamental problem with the 'pull' approach using DIH.
Normally people want delta imports, which are done using a timestamp field.
Now it may not always be possible for application servers to sync their
timestamps (given protocol restrictions due to security reasons). Because of
this, the Solr application is likely to miss a few records occasionally. Such a
problem does not arise if applications themselves identify their records
and post them. Should we not have such a feature in Solr, which will allow users
to push data onto the index in whichever format they wish? This would also
facilitate plugging Solr in seamlessly with all kinds of applications.

Regards,
CI
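
The missed-record scenario described in this message can be reproduced with a few lines of Python. This is only an illustration of the reasoning, not DIH code: the 30-second skew, the record, and the function name are all made up.

```python
from datetime import datetime, timedelta


def delta_import(records, last_index_time):
    """Select records the way a timestamp-based delta import would:
    anything stamped later than the previous import time."""
    return [r["id"] for r in records if r["modified"] > last_index_time]


# Solr's clock: the previous import ran at 12:00:00.
last_index_time = datetime(2009, 3, 12, 12, 0, 0)

# Assumed skew: the application server's clock runs 30 seconds behind Solr's.
skew = timedelta(seconds=30)

# A record saved 10 seconds after that import, but stamped with the app clock.
true_save_time = last_index_time + timedelta(seconds=10)
records = [{"id": "rec-1", "modified": true_save_time - skew}]

if __name__ == "__main__":
    picked_up = delta_import(records, last_index_time)
    print("picked up by next delta import:", picked_up)
```

The record is stamped 11:59:40, which is earlier than the stored import time, so the next delta import skips it even though it was written afterwards. When the application pushes the record itself, no timestamp comparison is involved.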



Re: input XSLT

2009-03-12 Thread Shalin Shekhar Mangar
On Fri, Mar 13, 2009 at 11:36 AM, CIF Search  wrote:

> There is a fundamental problem with the 'pull' approach using DIH.
> Normally people want delta imports, which are done using a timestamp field.
> Now it may not always be possible for application servers to sync their
> timestamps (given protocol restrictions due to security reasons). Because of
> this, the Solr application is likely to miss a few records occasionally. Such a
> problem does not arise if applications themselves identify their records
> and post them. Should we not have such a feature in Solr, which will allow users
> to push data onto the index in whichever format they wish? This would also
> facilitate plugging Solr in seamlessly with all kinds of applications.
>

You can of course push your documents to Solr using the XML/CSV update (or
using the solrj client). It's just that you can't push documents with DIH.

http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3

-- 
Regards,
Shalin Shekhar Mangar.
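
For reference, pushing through the XML update handler means sending an <add> message whose documents are lists of named fields. A minimal sketch in Python, assuming an update handler at http://localhost:8983/solr/update and made-up field names; the request is only prepared here, not sent:

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of {field: value} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode").encode("utf-8")


def build_update_request(update_url, docs):
    """Prepare (but do not send) an HTTP POST carrying the <add> message."""
    return urllib.request.Request(
        update_url,
        data=build_add_message(docs),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_update_request(
        "http://localhost:8983/solr/update",  # assumed update URL
        [{"id": "doc-1", "title": "Hello Solr"}],  # made-up fields
    )
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it to a running Solr.
```

The documents become searchable only after a commit is sent as well. The field names have to match the schema of the target index.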


Re: input XSLT

2009-03-12 Thread CIF Search
But these documents have to be converted to a particular format before being
posted. Not just any XML document can be posted to Solr (with the XSLT handled
by Solr internally).
DIH handles any XML format, but it operates in pull mode.
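
Without an input-side stylesheet in Solr, the conversion has to happen in the client before the push. A minimal sketch in Python, using a field-to-path mapping with the standard library's xml.etree.ElementTree in place of a real XSLT stylesheet (a swapped-in technique, since the standard library ships no XSLT processor). The source format, element names, and mapping are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical application XML; not a format taken from this thread.
SOURCE = """\
<catalog>
  <book isbn="111"><name>First Book</name><writer>Author A</writer></book>
  <book isbn="222"><name>Second Book</name><writer>Author B</writer></book>
</catalog>
"""

# Solr field name -> ElementTree path, relative to each record element.
# A path of the form "@attr" reads an attribute of the record itself.
FIELD_MAP = {"id": "@isbn", "title": "name", "author": "writer"}


def to_solr_add(source_xml, record_path, field_map):
    """Convert arbitrary XML into a Solr <add> message (returned as str)."""
    add = ET.Element("add")
    for record in ET.fromstring(source_xml).iterfind(record_path):
        doc = ET.SubElement(add, "doc")
        for solr_name, path in field_map.items():
            if path.startswith("@"):
                value = record.get(path[1:])
            else:
                value = record.findtext(path)
            if value is not None:  # skip fields missing from this record
                ET.SubElement(doc, "field", name=solr_name).text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(to_solr_add(SOURCE, "book", FIELD_MAP))
```

The output can be posted to the XML update handler as-is. This keeps the push model, at the cost of every posting application carrying its own mapping.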




Re: input XSLT

2009-03-13 Thread Grant Ingersoll

Have you tried Solr Cell?  http://wiki.apache.org/solr/ExtractingRequestHandler








Re: input XSLT

2009-03-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
Does this solve your problem?

https://issues.apache.org/jira/browse/SOLR-1065






-- 
--Noble Paul
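
If the issue linked in this last message gives DIH a data source that reads the body of the incoming request (an assumption here; the thread itself does not describe the patch), a push to DIH could look like the following Python sketch. The handler path /dataimport, the host, and the sample payload are all assumptions:

```python
import urllib.request


def build_dih_push_request(solr_base, xml_payload, command="full-import"):
    """Prepare a POST that hands xml_payload to DIH as the request body."""
    url = f"{solr_base.rstrip('/')}/dataimport?command={command}"
    return urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    payload = "<root><b><id>1</id><c>Hello</c></b></root>"  # made-up XML
    req = build_dih_push_request("http://localhost:8983/solr", payload)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running Solr.
```

With such a data source, the XPath mapping would stay in the DIH configuration on the Solr side, so the posting application could send its own XML format unchanged.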