Re: multiple binary documents into a single solr document - Vignette/OpenText integration

2010-12-10 Thread briankous

Hi there,

We are trying to replace the Autonomy search in OpenText (V7.6) with Solr so
that we can index other content, too. Due to a lack of manpower and time,
management wants to buy an adapter if one is available. Do you know of any
vendor who sells such an adapter or professional services? Thank you.

Brian Ko
b...@behr.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/multiple-binary-documents-into-a-single-solr-document-Vignette-OpenText-integration-tp472172p2065107.html
Sent from the Solr - User mailing list archive at Nabble.com.



2010-03-25 Thread Chris Hostetter

:  I tried calling addFile() twice (once for each file); there was no
:  error, but nothing got indexed either.
...
: Write your own RequestHandler that uses the existing ExtractingRequestHandler
: to actually parse the streams, and then you combine the results arbitrarily in
: your handler, eventually sending an AddUpdateCommand to the update processor.
: You can obtain both the update processor and SolrCell instance from
: req.getCore().

The key bit being: yes, you can attach multiple files to your request,
and yes, the SolrQueryRequest abstraction can handle that (they appear as
two ContentStreams to the RequestHandler), but the existing
ExtractingRequestHandler assumes there will only be one ContentStream and
constructs one document for it -- the API isn't really designed around
the idea of generating a single SolrInputDocument from multiple
ContentStreams (where would you get the title from? etc.)
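One way to make the "where would you get the title from?" question
concrete is to write the merge policy down explicitly. The following is a
minimal plain-Java sketch, not Solr code: it assumes each ContentStream
has already been parsed into a simple field-to-value map with hypothetical
"title" and "content" keys. The class name StreamMerger and the policy
itself (first non-empty title wins; each stream's text is kept as
content1, content2, ... and also concatenated into one "content" field)
are illustrative choices only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical merge policy (not part of Solr): collapses the fields
 * extracted from several content streams into ONE logical document.
 */
class StreamMerger {

    /**
     * @param id        unique key for the combined document (like literal.id)
     * @param perStream one field map per parsed stream, in request order;
     *                  assumed keys are "title" and "content"
     */
    static Map<String, String> merge(String id, List<Map<String, String>> perStream) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", id);

        StringBuilder merged = new StringBuilder();
        for (int i = 0; i < perStream.size(); i++) {
            Map<String, String> fields = perStream.get(i);

            // Policy choice: the first stream with a non-empty title supplies it.
            String title = fields.get("title");
            if (!doc.containsKey("title") && title != null && !title.isEmpty()) {
                doc.put("title", title);
            }

            String body = fields.get("content");
            if (body == null) {
                body = "";
            }
            // Keep each stream's text in its own field (content1, content2, ...)
            doc.put("content" + (i + 1), body);
            // ...and also append it to a single combined "content" field.
            if (merged.length() > 0) {
                merged.append('\n');
            }
            merged.append(body);
        }
        doc.put("content", merged.toString());
        return doc;
    }
}
```

Any other policy (last title wins, titles joined, per-file metadata
fields) would be just as valid; the point is that a multi-stream handler
has to pick one, and ExtractingRequestHandler does not make that choice.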

There was talk about trying to generalize this, but I don't think anyone
else has looked into it much. Here's one reference, but I definitely
remember a more recent thread about this idea...

http://n3.nabble.com/ExtractingRequestHandler-and-XmlUpdateHandler-tt492202.html#a492211



-Hoss




2010-03-24 Thread Andrzej Bialecki

On 2010-03-24 15:58, Fábio Aragão da Silva wrote:

hello there,
I'm working on a piece of code that integrates Solr with
Vignette/OpenText Content Management: Vignette content instances will be
indexed in Solr when published and deleted from Solr when unpublished.
I'm using Solr 1.4, SolrJ and Solr Cell.

I've implemented most of the code and have run into only one issue so
far: Vignette Content Management supports attaching multiple binary
documents (such as .doc, .pdf or .xls files) to a single content
instance. I am mapping each Vignette content instance to a Solr
document, but now I have a content instance in Vignette with multiple
binary files attached to it.

So my question is: is it possible to index more than one binary file
into a single Solr document?

I'm a beginner with Solr, but from what I understand I have two options
for indexing content with SolrJ: either use UpdateRequest and its add()
method to add a SolrInputDocument to the request (when the document
doesn't represent a binary file), or use ContentStreamUpdateRequest and
its addFile() method to add a binary file to the content stream request.

I don't see a way, though, to say "this document is composed of two
files, a Word file and a PDF, so index them as one Solr document using
content1 and content2 fields (or merge their content into a single
'content' field)".

I tried calling addFile() twice (once for each file); there was no
error, but nothing got indexed either.

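import java.io.File;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

// Attempt: attach two files to one /update/extract request.
// "server" is an existing SolrServer instance (e.g. CommonsHttpSolrServer).
ContentStreamUpdateRequest req =
    new ContentStreamUpdateRequest("/update/extract");
req.addFile(new File("file1.doc"));
req.addFile(new File("file2.pdf"));
req.setParam("literal.id", "multiple_files_test");
req.setParam("uprefix", "attr_");
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(req);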

Any thoughts on this would be greatly appreciated.


Write your own RequestHandler that uses the existing 
ExtractingRequestHandler to actually parse the streams, and then you 
combine the results arbitrarily in your handler, eventually sending an 
AddUpdateCommand to the update processor. You can obtain both the update 
processor and SolrCell instance from req.getCore().
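The control flow described above can be sketched in plain Java. Solr's
real types (ContentStream, ExtractingRequestHandler, AddUpdateCommand,
the update processor) are replaced here by two hypothetical interfaces,
Extractor and DocumentSink, so that the sketch compiles on its own. What
it illustrates is the ordering: every stream in the request is parsed
first, the results are combined, and exactly one add is issued at the
end, instead of assuming a single stream per request.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for "parse one stream with Solr Cell". */
interface Extractor {
    String extractText(String streamName);
}

/** Hypothetical stand-in for "send an AddUpdateCommand to the update processor". */
interface DocumentSink {
    void add(Map<String, String> doc);
}

/** Sketch of a custom handler that turns N streams into ONE document. */
class MultiStreamHandler {
    private final Extractor extractor;
    private final DocumentSink sink;

    MultiStreamHandler(Extractor extractor, DocumentSink sink) {
        this.extractor = extractor;
        this.sink = sink;
    }

    void handle(String id, List<String> streamNames) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", id);
        for (int i = 0; i < streamNames.size(); i++) {
            // 1. Parse each stream attached to the request.
            String text = extractor.extractText(streamNames.get(i));
            // 2. Combine: here, one field per stream (content1, content2, ...).
            doc.put("content" + (i + 1), text);
        }
        // 3. A single add for the whole request.
        sink.add(doc);
    }
}
```

In a real handler the Extractor role would be played by Solr Cell and the
DocumentSink role by the core's update processor, both reachable from
req.getCore(); the exact wiring depends on the Solr version.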



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com