Re: using Tika (ExtractingRequestHandler)

2012-06-05 Thread Jack Krupansky

Hoss,

In your edit, I noticed that the wiki makes "SolrPlugin" a link, but to a 
nonexistent page, although the page "SolrPlugins" does exist.


See: "it is provided as a SolrPlugin,"
http://wiki.apache.org/solr/ExtractingRequestHandler

I also noticed a few other things:

1. Reference to the "/site" directory that does not exist. So, the statement 
"Note, the /site directory in the solr download contains some nice example 
docs to try" is not terribly useful.

2. The path to tutorial.html should be "../../docs/api/doc-files"

3. There is no tutorial.pdf file as referenced in the curl examples.

-- Jack Krupansky

-----Original Message-----
From: Chris Hostetter

Sent: Tuesday, June 05, 2012 6:47 PM
To: solr-user@lucene.apache.org
Subject: Re: using Tika (ExtractingRequestHandler)


I've updated the wiki to try and fill in some of these holes...

http://wiki.apache.org/solr/ExtractingRequestHandler

: i'm looking at using Tika to index a bunch of documents. the wiki page
: seems to be a little bit out of date ("// TODO: this is out of date as of
: Solr 1.4 - dist/apache-solr-cell-1.4.jar and all of contrib/extraction/lib
: are needed") and it also looks a little incomplete.
:
: is there an actual list of all the required jar files? i'm not sure they
: are in the same place in the 3.6.0 distribution as they were in 1.4, and
: having an actual list would be very helpful in figuring out where they are.
:
: as for "Sending Documents to Solr", is there any plan to address this
: todo: "// TODO: describe the different ways to send the documents to solr
: (POST body, form encoded, remoteStreaming)". this is really just a nice to
: have, i can see how to accomplish my goals using a method that is currently
: documented.
:
: thanks,
:richard
:

-Hoss 



Re: using Tika (ExtractingRequestHandler)

2012-06-05 Thread Chris Hostetter

I've updated the wiki to try and fill in some of these holes...

http://wiki.apache.org/solr/ExtractingRequestHandler

: i'm looking at using Tika to index a bunch of documents. the wiki page seems 
to be a little bit out of date ("// TODO: this is out of date as of Solr 1.4 - 
dist/apache-solr-cell-1.4.jar and all of contrib/extraction/lib are needed") 
and it also looks a little incomplete.
: 
: is there an actual list of all the required jar files? i'm not sure they are 
in the same place in the 3.6.0 distribution as they were in 1.4, and having an 
actual list would be very helpful in figuring out where they are.
: 
: as for "Sending Documents to Solr", is there any plan to address this todo: 
"// TODO: describe the different ways to send the documents to solr (POST body, 
form encoded, remoteStreaming)". this is really just a nice to have, i can see 
how to accomplish my goals using a method that is currently documented.
: 
: thanks,
:richard
: 

-Hoss


Re: using Tika (ExtractingRequestHandler)

2012-05-17 Thread Ahmet Arslan
> i'm looking at using Tika to index a
> bunch of documents. the wiki page seems to be a little bit
> out of date ("// TODO: this is out of date as of Solr 1.4 -
> dist/apache-solr-cell-1.4.jar and all of
> contrib/extraction/lib are needed") and it also looks a
> little incomplete.
> 
> is there an actual list of all the required jar files? i'm
> not sure they are in the same place in the 3.6.0
> distribution as they were in 1.4, and having an actual list
> would be very helpful in figuring out where they are.

Here is a list of 
  

If you want to use DIH :
 




using Tika (ExtractingRequestHandler)

2012-05-17 Thread Welty, Richard
i'm looking at using Tika to index a bunch of documents. the wiki page seems to 
be a little bit out of date ("// TODO: this is out of date as of Solr 1.4 - 
dist/apache-solr-cell-1.4.jar and all of contrib/extraction/lib are needed") 
and it also looks a little incomplete.

is there an actual list of all the required jar files? i'm not sure they are in 
the same place in the 3.6.0 distribution as they were in 1.4, and having an 
actual list would be very helpful in figuring out where they are.
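
For anyone following along, here is a minimal Python sketch that lists the jars in the two locations the wiki TODO names (the solr-cell jar under dist/ and everything under contrib/extraction/lib). The function name and the glob pattern "apache-solr-cell-*.jar" are illustrative assumptions based only on the 1.4 file name quoted above; whether 3.6.0 uses the same layout is exactly the open question, so treat an empty result as a hint to look elsewhere in the distribution.

```python
from pathlib import Path


def find_extraction_jars(solr_dir):
    """List jars under the two locations named in the wiki TODO.

    Assumes the layout quoted in the thread (dist/ and
    contrib/extraction/lib); other Solr versions may differ.
    """
    root = Path(solr_dir)
    # Assumed naming pattern, taken from "apache-solr-cell-1.4.jar".
    jars = list((root / "dist").glob("apache-solr-cell-*.jar"))
    # The TODO says all of contrib/extraction/lib is needed.
    jars += list((root / "contrib" / "extraction" / "lib").glob("*.jar"))
    # Return paths relative to the distribution root, with forward slashes.
    return sorted(p.relative_to(root).as_posix() for p in jars)
```

Calling find_extraction_jars with the path of an unpacked distribution returns the relative jar paths it found, which can then be compared against whatever list ends up on the wiki.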

as for "Sending Documents to Solr", is there any plan to address this todo: "// 
TODO: describe the different ways to send the documents to solr (POST body, 
form encoded, remoteStreaming)". this is really just a nice to have, i can see 
how to accomplish my goals using a method that is currently documented.

thanks,
   richard
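
As a rough illustration of the "Sending Documents to Solr" question above, the sketch below builds the request URL for two of the approaches mentioned: posting the file yourself (POST body or form encoded) and remote streaming. It assumes the handler is mapped at /update/extract and that Solr runs at http://localhost:8983/solr, as in the stock example setup; both are assumptions, so adjust them to your solrconfig.xml. The stream.file parameter only works if remote streaming is enabled on the server. The function name is made up for this example, and it only builds the URL; it does not send anything.

```python
from urllib.parse import urlencode


def extract_url(base_url, doc_id, stream_file=None, commit=True):
    """Build an ExtractingRequestHandler URL.

    base_url    -- assumed Solr root, e.g. http://localhost:8983/solr
    doc_id      -- value passed as literal.id (the document's unique key)
    stream_file -- server-side path for remote streaming; leave as None
                   when the file is sent in the POST body or as a form part
    """
    params = [("literal.id", doc_id)]
    if stream_file is not None:
        # Remote streaming: Solr reads the file from its own filesystem.
        params.append(("stream.file", stream_file))
    if commit:
        params.append(("commit", "true"))
    # "/update/extract" is an assumed handler path; check solrconfig.xml.
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)
```

With stream_file left unset, the resulting URL is the target for a POST that carries the document itself, for example as a multipart form upload from curl. With stream_file set, no request body is needed because the server opens the file directly.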