Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread Yonik Seeley
On Mon, Mar 19, 2012 at 5:48 PM, vybe3142  wrote:
> Thanks for the response
>
> No, the file is plain text.
>
> All I'm trying to do is index plain ASCII text files via a remote reference
> to their file paths.

The XML update handler expects a specific XML format.
The JSON, CSV, and javabin update handlers likewise each expect a specific
document format.

If you have Word, PDF, HTML, or plain text files, one way to index them is
http://wiki.apache.org/solr/ExtractingRequestHandler (aka Solr Cell)
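
For illustration only, here is a minimal sketch of what a remote-streaming
request to the extracting handler could look like. It only builds the request
URL, using the stream.file and literal.id parameters that appear elsewhere in
this thread. It assumes the /update/extract handler is registered for the core
and that remote streaming is enabled on the server (as far as I know, that is
enableRemoteStreaming="true" on the requestParsers element in solrconfig.xml).
The class and method names are hypothetical, not part of SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a SolrJ class): builds a URL that asks Solr to
// read the file itself instead of receiving it in the HTTP body.
class RemoteStreamingUrl {

    static String buildExtractUrl(String coreUrl, String filePath, String docId) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("stream.file", filePath); // path as seen by the Solr server, not the client
        params.put("literal.id", docId);     // unique key for the extracted document
        params.put("commit", "true");

        StringBuilder url = new StringBuilder(coreUrl).append("/update/extract");
        char separator = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                url.append(separator)
                   .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                   .append('=')
                   .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(ex);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildExtractUrl(
                "http://localhost:8080/solr/testcore1",
                "C:\\work\\SolrClient\\data\\justin2.txt",
                "testid1"));
    }
}
```

The point of the sketch is that only the path travels over HTTP. The file
itself has to be readable from the machine running Solr.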

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread vybe3142
BTW, using the client I pasted, I get the same error even with the
standard executable SOLR jar supplied with the distribution.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-for-SOLR-SOLRJ-to-index-files-directly-bypassing-HTTP-streaming-tp3833419p3840483.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread vybe3142
Thanks for the response

No, the file is plain text. 

All I'm trying to do is index plain ASCII text files via a remote reference
to their file paths. 

I guess what I need to do is specify the content type as text. I don't think
a "content-type" param will help, since this behavior is tied to the
BinaryRequestWriter. There's got to be some built-in functionality in
SOLR that will let me achieve this.




Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread Yonik Seeley
On Mon, Mar 19, 2012 at 4:38 PM, vybe3142  wrote:
> Okay, I added the javabin handler snippet to the solrconfig.xml file
> (actually shared across all cores).  I got further (the request made it past
> tomcat and into SOLR) but  haven't quite succeeded yet.
>
> Server trace:
> Mar 19, 2012 3:31:35 PM org.apache.solr.core.SolrCore execute
> INFO: [testcore1] webapp=/solr path=/update/javabin params={waitSearcher=true&commit=true&literal.id=testid1&waitFlush=true&wt=javabin&stream.file=C:\work\SolrClient\data\justin2.txt&version=2} status=500 QTime=82

Is this justin2.txt file in the "javabin" format?  That's what you're
telling Solr by hitting the /update/javabin URL.

-Yonik


Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread vybe3142
Okay, I added the javabin handler snippet to the solrconfig.xml file
(actually shared across all cores). I got further (the request made it past
Tomcat and into SOLR) but haven't quite succeeded yet.

Server trace:
Mar 19, 2012 3:31:35 PM org.apache.solr.core.SolrCore execute
INFO: [testcore1] webapp=/solr path=/update/javabin params={waitSearcher=true&commit=true&literal.id=testid1&waitFlush=true&wt=javabin&stream.file=C:\work\SolrClient\data\justin2.txt&version=2} status=500 QTime=82
Mar 19, 2012 3:31:35 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: Invalid version (expected 2, but -17) or the data in not in 'javabin' format
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:144)
        at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:69)
        at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:45)
        at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:56)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

SOLRJ client log:

Starting SOLR doc indexing client 2
Exception in thread "main" org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://localhost:8080/solr/testcore1/update/javabin
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:432)
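
A note on the "expected 2, but -17" part of the error: the javabin codec treats
the first byte of the stream as a version marker, and Java bytes are signed, so
a first byte of 0xEF prints as -17. That would be consistent with a text file
that begins with a UTF-8 byte order mark (EF BB BF), although this is only a
guess about justin2.txt. The sketch below mimics that check on an in-memory
byte array. It is not Solr's actual code, and the class and method names are
made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the version check a javabin reader performs.
class JavabinVersionCheck {
    static final byte EXPECTED_VERSION = 2; // the "expected 2" in the error message

    static String check(byte[] body) {
        if (body.length == 0) {
            return "empty body";
        }
        byte version = body[0]; // signed byte: 0xEF is read as -17
        if (version != EXPECTED_VERSION) {
            return "Invalid version (expected " + EXPECTED_VERSION + ", but " + version + ")";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Plain text saved with a UTF-8 byte order mark starts with EF BB BF.
        byte[] textWithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(check(textWithBom)); // Invalid version (expected 2, but -17)

        // Plain ASCII text fails the same check, just with a different byte value.
        byte[] plainAscii = "hi".getBytes(StandardCharsets.US_ASCII);
        System.out.println(check(plainAscii));
    }
}
```

Either way, a plain text file cannot pass this check, so stream.file pointing
at a text file will not work against /update/javabin.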




Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread Erick Erickson
My guess is that this isn't defined in the solrconfig.xml file
for your testcore1/conf:

  <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />


If you modeled your testcore1 after the solrconfig.xml files in the
example/multicore/core* directories, these are extremely simplified.
You might try copying the one from example/solr/conf and removing
stuff you don't need.
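
One quick way to confirm this kind of "Not Found" is to list the request
handlers a given solrconfig.xml actually registers. The sketch below does that
with the JDK's XML parser on an in-memory string. The XML fragment is a made-up,
stripped-down example rather than the real multicore config, and the class and
method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the name attribute of every requestHandler element.
class HandlerCheck {

    static List<String> handlerNames(String solrconfigXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            solrconfigXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("requestHandler");
            List<String> names = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse solrconfig XML", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative fragment only: a simplified config with a single update handler.
        String simplified = "<config>"
                + "<requestHandler name=\"/update\" class=\"solr.XmlUpdateRequestHandler\"/>"
                + "</config>";
        List<String> names = handlerNames(simplified);
        System.out.println("handlers: " + names);
        System.out.println("has /update/javabin: " + names.contains("/update/javabin"));
    }
}
```

If the path a client posts to is not in that list, the request never reaches a
handler, which is one plausible reason for the "Not Found" response.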


Best
Erick



Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-19 Thread vybe3142
Still no luck. Please help point out what I'm doing wrong. Neither the
(commented-out) first approach (including the content with the request) nor
the second approach seems to work. Nothing seems to be acknowledged at the
Tomcat server either. I get the error:


Starting SOLR doc indexing client 2
Exception in thread "main" org.apache.solr.common.SolrException: Not Found

Not Found

request: http://localhost:8080/solr/testcore1/update/javabin
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:432)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:246)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at com.il.solrclient.SolrJClientIndexDocApp2.main(SolrJClientIndexDocApp2.java:41)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)




import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class SolrJClientIndexDocApp2 {
    public static void main(String[] arg) throws Exception {
        System.out.println("Starting SOLR doc indexing client 2");
        String url = "http://localhost:8080/solr/testcore1";
        CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);

        // First approach (commented out): send the file content with the request.
        // ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        // req.addFile(new File("C:\\work\\SolrClient\\data\\justin2.txt"));
        // //req.setParam(ExtractingParams.EXTRACT_ONLY, "true");
        // req.setParam("literal.id", "testid");
        // NamedList result = server.request(req);
        // server.commit();
        // System.out.println("Result: " + result);

        // Second approach: pass only the file path via stream.file.
        // With BinaryRequestWriter the request goes to /update/javabin.
        server.setRequestWriter(new BinaryRequestWriter());
        UpdateRequest request = new UpdateRequest();
        request.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        request.setParam("literal.id", "testid1");
        request.setParam("stream.file", "C:\\work\\SolrClient\\data\\justin2.txt");
        request.process(server);
    }
}




Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-18 Thread vybe3142
I'm going to try the approach described here and see what happens:

http://lucene.472066.n3.nabble.com/Fastest-way-to-use-solrj-td502659.html



Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-18 Thread vybe3142
Thanks much. I plan to try this tomorrow.

Can someone describe how to use remote streaming programmatically with
SolrJ? For example, see the basic clients described here:
http://androidyou.blogspot.com/2010/05/client-integration-with-solr-by-using.html
Note that in those clients the data is transferred in the HTTP message, which
is what I want to avoid.




Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-17 Thread Mikhail Khludnev
Sure it does

http://my.safaribooksonline.com/book/web-development/9781847195883/indexing-data/ch03lvl1sec03#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODE4NDcxOTU4ODMvNjg=

On Sat, Mar 17, 2012 at 2:55 AM, vybe3142  wrote:

> Hi,
> Is there a way for SOLR / SOLRJ to index files directly, bypassing HTTP
> streaming?
>
> Use case:
> * Text files to be indexed are on file server (A) (some potentially large,
> several hundred MB)
> * SOLRJ client is on server (B)
> * SOLR server is on server (C) running with dynamically created SOLR cores
>
> Looking at how ContentStreamUpdateRequest is typically used in SOLRJ, it
> looks like the files would be read from A to the client on B (across the
> wire) and then sent across the wire via an HTTP request (in the body) to C
> to be indexed.
>
> Is there a more efficient way to accomplish this, i.e. pass a path to the
> file when making the request from B so that the SOLR server on C can read
> directly from file server A?
>
> Thanks
>
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics