How to index pdf's content with SolrJ?

2012-04-20 Thread vasuj

I'm trying to index a few PDF documents using SolrJ, as described at
http://wiki.apache.org/solr/ContentStreamUpdateRequestExample. Here is the
code:

import static org.apache.solr.handler.extraction.ExtractingParams.LITERALS_PREFIX;
import static org.apache.solr.handler.extraction.ExtractingParams.MAP_PREFIX;
import static org.apache.solr.handler.extraction.ExtractingParams.UNKNOWN_FIELD_PREFIX;

import java.io.File;
import java.io.IOException;
import java.util.Map.Entry;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;
...
public static void indexFilesSolrCell(String fileName) throws IOException, SolrServerException {

  String urlString = "http://localhost:8080/solr";
  SolrServer server = new CommonsHttpSolrServer(urlString);

  // Send the file to the extracting request handler (Solr Cell).
  ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
  up.addFile(new File(fileName));
  // Use the file name (the text after the last '/') as the document id.
  String id = fileName.substring(fileName.lastIndexOf('/') + 1);
  System.out.println(id);

  up.setParam(LITERALS_PREFIX + "id", id);
  // This field doesn't exist in schema.xml; it will be created as attr_location.
  up.setParam(LITERALS_PREFIX + "location", fileName);
  up.setParam(UNKNOWN_FIELD_PREFIX, "attr_");
  up.setParam(MAP_PREFIX + "content", "attr_content");
  up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

  // Print every key/value pair of the server's response.
  NamedList<Object> response = server.request(up);
  for (Entry<String, Object> entry : response) {
    System.out.println(entry.getKey());
    System.out.println(entry.getValue());
  }
}
Unfortunately, when I query for *:* I get the list of indexed documents, but
the content field is empty. How can I change the code above so that it also
extracts the document's content?

Below is the XML fragment that describes this document:

/home/alex/Documents/lsp.pdf

stream_size
31203
Content-Type
application/pdf

31203

application/pdf

lsp.pdf

I don't think this problem is caused by an incorrect installation of Apache
Tika: earlier I got a few ServerException errors, but since then I have
installed the required jars in the correct path. Moreover, I tried to index
a txt file using the same class, and the attr_content field is still empty.

I also tried setting stored="true" on the content field in schema.xml.
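
A side note on the id computed in the code above: fileName.lastIndexOf('/')
assumes forward-slash paths, so on Windows the whole path would probably end
up as the id. The following is a minimal sketch of a separator-agnostic
variant. The class name FileIds and the Windows path in main are hypothetical
and used only for illustration; nothing here is part of SolrJ.

```java
import java.util.Objects;

/** Illustrative helper (not part of SolrJ): derives a document id from a file path. */
final class FileIds {
    private FileIds() {}

    /** Returns the text after the last '/' or '\', or the whole string if neither occurs. */
    static String idFor(String path) {
        Objects.requireNonNull(path, "path");
        // lastIndexOf returns -1 when the separator is absent, so cut + 1 is 0 in that case.
        int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(cut + 1);
    }

    public static void main(String[] args) {
        System.out.println(idFor("/home/alex/Documents/lsp.pdf")); // path from the post
        System.out.println(idFor("C:\\docs\\report.pdf"));         // hypothetical Windows path
    }
}
```

The returned value could then be passed as the literal id, for example
up.setParam(LITERALS_PREFIX + "id", FileIds.idFor(fileName)).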



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-pdf-s-content-with-SolrJ-tp3927284p3927284.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index pdf's content with SolrJ?

2012-04-21 Thread vasuj
I am still not able to index my documents in Solr.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-fails-on-server-request-up-tp3927284p3927749.html


Solr Indexing error in this function

2012-04-22 Thread vasuj
I get a Solr indexing error in this function. I am using Windows 8 x32 and
XAMPP to configure Solr and Tomcat. I have tried many other forums, but they
were not helpful. I also tried configuring many of the XML files in
Xampp/solr and still could not get it working. Any hints would be helpful.
Here is my Solr indexing function and its imports:

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.SolrQuery;

public void GeoTagIndexToSolr(File f, String Latitude, String Longitude) {
  try {
    String urlString = "http://localhost:8080/solr";
    SolrServer server = new CommonsHttpSolrServer(urlString);
    // Send the file to the extracting request handler (Solr Cell).
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    String fileName = f.toString();
    up.addFile(new File(fileName));
    up.setParam("Latitude", Latitude);
    up.setParam("Longitude", Longitude);
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    server.request(up);
  } catch (SolrServerException e) {
    System.out.println("SolrServerException: " + e + " ");
    e.printStackTrace();
  } catch (MalformedURLException e) {
    System.out.println("MalformedURLException: " + e + " ");
    e.printStackTrace();
  } catch (IOException e) {
    System.out.println("IOException: " + e + " ");
    e.printStackTrace();
  }
}

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Indexing-error-in-this-function-tp3929446p3929446.html


Re: Solr Indexing error in this function

2012-04-22 Thread vasuj
Log is:


Apr 22, 2012 2:55:17 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[(null)]} 0 17
Apr 22, 2012 2:55:17 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: [doc=null] missing required field: id
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:355)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:146)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:236)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:185)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:151)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:269)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

Apr 22, 2012 2:55:17 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={waitSearcher=true&commit=true&Latitude=51.9125&Longitude=179.5&wt=javabin&waitFlush=true&version=2} status=400 QTime=17
Apr 22, 2012 2:55:17 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[(null)]} 0 24
Apr 22, 2012 2:55:17 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: [doc=null] missing required field: id
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:355)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:146)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:236)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:185)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostVal
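
The "missing required field: id" error in the log above suggests that the
request reaches /update/extract without an id: the logged params contain
Latitude and Longitude but no literal.id. With SolrJ, one likely remedy is a
call such as up.setParam("literal.id", someId) before server.request(up).
The plain-Java sketch below only illustrates which parameters such a request
would carry. The class name ExtractParams and the id value are hypothetical,
and sending Latitude and Longitude as literal.* fields assumes that fields
with those names exist in schema.xml.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: assembles the parameters an /update/extract request would carry. */
final class ExtractParams {
    private ExtractParams() {}

    /** Builds the parameter map; rejects a missing id up front instead of getting a 400 from Solr. */
    static Map<String, String> forFile(String id, String latitude, String longitude) {
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException("id is required by the schema");
        }
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("literal.id", id);               // value for the required id field
        params.put("literal.Latitude", latitude);   // assumes a Latitude field in schema.xml
        params.put("literal.Longitude", longitude); // assumes a Longitude field in schema.xml
        params.put("commit", "true");
        return params;
    }

    /** Joins the parameters into a URL-encoded query string, keeping insertion order. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "photo-001.jpg" is a made-up id; the coordinates are the ones from the log.
        System.out.println(toQueryString(forFile("photo-001.jpg", "51.9125", "179.5")));
    }
}
```

Any value that is unique per document can serve as the id; the file name is
one common choice.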

Re: Solr Indexing error in this function

2012-04-22 Thread vasuj
Yes, it worked. Thanks, Gora.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Indexing-error-in-this-function-tp3929446p3929673.html


'Error 404: missing core name in path' in Solr

2012-04-22 Thread vasuj
Screenshot: http://lucene.472066.n3.nabble.com/file/n3931194/Screenshot_%2847%29.png

I used the

//server.deleteByQuery( "*:*" );// CAUTION: deletes everything!

query in my Solr indexing program. Since then, I receive an error whenever I
go to

http://localhost:8080/solr/admin/

and press search with query string :

The error is

HTTP Status 400 - Missing solr core name in path

type Status report

message Missing solr core name in path

description The request sent by the client was syntactically incorrect
(Missing solr core name in path).

Apache Tomcat/7.0.21

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-404-missing-core-name-in-path-in-Solr-tp3931194p3931194.html