Unsupported ContentType: application/pdf Not in: [application/xml, text/csv, text/json, application/csv, application/javabin, text/xml, application/json]

2014-08-20 Thread Croci Francesco Luigi (ID SWS)
Hello,

I have Solr 4.9.0 and I get the above error when I try to index a PDF 
document through the Solr web interface.

Here are my schema and solrconfig. Am I missing something?

fullText


id

LUCENE_45

   deduplication
   



   
   true
   false
   false
   true
   ignored_
   link
   fullText
   
   deduplication
   



   
   false
   signatureField
   true
   content
   10
   .2
   solr.update.processor.TextProfileSignature

   explicit
   10


none


   *:*
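The error lists the content types accepted by the handler that received the upload, and application/pdf is not among them. Rich documents such as PDFs normally go to the Tika-based extracting handler, which the later messages in this archive call at /update/extract. The sketch below only illustrates that routing decision; `UpdateHandlerRouter` and `handlerFor` are hypothetical names, and treating every type outside the listed set as "needs extraction" is an assumption.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not a Solr API): picks the handler path for an upload
// based on its content type. The accepted set is copied from the error above.
final class UpdateHandlerRouter {
    static final Set<String> PLAIN_UPDATE_TYPES = Set.of(
            "application/xml", "text/csv", "text/json", "application/csv",
            "application/javabin", "text/xml", "application/json");

    static String handlerFor(String contentType) {
        // Drop parameters such as "; charset=UTF-8" and compare case-insensitively.
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        // Assumption: anything else needs Tika extraction via /update/extract.
        return PLAIN_UPDATE_TYPES.contains(base) ? "/update" : "/update/extract";
    }

    public static void main(String[] args) {
        System.out.println(handlerFor("application/pdf")); // prints /update/extract
    }
}
```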

Solr, weblogic managed server and log4j logging

2014-08-19 Thread Croci Francesco Luigi (ID SWS)
Maybe some of you use Solr with WebLogic and can help me...

I have WebLogic 12.1.3 and would like to deploy and run Solr on a managed server.

I started the node manager, created a server named "server-solr" and deployed 
Solr (4.7.9).
In the "server start" tab of the server configuration I added 
C:\lib\wllog4j.jar;C:\lib\log4j-1.2.16.jar to the Class Path and 
-Dlog4j.configuration=C:\download\log4j.properties 
-Dweblogic.log.Log4jLoggingEnabled=true to the Arguments.

When I try to start the server I get the following error:




RE: Show the score in the search result

2014-04-17 Thread Croci Francesco Luigi (ID SWS)
I think you mean this row:

* ,fullText: ...

OK, but my understanding is that "*" means ALL fields are displayed 
anyway. Isn't that so?

Francesco

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@gmail.com] 
Sent: Thursday, 17 April 2014 10:04
To: solr-user@lucene.apache.org
Subject: Re: Show the score in the search result

That's exactly what Jack mentioned, you're defining an invariant for fl, which 
ignores everything you provide at runtime.  

From http://wiki.apache.org/solr/SearchHandler#Configuration

"invariants - provides param values that will be used in spite of any values 
provided at request time. They are a way of letting the Solr maintainer lock 
down the options available to Solr clients."

-Stefan  
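To make the invariants behaviour concrete, here is a simplified model of how a handler might merge its configured defaults, the request-time parameters and its invariants for single-valued parameters. It is a sketch, not Solr's implementation (it ignores "appends" and multi-valued parameters), and `ParamLayering` is a hypothetical name. With an fl invariant like the one echoed later in this thread, an fl=*,score sent at request time has no effect.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model (not Solr's code) of single-valued parameter layering:
// defaults < request-time values < invariants. "appends" is not modelled.
final class ParamLayering {
    static Map<String, String> effective(Map<String, String> defaults,
                                         Map<String, String> request,
                                         Map<String, String> invariants) {
        Map<String, String> result = new HashMap<>(defaults); // defaults fill gaps
        result.putAll(request);    // request-time values override defaults
        result.putAll(invariants); // invariants override everything
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> merged = effective(
                Map.of(),
                Map.of("q", "*:*", "fl", "*,score"),
                Map.of("fl", "*,fullText:fullText"));
        System.out.println(merged.get("fl")); // prints *,fullText:fullText
    }
}
```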





RE: Show the score in the search result

2014-04-17 Thread Croci Francesco Luigi (ID SWS)
Hello Chris,

When I try to execute
http://localhost:7001/solr/collection1/select?q=*%3A*&rows=1&fl=score&wt=json&indent=true&echoParams=true

I get:

{
  "error": {
    "msg": "Invalid value 'true' for echoParams parameter, use 'EXPLICIT' or 'ALL'",
    "code": 400
  }
}

With echoParams=ALL:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "defType": "edismax",
      "echoParams": "ALL",
      "fl": "*,fullText:fullText",
      "indent": "true",
      "q": "*:*",
      "_": "1397719590902",
      "wt": "json",
      "rows": "1",
      "uf": "* -fullText_*",
      "f.all.qf": "rmDocumentTitle rmDocumentArt rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn fullText",
      "fq": "* -language:en -language:de"
    }
  },
  "response": {
    "numFound": 842,
    "start": 0,
    "docs": [
      {
        "rmDocumentTitle": [
          "Ersterfassung"
        ],
        "rmDocumentClass": [
          "Einführung Records Management"
        ],
        "rmDocumentSubclass": [
          "Einführung Records Management"
        ],
        "id": "aabziwlc4hkvgojtzyb4wbebqr4m3",
        "rmDocumentArt": [
          "Ersterfassung"
        ],
        "fullText": [
          " \n \n  \n  \n  \n  \n  \n \n  "
        ],
        "signatureField": "d41d8cd98f00b204e9800998ecf8427e"
      }
    ]
  }
}

I adapted the sample from "Instant Apache Solr for Indexing Data How-to", chapter 
"Indexing multiple languages (advanced)".

Here is the schema:

fullText


id




Here is the solrconfig:



LUCENE_45

deduplication

true
false
false
true
true
ignored_
link
fullText

deduplication

false
signatureField
true
content
10
.2
solr.update.processor.TextProfileSignature

fullText
en,de
en
language
true
false

edismax


* -language:en -language:de
 

RE: Show the score in the search result

2014-04-16 Thread Croci Francesco Luigi (ID SWS)
Hello Jack,

I know it's not the best example, but I just wanted to see the score field 
"printed out"... :)

Francesco

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Wednesday, 16 April 2014 14:32
To: solr-user@lucene.apache.org
Subject: Re: Show the score in the search result

Also, "*:*" is a constant score query, so the score will always be 1.0. Not a 
terribly good example to request the score.

Please provide the Solr query response, with the debug=true parameter so we can 
see for ourselves that no score is returned.

-- Jack Krupansky

-Original Message-
From: Erick Erickson
Sent: Wednesday, April 16, 2014 8:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Show the score in the search result

What version of Solr? Works fine for me.

Best,
Erick

On Wed, Apr 16, 2014 at 6:38 AM, Croci  Francesco Luigi (ID SWS) 
 wrote:
> I read that if I add the string "score" in the fl field, I should be 
> able to see the score within the returned documents.
>
> As I understand "score" is a "special/reserved" word and I don't have 
> to define in the schema (right)?
>
> I did so, but in the returned fields' list I see no score field...
>
> Here is the request's URL: 
> http://localhost:7001/solr/collection1/select?q=*%3A*&fl=*%2Cscore&wt=json&indent=true
>
> Do I miss something?
>
> Francesco



RE: Show the score in the search result

2014-04-16 Thread Croci Francesco Luigi (ID SWS)
: 0,
  "prepare": {
    "time": 0,
    "query": {
      "time": 0
    },
    "facet": {
      "time": 0
    },
    "mlt": {
      "time": 0
    },
    "highlight": {
      "time": 0
    },
    "stats": {
      "time": 0
    },
    "debug": {
      "time": 0
    }
  },
  "process": {
    "time": 0,
    "query": {
      "time": 0
    },
    "facet": {
      "time": 0
    },
    "mlt": {
      "time": 0
    },
    "highlight": {
      "time": 0
    },
    "stats": {
      "time": 0
    },
    "debug": {
      "time": 0
    }
  }
}
  }
}

Francesco





RE: Show the score in the search result

2014-04-16 Thread Croci Francesco Luigi (ID SWS)
Hello Erick,

Solr 4.7.1

Francesco

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, 16 April 2014 14:01
To: solr-user@lucene.apache.org
Subject: Re: Show the score in the search result

What version of Solr? Works fine for me.

Best,
Erick

On Wed, Apr 16, 2014 at 6:38 AM, Croci  Francesco Luigi (ID SWS) 
 wrote:
> I read that if I add the string "score" in the fl field, I should be able to 
> see the score within the returned documents.
>
> As I understand "score" is a "special/reserved" word and I don't have to 
> define in the schema (right)?
>
> I did so, but in the returned fields' list I see no score field...
>
> Here is the request's URL: 
> http://localhost:7001/solr/collection1/select?q=*%3A*&fl=*%2Cscore&wt=json&indent=true
>
> Do I miss something?
>
> Francesco


Show the score in the search result

2014-04-16 Thread Croci Francesco Luigi (ID SWS)
I read that if I add "score" to the fl parameter, I should be able to 
see the score in the returned documents.

As I understand it, "score" is a special/reserved word and I don't have to 
define it in the schema (right)?

I did so, but I see no score field in the list of returned fields...

Here is the request URL: 
http://localhost:7001/solr/collection1/select?q=*%3A*&fl=*%2Cscore&wt=json&indent=true

Am I missing something?

Francesco
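The request URL above is just the URL-encoded form of q=*:* and fl=*,score. A small sketch that builds it with `java.net.URLEncoder` (the Charset overload needs Java 10+); `SelectUrl` is a hypothetical helper and the core URL is the one from the message. As Jack points out in this thread, with q=*:* every document gets the same score, so the values themselves are not very informative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the select URL from the message with proper encoding.
final class SelectUrl {
    static String build(String coreUrl, String q, String fl) {
        return coreUrl + "/select"
                + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)   // *:*     -> *%3A*
                + "&fl=" + URLEncoder.encode(fl, StandardCharsets.UTF_8) // *,score -> *%2Cscore
                + "&wt=json&indent=true";
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:7001/solr/collection1", "*:*", "*,score"));
    }
}
```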


Search a list of words and returned order

2014-04-11 Thread Croci Francesco Luigi (ID SWS)
When I search for a list of words, Solr uses the OR operator by default.

In my case I index PDF files. What can I do so that, when I search the 
index for a list of words, the documents that contain all the words are 
listed first?

Thank you
Francesco
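One common approach, sketched here under assumptions: keep the OR query but add a boosted clause that only matches documents containing every word, so that, with the default sort by score, those documents tend to rank first. Ranking stays score-based, so this is a tendency rather than a strict guarantee. `AllTermsFirst` is a hypothetical helper, the boost of 10 is arbitrary, and the words are assumed to need no escaping.

```java
import java.util.List;

// Hypothetical helper: "any word" query plus a boosted "all words" clause.
final class AllTermsFirst {
    static String build(List<String> words) {
        String any = String.join(" OR ", words);  // documents with at least one word
        String all = String.join(" AND ", words); // documents with every word
        // The boost (10, arbitrary) raises the score of documents matching both clauses.
        return "(" + any + ") OR (" + all + ")^10";
    }

    public static void main(String[] args) {
        // prints (records OR management) OR (records AND management)^10
        System.out.println(build(List.of("records", "management")));
    }
}
```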


RE: Query and field name with wildcard

2014-04-07 Thread Croci Francesco Luigi (ID SWS)
Sorry, I found the problem myself...

I was using /select, where edismax was not defined. 
The other two, /selectEN and /selectDE, worked.

Adding edismax to /select made it work too.

Bye
Francesco

-Original Message-
From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch] 
Sent: Monday, 7 April 2014 11:20
To: solr-user@lucene.apache.org
Subject: RE: Query and field name with wildcard

Hello Alex,

I saw your example and took it as a template for my needs.

I tried the aliasing, but it does not work, maybe because I did it 
wrong...

"error": {
  "msg": "undefined field all",
  "code": 400
}

Here is a snippet of my solrconfig.xml:

...


explicit


rmDocumentTitle rmDocumentArt 
rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn 
fullText




  
edismax
fullText_en
full_Text
json
true
  
  
language:en
fullText_en
rmDocumentTitle rmDocumentArt 
rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn 
fullText_en
* -fullText_*
*,fullText:fullText_en
  



  
edismax
fullText_de
full_Text
json
true
  
  
language:de
fullText_de
rmDocumentTitle rmDocumentArt 
rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn 
fullText_de
* -fullText_*
*,fullText:fullText_de
  

...

What am I missing or doing wrong?


Regards,
Francesco

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Friday, 4 April 2014 11:08
To: solr-user@lucene.apache.org
Subject: Re: Query and field name with wildcard

Are you using eDisMax? That gives a lot of options, including field aliasing, 
even aliasing a single name to multiple fields:
http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2F_renaming
(with example on p77 of my book
http://www.packtpub.com/apache-solr-for-indexing-data/book :-)
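Field aliasing in eDisMax is configured with the f.<alias>.qf parameter (the echoed parameters elsewhere in this thread show f.all.qf). A minimal sketch of the request parameters for an alias named all, assuming the handler really parses with edismax, which is what turned out to be missing on /select in this thread; `EdismaxAlias` is a hypothetical helper.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: parameters for querying an eDisMax field alias.
final class EdismaxAlias {
    static Map<String, String> params(String alias, List<String> fields, String word) {
        Map<String, String> p = new LinkedHashMap<>();         // keeps insertion order
        p.put("defType", "edismax");                           // aliasing needs edismax
        p.put("f." + alias + ".qf", String.join(" ", fields)); // alias -> real fields
        p.put("q", alias + ":" + word);
        return p;
    }

    static String toQueryString(Map<String, String> p) {
        return p.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        List<String> rmFields = List.of("rmDocumentTitle", "rmDocumentClass",
                "rmDocumentSubclass", "rmDocumentArt");
        System.out.println(toQueryString(params("all", rmFields, "some_word")));
    }
}
```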

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/ Current project: 
http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 3:52 PM, Croci  Francesco Luigi (ID SWS) 
 wrote:
> In my index I have some fields which have the same prefix(rmDocumentTitle, 
> rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not 
> possible to specify a query like this:
>
> q = rm* : some_word
>
> Is there a way to do this without having to write a long list of ORs?
>
> Another question is if it is really not possible to search a word over 
> the entire index. Something like this: q = * : some_word
>
> Thank you
> Francesco


Query and field name with wildcard

2014-04-04 Thread Croci Francesco Luigi (ID SWS)
In my index I have some fields which have the same prefix (rmDocumentTitle, 
rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not 
possible to specify a query like this:

q = rm* : some_word

Is there a way to do this without having to write a long list of ORs?

Another question: is it really not possible to search for a word across the 
entire index, with something like q = * : some_word?

Thank you
Francesco
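If the field names are known on the client side (for example, read from the schema; that list is an assumption here), the long OR list can at least be generated instead of written by hand. `PrefixFieldQuery` is a hypothetical helper; it returns an empty string if no field matches the prefix, and the word is assumed to need no escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: expands a field-name prefix into an explicit OR query
// over field names the client already knows.
final class PrefixFieldQuery {
    static String build(List<String> knownFields, String prefix, String word) {
        return knownFields.stream()
                .filter(f -> f.startsWith(prefix))    // e.g. rmDocumentTitle, rmDocumentClass
                .map(f -> f + ":" + word)
                .collect(Collectors.joining(" OR ")); // "" if nothing matches
    }

    public static void main(String[] args) {
        List<String> fields = List.of("id", "rmDocumentTitle", "rmDocumentClass",
                "rmDocumentSubclass", "rmDocumentArt", "fullText");
        System.out.println(build(fields, "rm", "some_word"));
    }
}
```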


How to index only the pdf content/text

2014-03-25 Thread Croci Francesco Luigi (ID SWS)
I searched for a way to index only the content/text part of a PDF (without all 
the other fields Tika creates) and found the "solution" with "uprefix" = 
ignored_ and .

The problem is that uprefix only applies to fields that are not defined in 
the schema. In my schema I defined two fields (id and rmDocumentTitle), and 
these two fields are added to the content too, which I want to avoid.

How can I prevent these two fields from being added to fullText?

Here are my config files:

schema.xml

fullText


id



solrconfig.xml


...

   
   true
   false
   false
   true
   true
   ignored_
   link
   fullText
   
   deduplication
   



   
   false
   signatureField
   true
   content
   10
   .2
   solr.update.processor.TextProfileSignature

none


   *:*




Thank you for any help.
Francesco


analyzer with multiple stem-filters for more languages

2014-03-14 Thread Croci Francesco Luigi (ID SWS)
Is it possible to define an analyzer with more than one stem filter, for several 
languages?

Something like this:


...
  (default for english)



Greetings
Francesco


RE: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
OK, I think I found the problem:

in the solrconfig.xml I have true

I set it to false and now rmDocumentTitle is there too...

Regards
Francesco
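The archived message lost the XML tag around that "true", so the setting's name is not visible here. Assuming it is the extracting handler's lowernames option, which maps field names to lowercase with underscores, a literal.rmDocumentTitle value would be stored under rmdocumenttitle, which would fit the renaming experiment in this thread. A sketch of that mapping; `LowerNames` is a hypothetical name and Solr's own code may differ in details.

```java
// Hypothetical emulation of the lowernames mapping: lowercase letters and digits
// are kept, anything that is not a letter or digit becomes '_'.
final class LowerNames {
    static String normalize(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (char ch : name.toCharArray()) {
            sb.append(Character.isLetterOrDigit(ch) ? Character.toLowerCase(ch) : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("rmDocumentTitle")); // prints rmdocumenttitle
        System.out.println(normalize("Content-Type"));    // prints content_type
    }
}
```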

-Original Message-
From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch] 
Sent: Thursday, 13 March 2014 14:39
To: solr-user@lucene.apache.org
Subject: RE: Problem adding fields when indexing a pdf (add-on)

Yes, in my test class I always call server.deleteByQuery("*:*", 5); first.

As you can see, I have fullText and signatureField defined, and they are there.
The only difference is that they are not set manually.
Could it be that field names passed with the literal.* parameter have to be lowercase?

Regards
Francesco

-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Thursday, 13 March 2014 14:35
To: solr-user@lucene.apache.org
Subject: Re: Problem adding fields when indexing a pdf (add-on)

On 13 March 2014 18:33, Croci  Francesco Luigi (ID SWS)  
wrote:
> Ok, I renamed the field "rmDocumentTitle" to "rmdocumenttitle" and now the 
> field is there!
>
> Are there naming rules for field names? No uppercase?

No. We have used mixed-case names in the past.

Are you sure that you reindexed the first time before checking?

Regards,
Gora


RE: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
OK, I renamed the field "rmDocumentTitle" to "rmdocumenttitle" and now the 
field is there!

Are there naming rules for field names? No uppercase?

Greetings
Francesco

-Original Message-
From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch] 
Sent: Thursday, 13 March 2014 13:57
To: solr-user@lucene.apache.org
Subject: Problem adding fields when indexing a pdf (add-on)

I defined a new field "test" in the schema () and added 
req.setParam("literal.test", "test title"); in the code.

The field (test) is there O_O.

Can someone explain the difference to me? Why is rmDocumentTitle missing while 
test is there?

Bye
Francesco




Problem adding fields when indexing a pdf

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
When I index a PDF, I would like to "manually" add the document's title to a 
field named rmDocumentTitle.

I defined the field in schema.xml, but when I query Solr I see that the 
field was not created...

Am I doing something wrong?

Below are the code snippet, schema.xml and solrconfig.xml.

Thank you for any hint
Francesco

...
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addContentStream(contentStream);

req.setParam("literal.id", file.getName().substring(0, file.getName().indexOf('.')));
req.setParam("literal.rmDocumentTitle", "test title");
req.setParam("uprefix", "ignored_");

req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

NamedList result = server.request(req);
...
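A side note on the snippet: `indexOf('.')` returns -1 for a file name without a dot, and `substring(0, -1)` then throws a `StringIndexOutOfBoundsException`. A guarded variant of the same id derivation (still everything before the first dot), written as a hypothetical helper `DocIds.baseName`:

```java
// Hypothetical helper: same id derivation as the snippet (text before the first
// '.'), but without throwing for names that contain no dot.
final class DocIds {
    static String baseName(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    public static void main(String[] args) {
        System.out.println(baseName("report.pdf")); // prints report
        System.out.println(baseName("README"));     // prints README
    }
}
```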


schema.xml

fullText


id



solrconfig.xml



LUCENE_45

   deduplication
   



   
   true
   true
   false
   true
   true
   ignored_
   link
   fullText
   
   deduplication
   



   
   false
   signatureField
   true
   content
   10
   .2
   solr.update.processor.TextProfileSignature

none


   *:*





RE: Many PDFs indexed but only one returned in the Solr-UI

2014-03-11 Thread Croci Francesco Luigi (ID SWS)
Hi Erick,

You were right...

I had "signatureField" bound to "uid" in solrconfig.xml, so the uid 
was always the same.
Now I have defined a new field for "signatureField" and it works!
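Why would every document end up with the same uid? A plausible explanation, under assumptions: when the signature is computed over a field that is empty for every document, all documents get the same signature, and if that signature is written into the uniqueKey, each new document replaces the previous one. The signatureField value d41d8cd98f00b204e9800998ecf8427e in the April query response in this archive is exactly the MD5 digest of empty input, which this check confirms; whether TextProfileSignature always reduces to that for empty text is an assumption. `EmptySignature` is a hypothetical name.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Computes a plain MD5 hex digest, to compare with the signature seen in this archive.
final class EmptySignature {
    static String md5Hex(byte[] input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(input);
            return String.format("%032x", new BigInteger(1, digest)); // zero-padded hex
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex(new byte[0])); // prints d41d8cd98f00b204e9800998ecf8427e
    }
}
```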

Before:
...


false
uid  <-
true
content
10
.2
solr.update.processor.TextProfileSignature



...


...






uid


After:
...


false
signatureField  <-
true
content
10
.2
solr.update.processor.TextProfileSignature



...


...


  <--




uid


Greetings
Francesco

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, 11 March 2014 12:46
To: solr-user@lucene.apache.org
Subject: Re: Many PDFs indexed but only one returned in the Solr-UI

Hmmm, that looks OK to me. I'd log out the id you assign for each document;
it's _possible_ that somehow you're getting the same ID for all the files,
except that this line should be preventing that:
 doc.addField("id", document);

Tail the Solr log while you're doing this and look at the update messages to ensure 
that there is more than one. And I'm assuming that you've got more than one 
file in your directory.


BTW, committing after every doc is generally poor practice in 
production. I know you're just testing now, but thought I'd mention it. Let 
autocommit handle most of it and (perhaps) commit once at the end.
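Erick's advice on commits can be sketched as sending documents in batches with a single explicit commit at the end. `BatchIndexer` is hypothetical, and `sendBatch`/`commit` stand for whatever client calls are used (for example, wrappers around SolrJ's add and commit); the autocommit alternative he mentions is configured in Solr and is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: send documents in batches, commit once at the end.
final class BatchIndexer {
    static <T> int indexAll(List<T> docs, int batchSize,
                            Consumer<List<T>> sendBatch, Runnable commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            sendBatch.accept(new ArrayList<>(docs.subList(start, end))); // copy of one batch
            batches++;
        }
        commit.run(); // a single commit instead of one per document
        return batches;
    }

    public static void main(String[] args) {
        int batches = indexAll(List.of("a.pdf", "b.pdf", "c.pdf"), 2,
                batch -> System.out.println("sending " + batch),
                () -> System.out.println("commit"));
        System.out.println(batches + " batches"); // prints 2 batches
    }
}
```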

Hmmm, silly question perhaps, but are you absolutely sure that you're querying 
the same core you're indexing to? On the same machine?
Sometimes as a sanity check I'll add, say, a timestamp to the id field (e.g.
doc.addField("id", filename + timestamp)) just to have something that changes every 
run.

Best
Erick



FW: Files locked after indexing

2014-03-11 Thread Croci Francesco Luigi (ID SWS)
Hi all,

I'm pretty new to Solr and Tika, and I have a problem.

I have the following workflow in my (web)application:

  *   download a pdf file from an archive
  *   index the file
  *   delete the file


My problem is that after indexing, the file remains locked and the delete 
step throws an exception.

Here is my code snippet for indexing the file:

try
{
  ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
  req.addFile(file, type);
  req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

  NamedList result = server.request(req);

  Assert.assertEquals(0, ((NamedList) result.get("responseHeader")).get("status"));
}

I also tried the "ContentStream" way but without success:
ContentStream contentStream = null;

try
{
  contentStream = new ContentStreamBase.FileStream(document);

  ContentStreamUpdateRequest req = new ContentStreamUpdateRequest(UPDATE_EXTRACT_REQUEST);
  req.addContentStream(contentStream);
  req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

  NamedList result = server.request(req);

  if (!((NamedList) result.get("responseHeader")).get("status").equals(0))
  {
    throw new IDSystemException(LOG, "Document could not be indexed. Status returned: " +
        ((NamedList) result.get("responseHeader")).get("status"));
  }
}
catch...
finally
{
  try
  {
    if (contentStream != null && contentStream.getStream() != null)
    {
      contentStream.getStream().close();
    }
  }
  catch (IOException ioe)
  {
    throw new IDSystemException(LOG, ioe.getMessage(), ioe);
  }
}


Am I missing something?

Thank you
Francesco
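One assumption worth checking in the second snippet: if `ContentStreamBase.FileStream.getStream()` opens a new stream on every call, the finally block never closes the stream used for the request, and the null check itself opens yet another stream that is never closed; on Windows an open handle typically blocks deletion. The sketch below shows the general pattern in plain Java: read through a stream owned by try-with-resources, then delete. `IndexThenDelete` is a hypothetical name, and the byte array stands in for whatever is sent to Solr.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: read a file through a stream that is always closed, then delete it.
final class IndexThenDelete {
    static byte[] readThenDelete(Path file) throws IOException {
        byte[] content;
        try (InputStream in = Files.newInputStream(file)) {
            content = in.readAllBytes(); // stand-in for sending the file to Solr (Java 9+)
        } // the stream is closed here, even if reading throws
        Files.delete(file); // would typically fail on Windows if a handle were still open
        return content;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("indexed-", ".pdf");
        Files.write(tmp, new byte[] {1, 2, 3});
        byte[] read = readThenDelete(tmp);
        // prints 3 bytes read, exists afterwards: false
        System.out.println(read.length + " bytes read, exists afterwards: " + Files.exists(tmp));
    }
}
```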



Many PDFs indexed but only one returned in the Solr-UI

2014-03-11 Thread Croci Francesco Luigi (ID SWS)
I followed the example here 
(http://searchhub.org/2012/02/14/indexing-with-solrj/) to index all the 
PDFs in a directory. The process seems to work well, but at the end, when I go 
to the Solr UI and click "Execute query" (with q=*:*), I get only one entry.

Am I missing something in my code?

...

String[] files = documentDir.list();

if (files != null)
{
  for (String document : files)
  {
    ContentHandler textHandler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();
    AutoDetectParser autoDetectParser = new AutoDetectParser();

    InputStream inputStream = null;

    try
    {
      inputStream = new FileInputStream(new File(documentDir, document));

      autoDetectParser.parse(inputStream, textHandler, metadata, context);

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", document);

      String content = textHandler.toString();

      if (content != null)
      {
        doc.addField("fullText", content);
      }

      UpdateResponse resp = server.add(doc, 1);

      server.commit(true, true, true);

      if (resp.getStatus() != 0)
      {
        throw new IDSystemException(LOG, "Document could not be indexed. Status returned: " + resp.getStatus());
      }
    }
    catch (FileNotFoundException fnfe)
    {
      throw new IDSystemException(LOG, fnfe.getMessage(), fnfe);
    }
    catch (IOException ioe)
    {
      throw new IDSystemException(LOG, ioe.getMessage(), ioe);
    }
    catch (SAXException se)
    {
      throw new IDSystemException(LOG, se.getMessage(), se);
    }
    catch (TikaException te)
    {
      throw new IDSystemException(LOG, te.getMessage(), te);
    }
    catch (SolrServerException sse)
    {
      throw new IDSystemException(LOG, sse.getMessage(), sse);
    }
    finally
    {
      if (inputStream != null)
      {
        try
        {
          inputStream.close();
        }
        catch (IOException ioe)
        {
          throw new IDSystemException(LOG, ioe.getMessage(), ioe);
        }
      }
    }

    ...

Thank you for any hint.

Francesco