Unsupported ContentType: application/pdf Not in: [application/xml, text/csv, text/json, application/csv, application/javabin, text/xml, application/json]
Hello, I have Solr 4.9.0 and I get the above error when I try to index a PDF document through the Solr web interface. Here are my schema and solrconfig. Am I missing something? fullText id LUCENE_45 deduplication true false false true ignored_ link fullText deduplication false signatureField true content 10 .2 solr.update.processor.TextProfileSignature explicit 10 none *:*
Solr, weblogic managed server and log4j logging
Maybe some of you use Solr with WebLogic and can help me... I have WebLogic 12.1.3 and would like to deploy and run Solr on a managed server. I started the node manager, created a server named "server-solr" and deployed Solr (4.7.9). In the "server start" tab of the server configuration I added C:\lib\wllog4j.jar;C:\lib\log4j-1.2.16.jar to the Class Path and -Dlog4j.configuration=C:\download\log4j.properties -Dweblogic.log.Log4jLoggingEnabled=true to the Arguments. When I try to start the server I get the following error:
RE: Show the score in the search result
I think you mean this row: * ,fullText: ... OK, but as I understood it, the "*" means that ALL the fields are displayed anyway. Or not? Francesco
-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@gmail.com]
Sent: Thursday, 17 April 2014 10:04
To: solr-user@lucene.apache.org
Subject: Re: Show the score in the search result
That's exactly what Jack mentioned, you're defining an invariant for fl, which ignores everything you provide at runtime. From http://wiki.apache.org/solr/SearchHandler#Configuration "invariants - provides param values that will be used in spite of any values provided at request time. They are a way of letting the Solr maintainer lock down the options available to Solr clients." -Stefan
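To make the precedence Stefan describes concrete, here is a minimal Java sketch. It is only an illustration of the idea (defaults, then request parameters, then invariants), not Solr's actual code; the class and method names are made up, and the "appends" section of a handler is left out for brevity. The invariant value is the fl echoed in the echoParams=ALL response elsewhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: mimics the precedence of handler parameters, not Solr's implementation. */
final class HandlerParamsSketch {

    /** Later sources win: defaults, then request parameters, then invariants. */
    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request,
                                               Map<String, String> invariants) {
        Map<String, String> effective = new LinkedHashMap<>(defaults);
        effective.putAll(request);     // request values override defaults
        effective.putAll(invariants);  // invariants override whatever the client sent
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("defType", "edismax");

        Map<String, String> request = new LinkedHashMap<>();
        request.put("fl", "*,score");              // what the client asks for

        Map<String, String> invariants = new LinkedHashMap<>();
        invariants.put("fl", "*,fullText:fullText"); // what the handler locks down

        // The client's "score" request is lost because the invariant replaces fl.
        System.out.println(effectiveParams(defaults, request, invariants).get("fl"));
    }
}
```

Under this model, moving fl from the invariants to the defaults would let a request-time fl=*,score take effect.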
RE: Show the score in the search result
Hello Chris: trying to execute http://localhost:7001/solr/collection1/select?q=*%3A*&rows=1&fl=score&wt=json&indent=true&echoParams=true I get { "error": { "msg": "Invalid value 'true' for echoParams parameter, use 'EXPLICIT' or 'ALL'", "code": 400 } } With echoParams=ALL: { "responseHeader": { "status": 0, "QTime": 0, "params": { "defType": "edismax", "echoParams": "ALL", "fl": "*,fullText:fullText", "indent": "true", "q": "*:*", "_": "1397719590902", "wt": "json", "rows": "1", "uf": "* -fullText_*", "f.all.qf": "rmDocumentTitle rmDocumentArt rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn fullText", "fq": "* -language:en -language:de" } }, "response": { "numFound": 842, "start": 0, "docs": [ { "rmDocumentTitle": [ "Ersterfassung" ], "rmDocumentClass": [ "Einführung Records Management" ], "rmDocumentSubclass": [ "Einführung Records Management" ], "id": "aabziwlc4hkvgojtzyb4wbebqr4m3", "rmDocumentArt": [ "Ersterfassung" ], "fullText": [ " \n \n \n \n \n \n \n \n " ], "signatureField": "d41d8cd98f00b204e9800998ecf8427e" } ] } } I adapted the sample on "Instant Apache Solr for Indexing Data How-to" Chapter: Indexing multiple languages(advanced) here is the schema: fullText id Here the solrconfig: LUCENE_45 deduplication true false false true true ignored_ link fullText deduplication false signatureField true content 10 .2 solr.update.processor.TextProfileSignature fullText en,de en language true false edismax * -language:en -language:de
RE: Show the score in the search result
Hello Jack, I know it's not the best example, but I just wanted to see the score field "printed out"... :) Francesco
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, 16 April 2014 14:32
To: solr-user@lucene.apache.org
Subject: Re: Show the score in the search result
Also, "*:*" is a constant score query, so the score will always be 1.0. Not a terribly good example to request the score. Please provide the Solr query response, with the debug=true parameter so we can see for ourselves that no score is returned. -- Jack Krupansky
-Original Message-
From: Erick Erickson
Sent: Wednesday, April 16, 2014 8:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Show the score in the search result
What version of Solr? Works fine for me. Best, Erick
On Wed, Apr 16, 2014 at 6:38 AM, Croci Francesco Luigi (ID SWS) wrote:
> I read that if I add the string "score" in the fl field, I should be able to see the score within the returned documents.
> As I understand "score" is a "special/reserved" word and I don't have to define it in the schema (right)?
> I did so, but in the returned fields' list I see no score field...
> Here is the request's URL: http://localhost:7001/solr/collection1/select?q=*%3A*&fl=*%2Cscore&wt=json&indent=true
> Am I missing something?
> Francesco
RE: Show the score in the search result
: 0, "prepare": { "time": 0, "query": { "time": 0 }, "facet": { "time": 0 }, "mlt": { "time": 0 }, "highlight": { "time": 0 }, "stats": { "time": 0 }, "debug": { "time": 0 } }, "process": { "time": 0, "query": { "time": 0 }, "facet": { "time": 0 }, "mlt": { "time": 0 }, "highlight": { "time": 0 }, "stats": { "time": 0 }, "debug": { "time": 0 } } } } } Francesco -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Mittwoch, 16. April 2014 14:32 To: solr-user@lucene.apache.org Subject: Re: Show the score in the search result Also, "*:*" is a constant score query, so the score will always be 1.0. Not a terribly good example to request the score. Please provide the Solr query response, with the debug=true parameter so we can see for ourselves that no score is returned. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Wednesday, April 16, 2014 8:00 AM To: solr-user@lucene.apache.org Subject: Re: Show the score in the search result What version of Solr? Works fine for me. Best, Erick On Wed, Apr 16, 2014 at 6:38 AM, Croci Francesco Luigi (ID SWS) wrote: > I read that if I add the string "score" in the fl field, I should be > able to see the score within the retuned documents. > > As I understand "score" is a "special/reserved" word and I don't have > to define in the schema (right)? > > I did so, but in the returned fields' list I see no score field... > > Here is the request's URL: > http://localhost:7001/solr/collection1/select?q=*%3A*&fl=*%2Cscore&wt= > json&indent=true > > Do I miss something? > > Francesco
RE: Show the score in the search result
Hello Erick, Solr 4.7.1 Francesco
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, 16 April 2014 14:01
To: solr-user@lucene.apache.org
Subject: Re: Show the score in the search result
What version of Solr? Works fine for me. Best, Erick
On Wed, Apr 16, 2014 at 6:38 AM, Croci Francesco Luigi (ID SWS) wrote:
> I read that if I add the string "score" in the fl field, I should be able to see the score within the returned documents.
> As I understand "score" is a "special/reserved" word and I don't have to define it in the schema (right)?
> I did so, but in the returned fields' list I see no score field...
> Here is the request's URL: http://localhost:7001/solr/collection1/select?q=*%3A*&fl=*%2Cscore&wt=json&indent=true
> Am I missing something?
> Francesco
Show the score in the search result
I read that if I add the string "score" in the fl field, I should be able to see the score within the returned documents. As I understand "score" is a "special/reserved" word and I don't have to define it in the schema (right)? I did so, but in the returned fields' list I see no score field... Here is the request's URL: http://localhost:7001/solr/collection1/select?q=*%3A*&fl=*%2Cscore&wt=json&indent=true Am I missing something? Francesco
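For reference, the request URL above can be assembled from its unencoded parts with a few lines of plain Java. This is just a sketch: the class and method names are invented for the example, and the base URL is the one from the post. As the replies in this thread point out, the fl sent this way only has an effect if the request handler does not fix fl as an invariant.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch: builds the /select URL from the post out of unencoded parameter values. */
final class ScoreQueryUrl {

    /** URL-encodes a single parameter value as UTF-8. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Appends q and fl (encoded) plus the fixed wt/indent parameters to the core URL. */
    static String selectUrl(String coreUrl, String q, String fl) {
        return coreUrl + "/select?q=" + encode(q) + "&fl=" + encode(fl) + "&wt=json&indent=true";
    }

    public static void main(String[] args) {
        // "*,score" asks for all stored fields plus the score pseudo-field.
        System.out.println(selectUrl("http://localhost:7001/solr/collection1", "*:*", "*,score"));
    }
}
```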
Search a list of words and returned order
When I search for a list of words, Solr uses the OR operator by default. In my case I index PDF files. What can I do so that, when I search the index for a list of words, the documents that contain all of the words are listed first? Thank you Francesco
RE: Query and field name with wildcard
Sorry, found the problem myself... I used the /select where the edismax was not defined. The other two, /selectEN and /selectDE, worked. Adding the edismax to the /select made it work too. Ciao Francesco -Original Message- From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch] Sent: Montag, 7. April 2014 11:20 To: solr-user@lucene.apache.org Subject: RE: Query and field name with wildcard Hello Alex, I saw your example and took it as template for my needs. I tried with the aliasing, but, maybe because I did it wrong, it does not work... "error": { "msg": "undefined field all", "code": 400 } Here is a snippet of my solrconfig.xml: ... explicit rmDocumentTitle rmDocumentArt rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn fullText edismax fullText_en full_Text json true language:en fullText_en rmDocumentTitle rmDocumentArt rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn fullText_en * -fullText_* *,fullText:fullText_en edismax fullText_de full_Text json true language:de fullText_de rmDocumentTitle rmDocumentArt rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn fullText_de * -fullText_* *,fullText:fullText_de ... What am I missing/ doing wrong? Regards, Francesco -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Freitag, 4. April 2014 11:08 To: solr-user@lucene.apache.org Subject: Re: Query and field name with wildcard Are you using eDisMax. That gives a lot of options, including field aliasing, including a single name to multiple fields: http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2F_renaming (with example on p77 of my book http://www.packtpub.com/apache-solr-for-indexing-data/book :-) Regards, Alex. 
Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:52 PM, Croci Francesco Luigi (ID SWS) wrote: > In my index I have some fields which have the same prefix(rmDocumentTitle, > rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not > possible to specify a query like this: > > q = rm* : some_word > > Is there a way to do this without having to write a long list of ORs? > > Another question is if it is really not possible to search a word over > the entire index. Something like this: q = * : some_word > > Thank you > Francesco
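The fix described at the top of this message (the alias only works when the handler actually uses edismax) can also be expressed per request. The sketch below builds such a query string in plain Java; the alias name "all" and the field list come from the thread, while the class and method names are invented for the example. The search word is assumed to be a plain term, and no query-syntax escaping is attempted.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: request parameters for an edismax field alias named "all". */
final class AliasQuerySketch {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Parameters that search one word across several fields through the alias. */
    static Map<String, String> aliasQuery(String word) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax"); // without edismax, "all" is treated as a real field
        params.put("q", "all:" + word);
        params.put("f.all.qf", "rmDocumentTitle rmDocumentArt rmDocumentClass rmDocumentSubclass");
        return params;
    }

    /** Joins the parameters into an encoded query string. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(aliasQuery("some_word")));
    }
}
```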
RE: Query and field name with wildcard
Hello Alex, I saw your example and took it as template for my needs. I tried with the aliasing, but, maybe because I did it wrong, it does not work... "error": { "msg": "undefined field all", "code": 400 } Here is a snippet of my solrconfig.xml: ... explicit rmDocumentTitle rmDocumentArt rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn fullText edismax fullText_en full_Text json true language:en fullText_en rmDocumentTitle rmDocumentArt rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn fullText_en * -fullText_* *,fullText:fullText_en edismax fullText_de full_Text json true language:de fullText_de rmDocumentTitle rmDocumentArt rmDocumentClass rmDocumentSubclass rmDocumentCatName rmDocumentCatNameEn fullText_de * -fullText_* *,fullText:fullText_de ... What am I missing/ doing wrong? Regards, Francesco -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Freitag, 4. April 2014 11:08 To: solr-user@lucene.apache.org Subject: Re: Query and field name with wildcard Are you using eDisMax. That gives a lot of options, including field aliasing, including a single name to multiple fields: http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2F_renaming (with example on p77 of my book http://www.packtpub.com/apache-solr-for-indexing-data/book :-) Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:52 PM, Croci Francesco Luigi (ID SWS) wrote: > In my index I have some fields which have the same prefix(rmDocumentTitle, > rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not > possible to specify a query like this: > > q = rm* : some_word > > Is there a way to do this without having to write a long list of ORs? > > Another question is if it is really not possible to search a word over > the entire index. 
Something like this: q = * : some_word > > Thank you > Francesco
Query and field name with wildcard
In my index I have some fields which have the same prefix (rmDocumentTitle, rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not possible to specify a query like this: q = rm* : some_word Is there a way to do this without having to write a long list of ORs? Another question is whether it is really not possible to search for a word across the entire index. Something like this: q = * : some_word Thank you Francesco
How to index only the pdf content/text
I looked for a way to index only the content/text part of a PDF (without all the other fields Tika creates) and I found the "solution" with the "uprefix" = ignored_ and . The problem is that uprefix only works on fields that are not specified in the schema. In my schema I specified two fields (id and rmDocumentTitle) and these two fields are added to the content too (which I want to avoid). How can I prevent these two fields from being added to the fullText? Here are my config files: schema.xml fullText id solrconfig.xml ... true false false true true ignored_ link fullText deduplication false signatureField true content 10 .2 solr.update.processor.TextProfileSignature none *:* Thank you for any help. Francesco
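The uprefix rule mentioned above (only fields unknown to the schema get the prefix) can be pictured with a small Java model. This is a simplified sketch, not Solr Cell's code: it ignores other steps such as field mapping, and the extracted field names "producer" and "created" are hypothetical examples of Tika metadata.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Simplified model of the uprefix rule: unknown fields get a prefix, known fields pass through. */
final class UprefixSketch {

    static Map<String, String> applyUprefix(Map<String, String> extracted,
                                            Set<String> schemaFields,
                                            String uprefix) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : extracted.entrySet()) {
            boolean known = schemaFields.contains(field.getKey());
            // Fields defined in the schema keep their name; everything else is prefixed.
            result.put(known ? field.getKey() : uprefix + field.getKey(), field.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> schema = new HashSet<>(Arrays.asList("id", "rmDocumentTitle", "fullText"));
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("rmDocumentTitle", "test title");
        extracted.put("producer", "hypothetical PDF producer");
        extracted.put("created", "hypothetical creation date");
        System.out.println(applyUprefix(extracted, schema, "ignored_").keySet());
    }
}
```

In this model only "producer" and "created" end up under the ignored_ prefix, which matches the observation in the post that schema fields such as rmDocumentTitle are not affected by uprefix.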
analyzer with multiple stem-filters for more languages
Is it possible to define an analyzer with more than one stem filter, for several languages? Something like this: ... (default for English) Greetings Francesco
RE: Problem adding fields when indexing a pdf (add-on)
Ok. Maybe I found the problem: in the solrconfig.xml I have true. I set it to false and now rmDocumentTitle is there too... Regards Francesco
-Original Message-
From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch]
Sent: Thursday, 13 March 2014 14:39
To: solr-user@lucene.apache.org
Subject: RE: Problem adding fields when indexing a pdf (add-on)
Yes, in my test class I always do server.deleteByQuery("*:*", 5); at first. As you can see I have fullText and signatureField defined. And they are there. The only difference is that they are not manually set. Can it be that if you use the literal.* parameter you have to use lowercase? Regards Francesco
-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Thursday, 13 March 2014 14:35
To: solr-user@lucene.apache.org
Subject: Re: Problem adding fields when indexing a pdf (add-on)
On 13 March 2014 18:33, Croci Francesco Luigi (ID SWS) wrote:
> Ok, I renamed the field "rmDocumentTitle" to "rmdocumenttitle" and now the field is there!
> Are there naming rules for field names? No uppercase?
No. We have used mixed-case names in the past. Are you sure that you reindexed the first time before checking? Regards, Gora
RE: Problem adding fields when indexing a pdf (add-on)
Yes, in my test class I always do server.deleteByQuery("*:*", 5); at first. As you can see I have fullText and signatureField defined. And they are there. The only difference is that they are not manually set. Can it be that if you use the literal.* parameter you have to use lowercase? Regards Francesco
-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Thursday, 13 March 2014 14:35
To: solr-user@lucene.apache.org
Subject: Re: Problem adding fields when indexing a pdf (add-on)
On 13 March 2014 18:33, Croci Francesco Luigi (ID SWS) wrote:
> Ok, I renamed the field "rmDocumentTitle" to "rmdocumenttitle" and now the field is there!
> Are there naming rules for field names? No uppercase?
No. We have used mixed-case names in the past. Are you sure that you reindexed the first time before checking? Regards, Gora
RE: Problem adding fields when indexing a pdf (add-on)
Ok, I renamed the field "rmDocumentTitle" to "rmdocumenttitle" and now the field is there! Are there naming rules for field names? No uppercase? Greetings Francesco
-Original Message-
From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch]
Sent: Thursday, 13 March 2014 13:57
To: solr-user@lucene.apache.org
Subject: Problem adding fields when indexing a pdf (add-on)
I tried to define a new field "test" in the schema () and added req.setParam("literal.test", "test title"); in the code. The field (test) is there O_O. Can someone explain the difference to me? Why is rmDocumentTitle not there while test is? Ciao Francesco
Problem adding fields when indexing a pdf (add-on)
I tried to define a new field "test" in the schema () and added req.setParam("literal.test", "test title"); in the code. The field (test) is there O_O. Can someone explain the difference to me? Why is rmDocumentTitle not there while test is? Ciao Francesco
Problem adding fields when indexing a pdf
When I index a PDF I would like to "manually" add the document's title in a field named rmDocumentTitle. I defined the field in the schema.xml, but when I query Solr I see that the field was not created... Am I doing something wrong? Below are the code snippet, schema and solrconfig.xml. Thank you for any hint. Francesco
...
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addContentStream(contentStream);
    req.setParam("literal.id", file.getName().substring(0, file.getName().indexOf('.')));
    req.setParam("literal.rmDocumentTitle", "test title");
    req.setParam("uprefix", "ignored_");
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    NamedList result = server.request(req);
...
schema.xml fullText id
solrconfig.xml LUCENE_45 deduplication true true false true true ignored_ link fullText deduplication false signatureField true content 10 .2 solr.update.processor.TextProfileSignature none *:*
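A side note on the snippet, separate from the missing-field question: the expression used for literal.id throws a StringIndexOutOfBoundsException when the file name contains no dot, because indexOf returns -1. A more defensive helper might look like the sketch below. The class and method names are invented, and note one deliberate difference: it cuts at the last dot rather than the first, so "a.b.pdf" yields "a.b" instead of "a".

```java
/** Sketch: derives a document id from a file name without failing on dot-less names. */
final class DocumentIds {

    /** Returns the name without its last extension; names without a usable dot are returned unchanged. */
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // dot > 0 also leaves names such as ".hidden" untouched instead of returning "".
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(baseName("report.pdf"));
        System.out.println(baseName("README"));
    }
}
```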
RE: Many PDFs indexed but only one returned in the Solr-UI
Hi Erik, you were right... I had the "signatureField" bound to the "uid" in the solrconfig.xml, so the uid was always the same. Now I defined a new field for the "signatureField" and it works! Before: ... false uid <- true content 10 .2 solr.update.processor.TextProfileSignature ... ... uid After: ... false signatureField <- true content 10 .2 solr.update.processor.TextProfileSignature ... ... <-- uid Greetings Francesco -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Dienstag, 11. März 2014 12:46 To: solr-user@lucene.apache.org Subject: Re: Many PDFs indexed but only one returned in te Solr-UI Hmmm, that looks OK to me. I'd log out the id you assign for each document, it's _possible_ that somehow you're getting the same ID for all the files except this line should be preventing that: doc.addField("id", document); Tail the Solr log while you're doing this and see the update messages to insure that there are more than one. And I'm assuming that you've got more than one file in your directory. BTW, doing the commit after every doc is generally poor practice in production.I know you're just testing now, but thought I'd mention it. Let autocommit handle most of it and (perhaps) commit once at the end. Hmmm, silly question perhaps, but are you absolutely sure that you're querying the same core you're indexing to? On the same machine? Sometimes as a sanity check I'll add, say, a timestamp to the id field (i.e. doc.add("id", filename + timestamp) just to have something that changes every run. Best Erick On Tue, Mar 11, 2014 at 6:00 AM, Croci Francesco Luigi (ID SWS) wrote: > I followed the example here > (http://searchhub.org/2012/02/14/indexing-with-solrj/) for indexing all the > pdfs in a directory. The process seems to work well, but at the end, when I > go in the Solr-UI and click on "Execute query"(with q=*:*), I get only one > entry. > > Do I miss something in my code? > > ... 
> String[] files = documentDir.list();
>
> if (files != null)
> {
>     for (String document : files)
>     {
>         ContentHandler textHandler = new BodyContentHandler();
>         Metadata metadata = new Metadata();
>         ParseContext context = new ParseContext();
>         AutoDetectParser autoDetectParser = new AutoDetectParser();
>
>         InputStream inputStream = null;
>
>         try
>         {
>             inputStream = new FileInputStream(new File(documentDir, document));
>
>             autoDetectParser.parse(inputStream, textHandler, metadata, context);
>
>             SolrInputDocument doc = new SolrInputDocument();
>             doc.addField("id", document);
>
>             String content = textHandler.toString();
>
>             if (content != null)
>             {
>                 doc.addField("fullText", content);
>             }
>
>             UpdateResponse resp = server.add(doc, 1);
>
>             server.commit(true, true, true);
>
>             if (resp.getStatus() != 0)
>             {
>                 throw new IDSystemException(LOG, "Document could not be indexed. Status returned: " + resp.getStatus());
>             }
>         }
>         catch (FileNotFoundException fnfe)
>         {
>             throw new IDSystemException(LOG, fnfe.getMessage(), fnfe);
>         }
>         catch (IOException ioe)
>         {
>             throw new IDSystemException(LOG, ioe.getMessage(), ioe);
>         }
>         catch (SAXException se)
>         {
>             throw new IDSystemException(LOG, se.getMessage(), se);
>         }
>         catch (TikaException te)
>         {
>             throw new IDSystemException(LOG, te.getMessage(), te);
>         }
>         catch (SolrServerException sse)
>         {
>             throw new IDSystemException(LOG, sse.getMessage(), sse);
>         }
>         finally
>         {
>             if (inputStream != null)
>             {
>                 try
>                 {
>                     inputStream.close();
>                 }
>                 catch (IOException ioe)
>                 {
>                     throw new IDSystemException(LOG, ioe.getMessage(), ioe);
>                 }
>             }
>         }
> ...
>
> Thank you for any hint.
>
> Francesco
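The cause reported at the top of this message (the signature being written into the unique key field) can be modelled in a few lines of Java. This is a toy model under stated assumptions, not Solr's deduplication code: a plain MD5 digest stands in for TextProfileSignature, the index is a map keyed by the unique key, and the file names are made up. It assumes the field the signature is computed from is empty for every document, which would give every document the same signature; whether that was the exact situation here is a guess.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model: an index keyed by unique key, with an optional signature overwriting that key. */
final class SignatureCollisionSketch {

    /** Hex MD5 of the text; a stand-in for the real signature class. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** docs maps an id to the text of the field the signature is computed from. */
    static Map<String, String> index(Map<String, String> docs, boolean signatureOverwritesKey) {
        Map<String, String> index = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            String key = signatureOverwritesKey ? md5Hex(doc.getValue()) : doc.getKey();
            index.put(key, doc.getKey()); // same key means the earlier document is replaced
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.pdf", "");
        docs.put("b.pdf", "");
        docs.put("c.pdf", "");
        System.out.println(index(docs, true).size());  // signature used as key
        System.out.println(index(docs, false).size()); // separate signature field
    }
}
```

With the signature as key, the three documents collapse into one entry; with a separate signature field, all three survive, which mirrors the before/after configuration in the message.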
FW: Files locked after indexing
Hi to all, I'm pretty new to Solr and Tika and I have a problem. I have the following workflow in my (web) application:
* download a pdf file from an archive
* index the file
* delete the file
My problem is that after indexing the file, it remains locked and the delete part throws an exception. Here is my code snippet for indexing the file:
    try
    {
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(file, type);
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        NamedList result = server.request(req);
        Assert.assertEquals(0, ((NamedList) result.get("responseHeader")).get("status"));
    }
I also tried the "ContentStream" way but without success:
    ContentStream contentStream = null;
    try
    {
        contentStream = new ContentStreamBase.FileStream(document);
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest(UPDATE_EXTRACT_REQUEST);
        req.addContentStream(contentStream);
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        NamedList result = server.request(req);
        if (!((NamedList) result.get("responseHeader")).get("status").equals(0))
        {
            throw new IDSystemException(LOG, "Document could not be indexed. Status returned: " + ((NamedList) result.get("responseHeader")).get("status"));
        }
    }
    catch...
    finally
    {
        try
        {
            if(contentStream != null && contentStream.getStream() != null)
            {
                contentStream.getStream().close();
            }
        }
        catch (IOException ioe)
        {
            throw new IDSystemException(LOG, ioe.getMessage(), ioe);
        }
    }
Am I missing something? Thank you Francesco
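One possible cause of the lock is a file stream that is never closed; on Windows an open handle prevents deletion. It may also be worth checking whether getStream() in the SolrJ version in use opens a new stream on each call, in which case the finally block above would close a different stream from the one used for the upload (this is an assumption to verify, not a confirmed diagnosis). The sketch below shows the general pattern in plain Java: open the stream yourself, let try-with-resources close it, then delete. The consume method is a hypothetical stand-in for the real indexing call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: guarantee the stream is closed before the file is deleted. */
final class IndexThenDelete {

    /** Hypothetical stand-in for the indexing call; it only reads the stream to the end. */
    static long consume(InputStream in) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    /** Reads the file through a stream we own, closes it, then deletes the file. */
    static long indexAndDelete(Path file) throws IOException {
        long bytes;
        try (InputStream in = Files.newInputStream(file)) {
            bytes = consume(in);
        } // the stream is closed here, even if consume throws
        Files.delete(file);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".pdf");
        Files.write(tmp, new byte[] {1, 2, 3});
        long bytes = indexAndDelete(tmp);
        System.out.println(bytes + " bytes read, file still exists: " + Files.exists(tmp));
    }
}
```

If the client library insists on opening the file itself, reading the file into memory first and handing over the bytes is another way to keep ownership of the file handle, at the cost of holding the whole document in memory.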
Many PDFs indexed but only one returned in the Solr-UI
I followed the example here (http://searchhub.org/2012/02/14/indexing-with-solrj/) for indexing all the PDFs in a directory. The process seems to work well, but at the end, when I go to the Solr-UI and click on "Execute query" (with q=*:*), I get only one entry. Am I missing something in my code?
...
    String[] files = documentDir.list();

    if (files != null)
    {
        for (String document : files)
        {
            ContentHandler textHandler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            ParseContext context = new ParseContext();
            AutoDetectParser autoDetectParser = new AutoDetectParser();

            InputStream inputStream = null;

            try
            {
                inputStream = new FileInputStream(new File(documentDir, document));

                autoDetectParser.parse(inputStream, textHandler, metadata, context);

                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", document);

                String content = textHandler.toString();

                if (content != null)
                {
                    doc.addField("fullText", content);
                }

                UpdateResponse resp = server.add(doc, 1);

                server.commit(true, true, true);

                if (resp.getStatus() != 0)
                {
                    throw new IDSystemException(LOG, "Document could not be indexed. Status returned: " + resp.getStatus());
                }
            }
            catch (FileNotFoundException fnfe)
            {
                throw new IDSystemException(LOG, fnfe.getMessage(), fnfe);
            }
            catch (IOException ioe)
            {
                throw new IDSystemException(LOG, ioe.getMessage(), ioe);
            }
            catch (SAXException se)
            {
                throw new IDSystemException(LOG, se.getMessage(), se);
            }
            catch (TikaException te)
            {
                throw new IDSystemException(LOG, te.getMessage(), te);
            }
            catch (SolrServerException sse)
            {
                throw new IDSystemException(LOG, sse.getMessage(), sse);
            }
            finally
            {
                if (inputStream != null)
                {
                    try
                    {
                        inputStream.close();
                    }
                    catch (IOException ioe)
                    {
                        throw new IDSystemException(LOG, ioe.getMessage(), ioe);
                    }
                }
            }
...
Thank you for any hint. Francesco