Re: ExtractingRequestHandler and XmlUpdateHandler
: If I can find the bandwidth, I'd like to make something which allows
: file uploads via the XMLUpdateHandler as well... Do you have any ideas

the XmlUpdateRequestHandler already supports file uploads ... all request handlers do, using the ContentStream abstraction: http://wiki.apache.org/solr/ContentStream

-Hoss
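For reference, the ContentStream abstraction means a file can reach any request handler in a couple of ways; a rough sketch (URLs and file paths are placeholders, and the stream.file form requires enableRemoteStreaming to be turned on in solrconfig.xml — see the wiki page above for the exact parameters):

```
# POST the raw file body to the update handler
curl http://localhost:8983/solr/update -H "Content-type: text/xml" --data-binary @docs.xml

# Or point Solr at a file visible on the *server's* filesystem
curl "http://localhost:8983/solr/update?stream.file=/path/on/server/docs.xml&commit=true"
```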
Re: Dynamic Boosting at query time with boost value as another fieldvalue
: ohk.. that means I can't use colon in the fieldname ever in such a scenario?

In most internals, the lucene/solr code base allows *any* character in the field name, so you *can* use colons in field names, but many of the surface features (like the query parser) treat colons as special characters, so in *some* situations colons don't work in field names.

-Hoss
Re: [RESULTS] Community Logo Preferences
Just my thoughts on the matter: the designer of the runner-up logo and the 3rd-place logo is also responsible for 5 other logos that made it onto the list. They are basically different versions of the same concept. If you add up the scores for logos 2, 3, 6, 8, 11, 20 and 23 you will see a score of 86 (23 votes)! So one can argue the community likes this concept the most.

Mark Lindeman

Ryan McKinley wrote on 11/28/2008 10:28 PM:

> Check the results from the poll: http://people.apache.org/~ryan/solr-logo-results.html
>
> The obvious winner is: https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
>
> But since things are never simple, given the similarity of this logo to the Solaris logo: http://toastytech.com/guis/sol10logo.png
>
> SO... we will check with the Apache PRC for guidance before making any final decisions. With their feedback, we *may* pick one of the 'runner up' logos. Stay tuned!
>
> ryan
Re: ExtractingRequestHandler and XmlUpdateHandler
On Dec 15, 2008, at 3:13 AM, Chris Hostetter wrote:

> : If I can find the bandwidth, I'd like to make something which allows
> : file uploads via the XMLUpdateHandler as well... Do you have any ideas
>
> the XmlUpdateRequestHandler already supports file uploads ... all request handlers do using the ContentStream abstraction... http://wiki.apache.org/solr/ContentStream

But it doesn't do what Jacob is asking for... he wants (if I'm not mistaken) the ability to send a binary file along with Solr XML, and merge the extraction from the file (via Tika) with the fields specified in the XML. Currently this is not possible, as far as I know. Maybe this sort of thing could be coded as part of an update processor chain? Somehow DIH and Tika need to tie together eventually too, eh?

Erik
Re: ExtractingRequestHandler and XmlUpdateHandler
Hi Erik,

This is indeed what I was talking about... It could even be handled via some type of transient file storage system. This might even be better, to avoid the risks associated with uploading a huge file across a network, and might (I have no idea) be easier to implement. So I could send the file, and receive back a token which I would then throw into one of my fields as a reference, then use it to map Tika fields as well, like:

<str name="file_mod_date">${FILETOKEN}.last_modified</str>
<str name="file_body">${FILETOKEN}.content</str>

Best,
Jacob

On Mon, Dec 15, 2008 at 2:29 PM, Erik Hatcher e...@ehatchersolutions.com wrote:

> On Dec 15, 2008, at 3:13 AM, Chris Hostetter wrote:
>
> : If I can find the bandwidth, I'd like to make something which allows
> : file uploads via the XMLUpdateHandler as well... Do you have any ideas
>
> the XmlUpdateRequestHandler already supports file uploads ... all request handlers do using the ContentStream abstraction... http://wiki.apache.org/solr/ContentStream
>
> But it doesn't do what Jacob is asking for... he wants (if I'm not mistaken) the ability to send a binary file along with Solr XML, and merge the extraction from the file (via Tika) with the fields specified in the XML. Currently this is not possible, as far as I know. Maybe this sort of thing could be coded as part of an update processor chain? Somehow DIH and Tika need to tie together eventually too, eh?
>
> Erik

--
+1 510 277-0891 (o) +91 33 7458 (m)
web: http://pajamadesign.com
Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
Re: dataimport handler with mysql: wrong field mapping
Have you tried using the <dynamicField name="*" type="string" indexed="true"/> option in the schema.xml? After the indexing, take a look at the fields DIH has generated.

Bye,
L.M.

2008/12/15 jokkmokk jokkm...@gmx.at:

> HI, I'm desperately trying to get the dataimport handler to work, however it seems that it just ignores the field name mapping. I have the fields body and subject in the database and those are called title and content in the solr schema, so I use the following import config:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="root" password=""/>
>   <document>
>     <entity name="phorum_messages" query="select * from phorum_messages">
>       <field column="body" name="content"/>
>       <field column="subject" name="title"/>
>     </entity>
>   </document>
> </dataConfig>
>
> however I always get the following exception:
>
> org.apache.solr.common.SolrException: ERROR:unknown field 'body'
>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
>         at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
>         at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)
>
> but according to the documentation it should add a document with title and content, not body and subject?! I'd appreciate any help as I can't see anything wrong with my configuration...
TIA, Stefan -- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataimport handler with mysql: wrong field mapping
sorry, I'm using the 1.3.0 release. I've now worked around that issue by using aliases in the sql statement so that no mapping is needed. This way it works perfectly.

best regards,
Stefan

Shalin Shekhar Mangar wrote:
> Which solr version are you using?

-- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013639.html Sent from the Solr - User mailing list archive at Nabble.com.
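The alias workaround Stefan mentions presumably looks something like this (table and column names are taken from the config earlier in the thread; the exact SQL is an assumption):

```sql
-- Alias the DB columns to the Solr field names, so DIH needs no
-- <field column=... name=.../> mapping at all.
select body as content, subject as title from phorum_messages
```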
Re: using BoostingTermQuery
> I'm no QueryParser expert, but I would probably start w/ the default query parser in Solr (LuceneQParser), and then progress a bit to the DisMax one. I'd ask specific questions based on what you see there. If you get far enough along, you may consider asking for help on the java-user list as well.

Thanks - I think I've got it working now. I ended up subclassing QueryParser and overriding newTermQuery() to create a BoostingTermQuery instead of a plain ol' TermQuery. Seems to work.

Kindly let me know where and how to configure the overridden query parser in solr.

-Ayyanar

-- View this message in context: http://www.nabble.com/using-BoostingTermQuery-tp19637123p21011626.html Sent from the Solr - User mailing list archive at Nabble.com.
Feature Request: Return count for documents which are possible to select
Hi all,

Whilst Solr is a great resource (a big thank you to the developers) it presents me with a couple of issues. The need for hierarchical facets I would say is a fairly crucial missing piece, but that has already been pointed out (http://issues.apache.org/jira/browse/SOLR-64). The other issue relates to providing (count) feedback for disjoint selections. When a facet value is selected this constrains the documents, and solr returns the counts for all the other facet values. Thus the user can see all the possible valid selections (i.e. having a count > 0) and the number of documents which will be returned if that value is selected. However, one of the valid selections is to select another value in the same facet, creating a disjoint selection and increasing the number of returned documents. There is currently no way for the user to know which values are valid to select, as the count only relates to currently selected documents and not documents which are also still possible to select. I hope this is clear; it's not the easiest issue to explain (or perhaps I just do it badly). Anyway, other faceted browsers, such as the Simile Project's Exhibit, do return counts showing the effect of disjoint selections, which is more useful for the user.

N

PS I'm unsure whether this should be posted to the developer's list so I posted here first.
Re: ExtractingRequestHandler and XmlUpdateHandler
Jacob,

Hmmm... seems the wires are still crossed and confusing.

On Dec 15, 2008, at 6:34 AM, Jacob Singh wrote:

> This is indeed what I was talking about... It could even be handled via some type of transient file storage system. This might even be better, to avoid the risks associated with uploading a huge file across a network, and might (I have no idea) be easier to implement.

If the file is visible from the Solr server, there is no need to actually send the bits through HTTP. Solr's content stream capabilities allow a file to be retrieved by Solr itself.

> So I could send the file, and receive back a token which I would then throw into one of my fields as a reference, then use it to map Tika fields as well, like:
>
> <str name="file_mod_date">${FILETOKEN}.last_modified</str>
> <str name="file_body">${FILETOKEN}.content</str>

Huh? I don't follow the file token thing. Perhaps you're thinking you'll post the file, then later update other fields on that same document. An important point here is that Solr currently does not have document update capabilities. A document can be fully replaced, but cannot have fields added to it, once indexed. It needs to be handled all in one shot to accomplish the blending of file/field indexing. Note the ExtractingRequestHandler already has the field mapping capability.

But, here's a solution that will work for you right now... let Tika extract the content and return it back to you, then turn around and post it and whatever other fields you like: http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput

In that example, the contents aren't being indexed, just returned back to the client. And you can leverage the content stream capability with this as well, avoiding posting the actual binary file, by pointing the extracting request to a file path visible by Solr.

Erik
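The extract-only flow Erik describes can be exercised roughly like this (host, handler path, and the file/field names are placeholders; check the wiki page above for the exact parameters of the version in use):

```
# Ask the ExtractingRequestHandler to run Tika over the upload but only
# return the extracted content to the client, without indexing anything
curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@somedoc.pdf"
```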
Re: ExtractingRequestHandler and XmlUpdateHandler
Hi Erik,

Sorry I wasn't totally clear. Some responses inline:

> If the file is visible from the Solr server, there is no need to actually send the bits through HTTP. Solr's content stream capabilities allow a file to be retrieved by Solr itself.

Yeah, I know. But in my case not possible. Perhaps a simple file-receiving HTTP POST handler which simply stored the file on disk and returned a path to it is the way to go here.

>> So I could send the file, and receive back a token which I would then throw into one of my fields as a reference, then use it to map Tika fields as well, like:
>>
>> <str name="file_mod_date">${FILETOKEN}.last_modified</str>
>> <str name="file_body">${FILETOKEN}.content</str>
>
> Huh? I don't follow the file token thing. Perhaps you're thinking you'll post the file, then later update other fields on that same document. An important point here is that Solr currently does not have document update capabilities. A document can be fully replaced, but cannot have fields added to it, once indexed. It needs to be handled all in one shot to accomplish the blending of file/field indexing. Note the ExtractingRequestHandler already has the field mapping capability.

Sorta... I was more thinking of a new feature wherein a Solr request handler doesn't actually put the file in the index, merely runs it through Tika and stores, in a datastore, a token linked to the Tika extraction. Then the client could make another request w/ the XMLUpdateHandler which referenced parts of the stored Tika extraction.

> But, here's a solution that will work for you right now... let Tika extract the content and return it back to you, then turn around and post it and whatever other fields you like: http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput
>
> In that example, the contents aren't being indexed, just returned back to the client. And you can leverage the content stream capability with this as well, avoiding posting the actual binary file, by pointing the extracting request to a file path visible by Solr.
Yeah, I saw that. This is pretty much what I was talking about above; the only disadvantage (which is a deal breaker in our case) is the extra bandwidth to move the file back and forth.

Thanks for your help and quick response. I think we'll integrate the POST fields as Grant has kindly provided multi-value input now, and see what happens in the future. I realize what I'm talking about (XML and binary together) is probably not a high-priority feature.

Best,
Jacob

--
+1 510 277-0891 (o) +91 33 7458 (m)
web: http://pajamadesign.com
Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
Re: Sample code for some examples for using solr in applications
See also http://wiki.apache.org/solr/SolrResources

On Dec 15, 2008, at 2:57 AM, Andre Hagenbruch wrote:

> Sajith Vimukthi wrote:
>
>> I need some sample code of some examples done using solr. I need to get an idea on how I can use solr in my application. Please be kind enough to reply me asap. It would be a grt help.
>
> Hi Sajith, did you already have a look at the documentation for Solrj (http://wiki.apache.org/solr/Solrj) or any of the other clients? Overall, the wiki (http://wiki.apache.org/solr/) is a good place to get started...
>
> Hth, Andre

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
RE: Solrj: Multivalued fields give Bad Request
Sorry, forgot the most important detail. The document I am adding contains multiple names fields:

sInputDocument.addField("names", value);
sInputDocument.addField("names", value);
sInputDocument.addField("names", value);

There is no problem when a document only contains one value in the names field.

-----Original Message-----
From: Schilperoort, René [mailto:rene.schilpero...@getronics.com]
Sent: Monday 15 December 2008 16:52
To: solr-user@lucene.apache.org
Subject: Solrj: Multivalued fields give Bad Request

Hi all,

When adding documents to Solr using Solrj I receive the following exception:

org.apache.solr.common.SolrException: Bad Request

The field is configured as follows:

<field name="names" type="string" indexed="true" stored="true" multiValued="true"/>

Any suggestions?

Regards, Rene
Re: dataimport handler with mysql: wrong field mapping
Which solr version are you using?

On Mon, Dec 15, 2008 at 6:04 PM, jokkmokk jokkm...@gmx.at wrote:

> HI, I'm desperately trying to get the dataimport handler to work, however it seems that it just ignores the field name mapping. I have the fields body and subject in the database and those are called title and content in the solr schema, so I use the following import config:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="root" password=""/>
>   <document>
>     <entity name="phorum_messages" query="select * from phorum_messages">
>       <field column="body" name="content"/>
>       <field column="subject" name="title"/>
>     </entity>
>   </document>
> </dataConfig>
>
> however I always get the following exception:
>
> org.apache.solr.common.SolrException: ERROR:unknown field 'body'
>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
>         at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
>         at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)
>
> but according to the documentation it should add a document with title and content, not body and subject?! I'd appreciate any help as I can't see anything wrong with my configuration...
TIA, Stefan -- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
Re: using BoostingTermQuery
In the solrconfig.xml (scroll all the way to the bottom; I believe the example has some commented out).

On Dec 15, 2008, at 5:45 AM, ayyanar wrote:

>> I'm no QueryParser expert, but I would probably start w/ the default query parser in Solr (LuceneQParser), and then progress a bit to the DisMax one. I'd ask specific questions based on what you see there. If you get far enough along, you may consider asking for help on the java-user list as well.
>
> Thanks - I think I've got it working now. I ended up subclassing QueryParser and overriding newTermQuery() to create a BoostingTermQuery instead of a plain ol' TermQuery. Seems to work.
>
> Kindly let me know where and how to configure the overridden query parser in solr.
>
> -Ayyanar
>
> -- View this message in context: http://www.nabble.com/using-BoostingTermQuery-tp19637123p21011626.html Sent from the Solr - User mailing list archive at Nabble.com.

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Solrj - SolrQuery - specifying SolrCore - when the Solr Server has multiple cores
Hi - I am looking at the article here with a brief introduction to SolrJ . http://www.ibm.com/developerworks/library/j-solr-update/index.html?ca=dgr-jw17SolrS_Tact=105AGX59S_CMP=GRsitejw17#solrj . In case we have multiple SolrCores in the server application - (since 1.3) - how do I specify as part of SolrQuery as to which core needs to be used for the given query. I am trying to dig out the information from the code. Meanwhile, if someone is aware of the same - please suggest some pointers.
Re: Solrj - SolrQuery - specifying SolrCore - when the Solr Server has multiple cores
A solr core is like a separate solr server... so create a new CommonsHttpSolrServer that points at the core. You probably want to create and reuse a single HttpClient instance for the best efficiency. -Yonik On Mon, Dec 15, 2008 at 11:06 AM, Kay Kay kaykay.uni...@gmail.com wrote: Hi - I am looking at the article here with a brief introduction to SolrJ . http://www.ibm.com/developerworks/library/j-solr-update/index.html?ca=dgr-jw17SolrS_Tact=105AGX59S_CMP=GRsitejw17#solrj . In case we have multiple SolrCores in the server application - (since 1.3) - how do I specify as part of SolrQuery as to which core needs to be used for the given query. I am trying to dig out the information from the code. Meanwhile, if someone is aware of the same - please suggest some pointers.
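Yonik's suggestion might look roughly like this in SolrJ 1.3-era code (a sketch only; host, port, and core names are placeholders, and the exact constructor signatures should be checked against the SolrJ javadoc for the version in use):

```java
// Sketch: one CommonsHttpSolrServer per core, sharing a single HttpClient.
import java.net.MalformedURLException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class MultiCoreClients {
    public static void main(String[] args) throws MalformedURLException {
        // One shared, thread-safe HttpClient reused across all cores,
        // as Yonik recommends for efficiency.
        HttpClient httpClient = new HttpClient(new MultiThreadedHttpConnectionManager());

        // Each core is addressed like a separate Solr server: base URL + core name.
        SolrServer core0 = new CommonsHttpSolrServer("http://localhost:8983/solr/core0", httpClient);
        SolrServer core1 = new CommonsHttpSolrServer("http://localhost:8983/solr/core1", httpClient);

        // Queries sent through core0 or core1 now hit the respective core.
    }
}
```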
Re: Solrj: Multivalued fields give Bad Request
What do you see in the admin schema browser? /admin/schema.jsp

When you select the field "names", do you see the property Multivalued?

ryan

On Dec 15, 2008, at 10:55 AM, Schilperoort, René wrote:

> Sorry, forgot the most important detail. The document I am adding contains multiple names fields:
>
> sInputDocument.addField("names", value);
> sInputDocument.addField("names", value);
> sInputDocument.addField("names", value);
>
> There is no problem when a document only contains one value in the names field.
>
> -----Original Message-----
> From: Schilperoort, René [mailto:rene.schilpero...@getronics.com]
> Sent: Monday 15 December 2008 16:52
> To: solr-user@lucene.apache.org
> Subject: Solrj: Multivalued fields give Bad Request
>
> Hi all,
>
> When adding documents to Solr using Solrj I receive the following exception:
>
> org.apache.solr.common.SolrException: Bad Request
>
> The field is configured as follows:
>
> <field name="names" type="string" indexed="true" stored="true" multiValued="true"/>
>
> Any suggestions?
>
> Regards, Rene
CustomQueryParser
I found the following solution in the forum to use BoostingTermQuery in solr:

"I ended up subclassing QueryParser and overriding newTermQuery() to create a BoostingTermQuery instead of a plain ol' TermQuery. Seems to work."

http://www.nabble.com/RE:-using-BoostingTermQuery-p19651792.html

I have some questions on this:

1) Anyone tried this? Is it working?
2) Where to specify the query parser subclass name? SolrConfig.xml? What is the xml tag name for this?
3) Should we use QParser? I think we can directly subclass the QueryParser and do that. Am I right?
4) Kindly post the code sample to override newTermQuery() to create a BoostingTermQuery.

Thanks in advance,
Ayyanar

-- View this message in context: http://www.nabble.com/CustomQueryParser-tp21012136p21012136.html Sent from the Solr - User mailing list archive at Nabble.com.
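For question 4, a sketch of what the earlier thread describes, written against Lucene 2.4-era APIs (an untested illustration; the class name is made up, and package/class names should be verified against the Lucene version in use — BoostingTermQuery was later renamed PayloadTermQuery):

```java
// Sketch: subclass QueryParser so every term query becomes a
// payload-aware BoostingTermQuery instead of a plain TermQuery.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.BoostingTermQuery;

public class BoostingQueryParser extends QueryParser {

    public BoostingQueryParser(String defaultField, Analyzer analyzer) {
        super(defaultField, analyzer);
    }

    // QueryParser calls this factory method for every term it parses.
    protected Query newTermQuery(Term term) {
        return new BoostingTermQuery(term);
    }
}
```

To expose this inside Solr you would presumably wrap it in a QParserPlugin registered in solrconfig.xml, as the reply in the "using BoostingTermQuery" thread hints.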
Re: Sample code
http://lucene.apache.org/solr/tutorial.html On Dec 15, 2008, at 12:56 AM, Sajith Vimukthi wrote: Hi all, Can someone of you give me a sample code on a search function done with solr so that I can get an idea on how I can use it. Regards, Sajith Vimukthi Weerakoon Associate Software Engineer | ZONE24X7 | Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 | http://www.zone24x7.com
dataimport handler with mysql: wrong field mapping
HI, I'm desperately trying to get the dataimport handler to work, however it seems that it just ignores the field name mapping. I have the fields body and subject in the database and those are called title and content in the solr schema, so I use the following import config:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="root" password=""/>
  <document>
    <entity name="phorum_messages" query="select * from phorum_messages">
      <field column="body" name="content"/>
      <field column="subject" name="title"/>
    </entity>
  </document>
</dataConfig>

however I always get the following exception:

org.apache.solr.common.SolrException: ERROR:unknown field 'body'
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)

but according to the documentation it should add a document with title and content, not body and subject?! I'd appreciate any help as I can't see anything wrong with my configuration...

TIA, Stefan

-- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExtractingRequestHandler and XmlUpdateHandler
On Dec 15, 2008, at 8:20 AM, Jacob Singh wrote:

> Hi Erik,
>
> Sorry I wasn't totally clear. Some responses inline:
>
>> If the file is visible from the Solr server, there is no need to actually send the bits through HTTP. Solr's content stream capabilities allow a file to be retrieved by Solr itself.
>
> Yeah, I know. But in my case not possible. Perhaps a simple file-receiving HTTP POST handler which simply stored the file on disk and returned a path to it is the way to go here.
>
>>> So I could send the file, and receive back a token which I would then throw into one of my fields as a reference, then use it to map Tika fields as well, like:
>>>
>>> <str name="file_mod_date">${FILETOKEN}.last_modified</str>
>>> <str name="file_body">${FILETOKEN}.content</str>
>>
>> Huh? I don't follow the file token thing. Perhaps you're thinking you'll post the file, then later update other fields on that same document. An important point here is that Solr currently does not have document update capabilities. A document can be fully replaced, but cannot have fields added to it, once indexed. It needs to be handled all in one shot to accomplish the blending of file/field indexing. Note the ExtractingRequestHandler already has the field mapping capability.
>
> Sorta... I was more thinking of a new feature wherein a Solr request handler doesn't actually put the file in the index, merely runs it through Tika and stores, in a datastore, a token linked to the Tika extraction. Then the client could make another request w/ the XMLUpdateHandler which referenced parts of the stored Tika extraction.

Hmmm, thinking out loud:

1. Override SolrContentHandler. It is responsible for mapping the Tika output to a Solr Document.
2. Capture all the content into a single buffer.
3. Add said buffer to a field that is stored only.
4. Add a second field that is indexed. This is your token.

You could, just as well, have that token be the only thing that gets returned by extract-only.
Alternately, you could implement an UpdateProcessor thingamajob that takes the output and stores it to the filesystem, and just adds the token to a document.

>> But, here's a solution that will work for you right now... let Tika extract the content and return it back to you, then turn around and post it and whatever other fields you like: http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput
>>
>> In that example, the contents aren't being indexed, just returned back to the client. And you can leverage the content stream capability with this as well, avoiding posting the actual binary file, by pointing the extracting request to a file path visible by Solr.
>
> Yeah, I saw that. This is pretty much what I was talking about above; the only disadvantage (which is a deal breaker in our case) is the extra bandwidth to move the file back and forth.
>
> Thanks for your help and quick response. I think we'll integrate the POST fields as Grant has kindly provided multi-value input now, and see what happens in the future. I realize what I'm talking about (XML and binary together) is probably not a high-priority feature.

Is the use case this: you want to assign metadata and also store the original, in binary format, too? Thus, Solr becomes a backing, searchable store? I think we could possibly add an option to serialize the ContentStream onto a field on the Document. In other words, store the original with the Document. Of course, buyer beware on the cost of doing so.
Re: Solrj - SolrQuery - specifying SolrCore - when the Solr Server has multiple cores
Thanks Yonik for the clarification. Yonik Seeley wrote: A solr core is like a separate solr server... so create a new CommonsHttpSolrServer that points at the core. You probably want to create and reuse a single HttpClient instance for the best efficiency. -Yonik On Mon, Dec 15, 2008 at 11:06 AM, Kay Kay kaykay.uni...@gmail.com wrote: Hi - I am looking at the article here with a brief introduction to SolrJ . http://www.ibm.com/developerworks/library/j-solr-update/index.html?ca=dgr-jw17SolrS_Tact=105AGX59S_CMP=GRsitejw17#solrj . In case we have multiple SolrCores in the server application - (since 1.3) - how do I specify as part of SolrQuery as to which core needs to be used for the given query. I am trying to dig out the information from the code. Meanwhile, if someone is aware of the same - please suggest some pointers.
Please help me articulate this query
Hey all, I'm having trouble articulating a query and I'm hopeful someone out there can help me out :)

My situation is this: I am indexing a series of questions that can either be asked from a main question entry page, or from a specific subject page. I have a field called "referring" which indexes the title of the specific subject page, plus the regular question, whenever that document is submitted from a specific subject page. Otherwise, every document is indexed with just the question.

Specifically, what I am trying to do is: when I am on a specific subject page (e.g. Tom Cruise) I want to search for all of the questions asked from that page, plus any question asked about Tom Cruise. Something like:

q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)

"Have you ever used a Tom Tom?" - Not returned
"Where is the best place to take a cruise?" - Not returned
"When did he have his first kid?" - Returned iff question was asked from Tom Cruise page
"Do you think that Tom Cruise will make more movies?" - Always returned

Any thoughts?

-Derek
Re: Please help me articulate this query
I think in this case you would want to index each question with the possible referrers (by title might be too imprecise; I'd go with filename or ID) and then do a search like this (assuming in this case it's by filename):

q=(referring:TomCruise.html) OR (question: Tom AND Cruise)

Which seems to be what you're thinking. I would make the referrer a string type though, so that you don't accidentally pull in documents from a different subject (for Tom Cruise this would work OK, but imagine you need to distinguish between George Washington and George Washington Carver).

--
Steve

On Dec 15, 2008, at 2:59 PM, Derek Springer wrote:

> Hey all, I'm having trouble articulating a query and I'm hopeful someone out there can help me out :)
>
> My situation is this: I am indexing a series of questions that can either be asked from a main question entry page, or from a specific subject page. I have a field called "referring" which indexes the title of the specific subject page, plus the regular question, whenever that document is submitted from a specific subject page. Otherwise, every document is indexed with just the question.
>
> Specifically, what I am trying to do is: when I am on a specific subject page (e.g. Tom Cruise) I want to search for all of the questions asked from that page, plus any question asked about Tom Cruise. Something like:
>
> q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)
>
> "Have you ever used a Tom Tom?" - Not returned
> "Where is the best place to take a cruise?" - Not returned
> "When did he have his first kid?" - Returned iff question was asked from Tom Cruise page
> "Do you think that Tom Cruise will make more movies?" - Always returned
>
> Any thoughts?
>
> -Derek
Re: Please help me articulate this query
Thanks for the tip, I appreciate it! However, does anyone know how to articulate the syntax of (This AND That) OR (Something AND Else) into a query string? i.e. q=referring:### AND question:###

On Mon, Dec 15, 2008 at 12:32 PM, Stephen Weiss swe...@stylesight.com wrote:

> I think in this case you would want to index each question with the possible referrers (by title might be too imprecise; I'd go with filename or ID) and then do a search like this (assuming in this case it's by filename):
>
> q=(referring:TomCruise.html) OR (question: Tom AND Cruise)
>
> Which seems to be what you're thinking. I would make the referrer a string type though, so that you don't accidentally pull in documents from a different subject (for Tom Cruise this would work OK, but imagine you need to distinguish between George Washington and George Washington Carver).
>
> --
> Steve
>
> On Dec 15, 2008, at 2:59 PM, Derek Springer wrote:
>
>> Hey all, I'm having trouble articulating a query and I'm hopeful someone out there can help me out :)
>>
>> My situation is this: I am indexing a series of questions that can either be asked from a main question entry page, or from a specific subject page. I have a field called "referring" which indexes the title of the specific subject page, plus the regular question, whenever that document is submitted from a specific subject page. Otherwise, every document is indexed with just the question.
>>
>> Specifically, what I am trying to do is: when I am on a specific subject page (e.g. Tom Cruise) I want to search for all of the questions asked from that page, plus any question asked about Tom Cruise. Something like:
>>
>> q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)
>>
>> "Have you ever used a Tom Tom?" - Not returned
>> "Where is the best place to take a cruise?" - Not returned
>> "When did he have his first kid?" - Returned iff question was asked from Tom Cruise page
>> "Do you think that Tom Cruise will make more movies?" - Always returned
>>
>> Any thoughts?
>>
>> -Derek
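One way to write that kind of grouping as a single q parameter, using standard Lucene query syntax (field names are taken from the thread; the phrase form assumes "referring" is analyzed so that a quoted phrase matches):

```
q=referring:"Tom Cruise" OR question:(Tom AND Cruise)
```

When sent over HTTP, the value needs URL encoding, e.g. q=referring%3A%22Tom+Cruise%22+OR+question%3A(Tom+AND+Cruise). Parentheses group sub-clauses per field, and quotes turn a multi-word value into a phrase instead of two separate terms.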
Re: Multi tokenizer
: I need to tokenize my field on whitespaces, html, punctuation, apostrophe : but if I use HTMLStripStandardTokenizerFactory it strips only html : but no apostrophes You might consider using one of the HTML tokenizers, and then use a PatternReplaceFilterFactory ... or, if you know Java, write a simple Tokenizer that uses the HTMLStripReader. In the long run, changing the HTMLStripReader to be usable as a CharFilter so it can work with any Tokenizer is probably the way we'll go -- but I don't think anyone has started working on a patch for that. Thanks... I used HTMLStripStandardTokenizerFactory and then a PatternReplaceFilterFactory and now it works
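For reference, the analyzer chain the thread converges on might look roughly like this in schema.xml. A sketch only: the field type name is hypothetical, and the PatternReplaceFilterFactory here simply deletes apostrophes after the HTML-stripping tokenizer:

```xml
<!-- Hypothetical field type: strip HTML while tokenizing, then
     remove apostrophes from each token. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="'" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```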
TextField size limit
Hi all, I have a TextField containing over 400k of text. When I try to search for a word, Solr doesn't return any result, but if I look at the single document, I can see that the word exists there. So I suppose that Solr has a TextField size limit (the field is indexed using a tokenizer and some filters). Could anyone help me understand the problem, and whether it is possible to solve? Thanks in advance, Antonio
Re: TextField size limit
Check your solrconfig.xml: <maxFieldLength>10000</maxFieldLength> That's probably the truncating factor. That's the maximum number of terms, not bytes or characters. Erik On Dec 15, 2008, at 5:00 PM, Antonio Zippo wrote: Hi all, i have a TextField containing over 400k of text when i try to search a word solr doesn't return any result but if I search for a single document, I can see that the word exists there So I suppose that solr has a textfield size limit (the field is indexed using a tokenizer and some filters) Could anyone help me to undestand the problem? and if is it possible to solve? Thanks in advance, Antonio
Slow Response time after optimize
Hi guys, I have a typical master/slave setup running with Solr 1.3.0. I did some basic scalability tests with JMeter, tweaked our environment, and determined that we can handle approximately 26 simultaneous threads and get end-to-end response times of under 200ms, even with a distribution running typically every 5 minutes. However, as soon as I issue a single optimize on the master, the response time goes up to over 500ms and does not seem to recover. As soon as I restarted, the response time was back down to 200ms. My index is approximately 5 GB in size and the queries are just basic disjunction queries such as title:iphone OR bodytext:iphone. Has anybody seen this issue before? Thanks, Sammy
Re: Standard request with functional query
Hey guys, Thanks for the response, but how would one make recency a factor in scoring documents with the standard request handler? The query (title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1 seems to do something very similar to just sorting by dateCreated, rather than having dateCreated be a part of the score. Thanks, Sammy On Thu, Dec 4, 2008 at 1:35 PM, Sammy Yu temi...@gmail.com wrote: Hi guys, I have a standard query that searches across multiple text fields such as q=title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware This comes back with documents that have iphone and firmware (I know I can use the dismax handler but it seems to be really slow), which is great. Now I want to give some more weight to more recent documents (there is a dateCreated field in each document). So I've modified the query as such: (title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1 URL-encoded to q=(title%3Aiphone+OR+bodytext%3Aiphone+OR+title%3Afirmware+OR+bodytext%3Afirmware)+AND+_val_%3A%22ord(dateCreated)%22^0.1 However, the results are not as one would expect. The first few documents only come back with the word iphone and appear to be sorted by date created. It seems to completely ignore the score and use the dateCreated field for the score. On a not directly related issue, it seems like if you put the weight within the double quotes: (title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)^0.1" the parser complains: org.apache.lucene.queryParser.ParseException: Cannot parse '(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"': Expected ',' at position 16 in 'ord(dateCreated)^0.1' Thanks, Sammy
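One hedged reading of why Sammy's results look date-sorted: ord() yields a document's ordinal position among the sorted field values, so on a large index it produces values in the thousands, while typical tf-idf relevance scores are single digits; even scaled by ^0.1 the ordinal term dominates the additive combination. A toy illustration with invented numbers:

```python
# Invented scores: ordinary relevance vs. ord()-style date ordinals.
relevance = {"docA": 4.2, "docB": 1.1}           # docA is far more relevant
date_ordinal = {"docA": 120000, "docB": 450000}  # docB is much newer

# Additive combination, as in _val_:"ord(dateCreated)"^0.1
combined = {d: relevance[d] + 0.1 * date_ordinal[d] for d in relevance}
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)  # the recency term swamps relevance entirely
```

Scaling the date term down to the same order of magnitude as the relevance scores, or using a bounded function of the date, keeps recency a tiebreaker rather than the dominant factor.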
Re: TextField size limit
Check your solrconfig.xml: <maxFieldLength>10000</maxFieldLength> That's probably the truncating factor. That's the maximum number of terms, not bytes or characters. Erik Thanks... I think that could be the problem. I tried to count whitespace in a single text and it's over 55,000 ... but Solr truncates to 10,000. Do you know if I can change the value to 100,000 without recreating the index? (When I modify schema.xml I need to create the index again, but what about solrconfig.xml?) Thanks, Antonio
Re: TextField size limit
On Mon, Dec 15, 2008 at 5:28 PM, Antonio Zippo reven...@yahoo.it wrote: Check your solrconfig.xml: <maxFieldLength>10000</maxFieldLength> That's probably the truncating factor. That's the maximum number of terms, not bytes or characters. Thanks... I think it could be the problem. i tried to count whitespace in a single text and it's over 55.000 ... but solr truncates to 10.000 do you know if I can change the value to 100.000 without recreate the index? (when I modify schema.xml I need to create the index again but with solrconfig.xml?) No need to re-index with this change. But you will have to re-index any documents that got cut off, of course. -Yonik
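The change under discussion is a single solrconfig.xml setting; a sketch using the 100,000 figure from the thread:

```xml
<!-- solrconfig.xml: per-field cap on the number of indexed terms.
     The 10000 default is what was truncating the ~55,000-term field;
     raising it only affects documents indexed afterwards. -->
<maxFieldLength>100000</maxFieldLength>
```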
Re: TextField size limit
No need to re-index with this change. But you will have to re-index any documents that got cut off of course. -Yonik Ok, thanks... I had hoped to reindex the documents over the existing index (with incremental updates, while Solr is running) and without deleting the index folder. But the important thing is that the problem is solved ;-) Thanks... Antonio
Some solrconfig.xml attributes being ignored
Hello, In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When I perform a search, it returns only a single snippet for each highlighted field. However, when I set the hl.snippets field manually as a search parameter, I get up to 3 highlighted snippets. This is the configuration that I am using to set the highlighting parameters:

<fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
  <lst name="defaults">
    <str name="hl.snippets">3</str>
    <str name="hl.fragsize">100</str>
    <str name="hl.regex.slop">0.5</str>
    <str name="hl.regex.pattern">\w[-\w ,/\n\']{50,150}</str>
  </lst>
</fragmenter>

I tried setting hl.fragmenter=regex as a parameter as well, to be sure that it was using the correct one, and the result set is the same. Any ideas what could be causing this attribute not to be read? It has me concerned that other attributes are being ignored as well. Thanks, Mark Ferguson
Re: Using Regex fragmenter to extract paragraphs
You actually don't need to escape most characters inside a character class; the escaping of the period was unnecessary. I've tried using the example regex ([-\w ,/\n\']{20,200}), and I'm _still_ getting lots of highlighted snippets that don't match the regex (starting with a period, etc.) Has anyone else had any trouble with the default regex fragmenter? If someone has used it and gotten the expected results, can you let me know, so I know that the problem is on my end? Thanks for your help, Mark On Sun, Dec 14, 2008 at 8:34 AM, Erick Erickson erickerick...@gmail.com wrote: Shouldn't you escape the question mark at the end too? On Fri, Dec 12, 2008 at 6:22 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Someone helped me with the regex and pointed out a couple of mistakes, most notably the extra quantifier in .*{400,600}. My new regex is this: \w.{400,600}[\.!?] Unfortunately, my results still aren't any better. Some results start with a word character, some don't, and none seem to end with punctuation. Any ideas what else could be wrong? Mark On Fri, Dec 12, 2008 at 2:37 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Hello, I am trying to use the regex fragmenter and am having a hard time getting the results I want. I am trying to get fragments that start on a word character and end on punctuation, but for some reason the fragments being returned to me seem to be very inflexible, despite that I've provided a large slop. Here are the relevant parameters I'm using, maybe someone can help point out where I've gone wrong:

<str name="hl.fragsize">500</str>
<str name="hl.fragmenter">regex</str>
<str name="hl.regex.slop">0.8</str>
<str name="hl.regex.pattern">[\w].*{400,600}[.!?]</str>
<str name="hl">true</str>
<str name="q">chinese</str>

This should be matching between 400-600 characters, beginning with a word character and ending with one of .!?. Here is an example of a typical result: . Check these pictures out. Nine panda cubs on display for the first time Thursday in southwest China.
They're less than a year old. They just recently stopped nursing. There are only 1,600 of these guys left in the mountain forests of central China, another 120 in span class='hl'Chinese/span breeding facilities and zoos. And they're about 20 that live outside China in zoos. They exist almost entirely on bamboo. They can live to be 30 years old. And these little guys will eventually get much bigger. They'll grow As you can see, it is starting with a period and ending on a word character! It's almost as if the fragments are just coming out as they will and the regex isn't doing anything at all, but the results are different when I use the gap fragmenter. In the above result I don't see any reason why it shouldn't have stripped out the preceding period and the last two words, there is plenty of room in the slop and in the regex pattern. Please help me figure out what I'm doing wrong... Thanks a lot, Mark Ferguson
Re: Some solrconfig.xml attributes being ignored
Try adding echoParams=all to your query to verify the params that the solr request handler is getting. -Yonik On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Hello, In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When I perform a search, it returns only a single snippet for each highlighted field. However, when I set the hl.snippets field manually as a search parameter, I get up to 3 highlighted snippets. This is the configuration that I am using to set the highlighted parameters: fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter default=true lst name=defaults str name=hl.snippets3/str str name=hl.fragsize100/str str name=hl.regex.slop0.5/str str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str /lst /fragmenter I tried setting hl.fragmenter=regex as a parameter as well, to be sure that it was using the correct one, and the result set is the same. Any ideas what could be causing this attribute not to be read? It has me concerned that other attributes are being ignored as well. Thanks, Mark Ferguson
Re: Some solrconfig.xml attributes being ignored
Thanks for this tip, it's very helpful. Indeed, it looks like none of the highlighting parameters are being included. It's using the correct request handler and hl is set to true, but none of the highlighting parameters from solrconfig.xml are in the parameter list. Here is my query: http://localhost:8080/solr1/select?rows=50&hl=true&fl=url,urlmd5,page_title,score&echoParams=all&q=java Here are the settings for the request handler and the highlighter:

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <float name="tie">0.01</float>
    <str name="qf">body_text^1.0 page_title^1.6 meta_desc^1.3</str>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">body_text page_title meta_desc</str>
    <str name="f.page_title.hl.fragsize">0</str>
    <str name="f.meta_desc.hl.fragsize">0</str>
    <str name="hl.fragmenter">regex</str>
  </lst>
</requestHandler>

<highlighting>
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
    <lst name="defaults">
      <str name="hl.snippets">3</str>
      <str name="hl.fragsize">100</str>
      <str name="hl.regex.slop">0.5</str>
      <str name="hl.regex.pattern">\w[-\w ,/\n\']{50,150}</str>
    </lst>
  </fragmenter>
</highlighting>

And here is the param list returned to me:

<lst name="params">
  <str name="echoParams">all</str>
  <str name="tie">0.01</str>
  <str name="hl.fragmenter">regex</str>
  <str name="f.page_title.hl.fragsize">0</str>
  <str name="qf">body_text^1.0 page_title^1.6 meta_desc^1.3</str>
  <str name="f.meta_desc.hl.fragsize">0</str>
  <str name="q.alt">*:*</str>
  <str name="hl.fl">page_title,body_text</str>
  <str name="defType">dismax</str>
  <str name="echoParams">all</str>
  <str name="fl">url,urlmd5,page_title,score</str>
  <str name="q">java</str>
  <str name="hl">true</str>
  <str name="rows">50</str>
</lst>

So it seems like everything is working except for the highlighter. I should mention that when I enter a bogus fragmenter as a parameter (e.g. hl.fragmenter=bogus), it returns a 400 error that the fragmenter cannot be found, so the config file _is_ finding the regex fragmenter. It just doesn't seem to actually be including its parameters...
Any ideas are appreciated, thanks again for the help. Mark On Mon, Dec 15, 2008 at 4:23 PM, Yonik Seeley ysee...@gmail.com wrote: Try adding echoParams=all to your query to verify the params that the solr request handler is getting. -Yonik On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Hello, In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When I perform a search, it returns only a single snippet for each highlighted field. However, when I set the hl.snippets field manually as a search parameter, I get up to 3 highlighted snippets. This is the configuration that I am using to set the highlighted parameters: fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter default=true lst name=defaults str name=hl.snippets3/str str name=hl.fragsize100/str str name=hl.regex.slop0.5/str str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str /lst /fragmenter I tried setting hl.fragmenter=regex as a parameter as well, to be sure that it was using the correct one, and the result set is the same. Any ideas what could be causing this attribute not to be read? It has me concerned that other attributes are being ignored as well. Thanks, Mark Ferguson
Re: SolrConfig.xml Replication
It does appear to be working for us now. The files replicated out appropriately which is a huge help. Thanks to all! -Jeff On 12/13/08 9:42 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Jeff, SOLR-821 has a patch now. It'd be nice to get some feedback if you manage to try it out. On Thu, Dec 11, 2008 at 8:33 PM, Jeff Newburn jnewb...@zappos.com wrote: Thank you for the quick response. I will keep an eye on that to see how it progresses. On 12/10/08 8:03 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: This is a known issue and I was planning to take it up soon. https://issues.apache.org/jira/browse/SOLR-821 On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn jnewb...@zappos.com wrote: I am curious as to whether there is a solution to be able to replicate solrconfig.xml with the 1.4 replication. The obvious problem is that the master would replicate the solrconfig turning all slaves into masters with its config. I have also tried on a whim to configure the master and slave on the master so that the slave points to the same server but that seems to break the replication completely. Please let me know if anybody has any ideas -Jeff -- Regards, Shalin Shekhar Mangar.
Re: Dismax Minimum Match/Stopwords Bug
Would this mean that, for example, if we wanted to search productId (long) we'd need to make a field type that had stopwords in it rather than simply using (long)? Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Dec 12, 2008, at 11:56 PM, Chris Hostetter wrote: : I have discovered some weirdness with our Minimum Match functionality. : Essentially it comes up with absolutely no results on certain queries. : Basically, searches with 2 words and 1 being "the" don't have a return : result. From what we can gather the minimum match criteria is making it : such that if there are 2 words then both are required. Unfortunately, you haven't mentioned what qf you're using, and you only listed one field type, which includes stopwords -- but I suspect your qf contains at least one field that *doesn't* remove stopwords. This is in fact an unfortunate aspect of the way dismax works -- each chunk of text recognized by the query parser is passed to each analyzer for each field. Any chunk that produces a query for a field becomes a DisjunctionMaxQuery, and is included in the mm count -- even if that chunk is a stopword in every other field (and produces no query). So you have to either be consistent with your stopwords across all fields, or make your mm really small. Searching for dismax stopwords turns this up... http://www.nabble.com/Re%3A-DisMax-request-handler-doesn%27t-work-with-stopwords--p11016770.html ...if I'm wrong about your situation (some fields in the qf with stopwords and some fields without) then please post all of the params you are using (not just mm) and the full parsedquery_tostring from when debugQuery=true is turned on. -Hoss
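Hoss's explanation can be modeled in a few lines: each query chunk is analyzed against every qf field, and the chunk counts toward mm if any field produces a term for it. A toy sketch (not Solr code; names invented for illustration):

```python
STOPWORDS = {"the"}

def mm_chunks(chunks, field_strips_stopwords):
    """Return the chunks that become DisjunctionMaxQuery clauses,
    i.e. chunks that survive analysis in at least one qf field."""
    surviving = []
    for chunk in chunks:
        if any(not (strips and chunk in STOPWORDS)
               for strips in field_strips_stopwords):
            surviving.append(chunk)
    return surviving

# Mixed qf (one field keeps stopwords): "the" still becomes a clause,
# so a strict mm requirement can silently exclude every document.
mixed = mm_chunks(["the", "office"], [True, False])
# Consistent stopwords across all qf fields: "the" drops out of the count.
consistent = mm_chunks(["the", "office"], [True, True])
```

This is why the two fixes in the thread are "be consistent with your stopwords across all fields" or "make your mm really small".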
Parent Child Entity - DataImport
I have a parent entity that grabs a list of records of a certain type from one table, and a sub-entity that queries another table to retrieve the actual data. For various reasons I cannot join the tables. The 2nd SQL query converts the rows into an XML document to be processed by a custom transformer (done due to the complex nature of the second table). Full-import works fine but delta-import is not adding any new records. Do I have to specify a deltaQuery for the sub-entity? What else might be going on?

<document name="doc">
  <entity name="table1" pk="id"
          query="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15)"
          deltaQuery="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date &gt; '${dataimporter.last_index_time}'">
    <field column="MY_GUID" name="myGuid"/>
    <entity name="table2" pk="ID"
            query="select dbms_xmlgen.getxml(' select Name, Title, Description from metaDataTable where MY_GUID = ${table1.MY_GUID_ID} ') mdrXmlClob from dual"
            transformer="MD.Solr.Utils.transformers.MDTransformer">
      <field column="Name" name="mdName"/>
      <field column="Title" name="mdTitle"/>
      <field column="Description" name="mdDescription"/>
    </entity>
  </entity>
</document>

-- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Some solrconfig.xml attributes being ignored
It seems like maybe the fragmenter parameters just don't get displayed with echoParams=all set. It may only display as far as the request handler's parameters. The reason I think this is because I tried increasing hl.fragsize to 1000 and the results were returned correctly (much larger snippets), so I know it was read correctly. I moved hl.snippets into the requestHandler config instead of the fragmenter, and this seems to have solved the problem. However, I'm uneasy with this solution because I don't know why it wasn't being read correctly when setting it inside the fragmenter. Mark On Mon, Dec 15, 2008 at 5:08 PM, Mark Ferguson mark.a.fergu...@gmail.comwrote: Thanks for this tip, it's very helpful. Indeed, it looks like none of the highlighting parameters are being included. It's using the correct request handler and hl is set to true, but none of the highlighting parameters from solrconfig.xml are in the parameter list. Here is my query: http://localhost:8080/solr1/select?rows=50hl=truefl=url,urlmd5,page_title,scoreechoParams=allq=java Here are the settings for the request handler and the highlighter: requestHandler name=dismax class=solr.SearchHandler default=true lst name=defaults str name=defTypedismax/str float name=tie0.01/float str name=qfbody_text^1.0 page_title^1.6 meta_desc^1.3/str str name=q.alt*:*/str str name=hl.flbody_text page_title meta_desc/str str name=f.page_title.hl.fragsize0/str str name=f.meta_desc.hl.fragsize0/str str name=hl.fragmenterregex/str /lst /requestHandler highlighting fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter default=true lst name=defaults str name=hl.snippets3/str str name=hl.fragsize100/str str name=hl.regex.slop0.5/str str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str /lst /fragmenter /highlighting And here is the param list returned to me: lst name=params str name=echoParamsall/str str name=tie0.01/str str name=hl.fragmenterregex/str str name=f.page_title.hl.fragsize0/str str name=qfbody_text^1.0 
page_title^1.6 meta_desc^1.3/str str name=f.meta_desc.hl.fragsize0/str str name=q.alt*:*/str str name=hl.flpage_title,body_text/str str name=defTypedismax/str str name=echoParamsall/str str name=flurl,urlmd5,page_title,score/str str name=qjava/str str name=hltrue/str str name=rows50/str /lst So it seems like everything is working except for the highlighter. I should mention that when I enter a bogus fragmenter as a parameter (e.g. hl.fragmenter=bogus), it returns a 400 error that the fragmenter cannot be found, so the config file _is_ finding the regex fragmenter. It just doesn't seem to actually be including its parameters... Any ideas are appreciated, thanks again for the help. Mark On Mon, Dec 15, 2008 at 4:23 PM, Yonik Seeley ysee...@gmail.com wrote: Try adding echoParams=all to your query to verify the params that the solr request handler is getting. -Yonik On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Hello, In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When I perform a search, it returns only a single snippet for each highlighted field. However, when I set the hl.snippets field manually as a search parameter, I get up to 3 highlighted snippets. This is the configuration that I am using to set the highlighted parameters: fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter default=true lst name=defaults str name=hl.snippets3/str str name=hl.fragsize100/str str name=hl.regex.slop0.5/str str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str /lst /fragmenter I tried setting hl.fragmenter=regex as a parameter as well, to be sure that it was using the correct one, and the result set is the same. Any ideas what could be causing this attribute not to be read? It has me concerned that other attributes are being ignored as well. Thanks, Mark Ferguson
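The workaround Mark lands on, moving hl.snippets from the fragmenter's defaults into the request handler's defaults, would look roughly like this (a sketch of the relevant lines only, based on the config posted in the thread):

```xml
<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- moved here from the fragmenter's own defaults -->
    <str name="hl.snippets">3</str>
    <str name="hl.fragmenter">regex</str>
    <!-- ... other defaults unchanged ... -->
  </lst>
</requestHandler>
```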
Getting Field Collapsing working
Hi everybody, So I have applied Ivan's latest patch to a clean 1.3. I built it using 'ant compile' and 'ant dist', and got the solr .war file from the build. Moved that into the Tomcat directory. Modified my solrconfig.xml to include the following:

<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent"/>

<arr name="components">
  <str>query</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>debug</str>
  <str>collapse</str>
</arr>

<arr name="first-components">
  <str>myFirstComponentName</str>
  <str>collapse</str>
</arr>

Thinking that everything should work correctly, I did a search with the following: http://localhost:8080/solr/select/?q=mika&version=2.2&start=0&rows=10&indent=on&collapse=true&collapse.field=type I see the query parameters captured in the responseHeader section, but I don't see a collapse section. Does anybody have any ideas? Any help would be greatly appreciated. Thank you, -John
Re: Parent Child Entity - DataImport
I do not observe anything wrong. You can also mention the 'deltaImportQuery' and try it, something like:

<entity name="table1" pk="id"
        query="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15)"
        deltaImportQuery="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) AND id=${dataimporter.delta.ID}"
        deltaQuery="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date &gt; '${dataimporter.last_index_time}'">

On Tue, Dec 16, 2008 at 5:54 AM, sbutalia sbuta...@gmail.com wrote: I have a parent entity that grabs a list of records of a certain type from 1 table... and a sub-entity that queries another table to retrieve the actual data... for various reasons I cannot join the tables... the 2nd sql query converts the rows into an xml to be processed by a custom transformer (done due to the complex nature of the second table) Full-import works fine but delta-import is not adding any new records... Do I have to specify a deltaQuery for the sub-entity? What else might be goin on? document name=doc entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaQuery= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' field column=MY_GUID name=myGuid/ entity name=table2 pk=ID query=select dbms_xmlgen.getxml(' select Name, Title, Description from metaDataTable where MY_GUID = ${table1.MY_GUID_ID} ') mdrXmlClob from dual transformer=MD.Solr.Utils.transformers.MDTransformer field column=Name name=mdName/ field column=Title name=mdTitle/ field column=Description name=mdDescription/ /entity /entity /document -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: SolrConfig.xml Replication
Jeff, Thanks. It would be nice if you could just review the config syntax and see if all possible use cases are covered. Is there any scope for improvement? On Tue, Dec 16, 2008 at 5:45 AM, Jeff Newburn jnewb...@zappos.com wrote: It does appear to be working for us now. The files replicated out appropriately which is a huge help. Thanks to all! -Jeff On 12/13/08 9:42 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Jeff, SOLR-821 has a patch now. It'd be nice to get some feedback if you manage to try it out. On Thu, Dec 11, 2008 at 8:33 PM, Jeff Newburn jnewb...@zappos.com wrote: Thank you for the quick response. I will keep an eye on that to see how it progresses. On 12/10/08 8:03 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: This is a known issue and I was planning to take it up soon. https://issues.apache.org/jira/browse/SOLR-821 On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn jnewb...@zappos.com wrote: I am curious as to whether there is a solution to be able to replicate solrconfig.xml with the 1.4 replication. The obvious problem is that the master would replicate the solrconfig turning all slaves into masters with its config. I have also tried on a whim to configure the master and slave on the master so that the slave points to the same server but that seems to break the replication completely. Please let me know if anybody has any ideas -Jeff -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
Re: Parent Child Entity - DataImport
I've had a chance to play with this more and noticed the query does run fine, but it only updates the records that are already indexed; it doesn't add new ones. The only option that I've found so far is to do a full-import with the clean=false attribute and created_date > last_indexed_date... Is there a better way? Thanks Noble Paul നോബിള് नोब्ळ् wrote: I do not observe anything wrong. you can also mention the 'deltaImportQuery' and try it someting like entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaImportQuery=SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) AND id=${dataimporter.delta.ID} deltaQuery=SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' On Tue, Dec 16, 2008 at 5:54 AM, sbutalia sbuta...@gmail.com wrote: I have a parent entity that grabs a list of records of a certain type from 1 table... and a sub-entity that queries another table to retrieve the actual data... for various reasons I cannot join the tables... the 2nd sql query converts the rows into an xml to be processed by a custom transformer (done due to the complex nature of the second table) Full-import works fine but delta-import is not adding any new records... Do I have to specify a deltaQuery for the sub-entity? What else might be goin on?
document name=doc entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaQuery= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' field column=MY_GUID name=myGuid/ entity name=table2 pk=ID query=select dbms_xmlgen.getxml(' select Name, Title, Description from metaDataTable where MY_GUID = ${table1.MY_GUID_ID} ') mdrXmlClob from dual transformer=MD.Solr.Utils.transformers.MDTransformer field column=Name name=mdName/ field column=Title name=mdTitle/ field column=Description name=mdDescription/ /entity /entity /document -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21027045.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Parent Child Entity - DataImport
Are the queries being fired wrong/different when you tried full-import? On Tue, Dec 16, 2008 at 9:57 AM, sbutalia sbuta...@gmail.com wrote: I'ev had a chance to play with this more and noticed the query does run fine but it only updates the records that are already indexed it doesn't add new ones. The only option that i'ev found so far is to do a full-import with the clean=false attribute and created_date last_indexed_date... Is there a better way? Thanks Noble Paul നോബിള് नोब्ळ् wrote: I do not observe anything wrong. you can also mention the 'deltaImportQuery' and try it someting like entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaImportQuery=SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) AND id=${dataimporter.delta.ID} deltaQuery=SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' On Tue, Dec 16, 2008 at 5:54 AM, sbutalia sbuta...@gmail.com wrote: I have a parent entity that grabs a list of records of a certain type from 1 table... and a sub-entity that queries another table to retrieve the actual data... for various reasons I cannot join the tables... the 2nd sql query converts the rows into an xml to be processed by a custom transformer (done due to the complex nature of the second table) Full-import works fine but delta-import is not adding any new records... Do I have to specify a deltaQuery for the sub-entity? What else might be goin on? 
document name=doc entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaQuery= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' field column=MY_GUID name=myGuid/ entity name=table2 pk=ID query=select dbms_xmlgen.getxml(' select Name, Title, Description from metaDataTable where MY_GUID = ${table1.MY_GUID_ID} ') mdrXmlClob from dual transformer=MD.Solr.Utils.transformers.MDTransformer field column=Name name=mdName/ field column=Title name=mdTitle/ field column=Description name=mdDescription/ /entity /entity /document -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21027045.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: Please help me articulate this query
Derek, q=+referring:XXX +question:YYY (of course, you'll have to URL-encode that query string) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Derek Springer de...@mahalo.com To: solr-user@lucene.apache.org Sent: Monday, December 15, 2008 3:40:55 PM Subject: Re: Please help me articulate this query Thanks for the tip, I appreciate it! However, does anyone know how to articulate the syntax of (This AND That) OR (Something AND Else) into a query string? i.e. q=referring:### AND question:### On Mon, Dec 15, 2008 at 12:32 PM, Stephen Weiss wrote: I think in this case you would want to index each question with the possible referrers ( by title might be too imprecise, I'd go with filename or ID) and then do a search like this (assuming in this case it's by filename) q=(referring:TomCruise.html) OR (question: Tom AND Cruise) Which seems to be what you're thinking. I would make the referrer a type string though so that you don't accidentally pull in documents from a different subject (Tom Cruise this would work ok, but imagine you need to distinguish between George Washington and George Washington Carver). -- Steve On Dec 15, 2008, at 2:59 PM, Derek Springer wrote: Hey all, I'm having trouble articulating a query and I'm hopeful someone out there can help me out :) My situation is this: I am indexing a series of questions that can either be asked from a main question entry page, or a specific subject page. I have a field called referring which indexes the title of the specific subject page, plus the regular question whenever that document is submitted from a specific specific subject page. Otherwise, every document is indexed with just the question. Specifically, what I am trying to do is when I am on the page specific subject page (e.g. Tom Cruise) I want to search for all of the questions asked from that page, plus any question asked about Tom Cruise.
Something like: q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)
Have you ever used a Tom Tom? - Not returned
Where is the best place to take a cruise? - Not returned
When did he have his first kid? - Returned iff question was asked from the Tom Cruise page
Do you think that Tom Cruise will make more movies? - Always returned
Any thoughts? -Derek
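Otis' advice (build the `+field:value` query, then URL-encode it before putting it in the request) can be sketched in Java. This is an illustrative client-side snippet, not code from the thread; the class and method names are made up, and only the stdlib `URLEncoder` is assumed:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryBuilder {
    // Build a Solr select URL from a raw Lucene query string.
    // URL-encoding is required because '+', ':', '(' and spaces
    // all have special meaning in a query string.
    public static String buildUrl(String solrBase, String rawQuery)
            throws UnsupportedEncodingException {
        return solrBase + "/select?q=" + URLEncoder.encode(rawQuery, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Otis' form: both clauses required
        System.out.println(buildUrl("http://localhost:8983/solr",
                "+referring:XXX +question:YYY"));
        // Derek's form: either clause may match
        System.out.println(buildUrl("http://localhost:8983/solr",
                "(referring:Tom AND Cruise) OR (question:Tom AND Cruise)"));
    }
}
```

Note that an unencoded `+` in a URL is decoded as a space on the Solr side, which silently turns a required clause into an optional one; that is the usual symptom of skipping the encoding step.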
Details on logging in Solr
Hi, I was trying to do a performance test on the Solr web application. When I run the performance tests, a lot of logging happens, due to which I am getting log files in GBs. Is there any clean way of deactivating logging, or of changing the log level to, say, error? Is there any property file for the same? Please give your inputs. Regards, Rinesh. -- View this message in context: http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Details on logging in Solr
Solr 1.3 uses java logging. Most app containers (Tomcat, Resin, etc.) give you a way to configure that. Also check: http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html#1.8 You can make runtime changes from the /admin/ logging tab. However, these changes are not persisted when the app restarts. On Dec 15, 2008, at 11:52 PM, Rinesh1 wrote: Hi, I was trying to do a performance test on Solr web application. If I run the performance tests, lot of logging is happening due to which I am getting log files in GBs Is there any clean way of deactivating logging or changing the log level to say error .. Is there any property file for the same. Please give your inputs for the same. Regards, Rinesh. -- View this message in context: http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Details on logging in Solr
Hi Ryan, Thanks for the inputs. These are the steps I followed to solve this issue.

1. Make a logging property file, say solrLogging.properties. We can copy the default logging property file available in the JAVA_HOME/jre/lib folder. The default java logging file looks like the following:

# Default Logging Configuration File
#
# You can use a different file by specifying a filename
# with the java.util.logging.config.file system property.
# For example java -Djava.util.logging.config.file=myfile

# Global properties

# handlers specifies a comma separated list of log Handler
# classes. These handlers will be installed during VM startup.
# Note that these classes must be on the system classpath.
# By default we only configure a ConsoleHandler, which will only
# show messages at the INFO and above levels.
handlers= java.util.logging.ConsoleHandler

# To also add the FileHandler, use the following line instead.
#handlers= java.util.logging.FileHandler, java.util.logging.ConsoleHandler

# Default global logging level.
# This specifies which kinds of events are logged across
# all loggers. For any given facility this global level
# can be overriden by a facility specific level
# Note that the ConsoleHandler also has a separate level
# setting to limit messages printed to the console.
.level= INFO

# Handler specific properties.
# Describes specific configuration info for Handlers.

# default file output is in user's home directory.
java.util.logging.FileHandler.pattern = %h/java%u.log
java.util.logging.FileHandler.limit = 5
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.formatter = java.util.logging.XMLFormatter

# Limit the message that are printed on the console to INFO and above.
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

# Facility specific properties.
# Provides extra control for each logger.
# For example, set the com.xyz.foo logger to only log SEVERE
# messages:
com.xyz.foo.level = SEVERE

To prevent INFO level messages, change

java.util.logging.ConsoleHandler.level = INFO

to

java.util.logging.ConsoleHandler.level = SEVERE

2. While starting the server (for example JBoss), add the following line to run.bat or run.sh:

set JAVA_OPTS=%JAVA_OPTS% -Djava.util.logging.config.file=Y:\solrLog.properties

This will solve the issue. Regards, Rinesh.

ryantxu wrote: Solr 1.3 uses java logging. Most app containers (Tomcat, Resin, etc.) give you a way to configure that. Also check: http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html#1.8 You can make runtime changes from the /admin/ logging tab. However, these changes are not persisted when the app restarts. On Dec 15, 2008, at 11:52 PM, Rinesh1 wrote: Hi, I was trying to do a performance test on Solr web application. If I run the performance tests, lot of logging is happening due to which I am getting log files in GBs Is there any clean way of deactivating logging or changing the log level to say error .. Is there any property file for the same. Please give your inputs for the same. Regards, Rinesh. -- View this message in context: http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027267.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027540.html Sent from the Solr - User mailing list archive at Nabble.com.
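Since Solr 1.3 uses java.util.logging, the same effect as the properties-file change can also be achieved programmatically at startup. This is a generic java.util.logging sketch (the class name and the example logger name are illustrative, not from Solr's code):

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietLogging {
    // Raise the root logger's threshold so only SEVERE records get through.
    // This mirrors the solrLogging.properties change, but at runtime, and is
    // lost on restart (same caveat as the /admin/ logging tab).
    public static void quiet() {
        Logger root = Logger.getLogger("");      // "" is the root logger
        root.setLevel(Level.SEVERE);             // drop INFO/WARNING records
        for (Handler h : root.getHandlers()) {
            h.setLevel(Level.SEVERE);            // handlers filter independently
        }
    }

    public static void main(String[] args) {
        quiet();
        Logger log = Logger.getLogger("org.apache.solr.core");
        log.info("this is suppressed");
        log.severe("this still appears");
    }
}
```

Child loggers with no explicit level inherit the root level, so this silences all of Solr's INFO chatter in one place without touching per-package settings.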
RE: Solrj: Multivalued fields give Bad Request
Ryan, It turned out that another multivalued field was causing my problem. This field was no longer configured in my schema. My dynamic catch-all field of type ignored was not multivalued; adding multiValued="true" to this field solved my problem. Regards, Rene -Original Message- From: Ryan McKinley [mailto:ryan...@gmail.com] Sent: maandag 15 december 2008 17:28 To: solr-user@lucene.apache.org Subject: Re: Solrj: Multivalued fields give Bad Request What do you see in the admin schema browser? /admin/schema.jsp When you select the field names, do you see the property Multivalued? ryan On Dec 15, 2008, at 10:55 AM, Schilperoort, René wrote: Sorry, Forgot the most important detail. The document I am adding contains multiple names fields: sInputDocument.addField("names", value); sInputDocument.addField("names", value); sInputDocument.addField("names", value); There is no problem when a document only contains one value in the names field. -Original Message- From: Schilperoort, René [mailto:rene.schilpero...@getronics.com] Sent: maandag 15 december 2008 16:52 To: solr-user@lucene.apache.org Subject: Solrj: Multivalued fields give Bad Request Hi all, When adding documents to Solr using SolrJ I receive the following Exception. org.apache.solr.common.SolrException: Bad Request The field is configured as follows: <field name="names" type="string" indexed="true" stored="true" multiValued="true"/> Any suggestions? Regards, Rene
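For readers hitting the same Bad Request: Rene's fix corresponds to a schema.xml change along these lines. The exact dynamicField pattern in his schema is unknown; this sketch mirrors the "ignored" type from Solr's example schema:

```xml
<!-- catch-all that silently absorbs fields not otherwise declared;
     multiValued="true" is what allows repeated field names per document -->
<fieldtype name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>
<dynamicField name="*" type="ignored" multiValued="true"/>
```

Because Solr routes any field name without an explicit <field> entry through the matching dynamicField, a single-valued catch-all rejects documents that repeat a field name, even when the intended target field is itself multivalued but missing from the schema.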