Re: Features not present in Solr
Srikanth, I don't know anything about Endeca, so I can't compare Solr to it. However, I know Solr is powerful. Very powerful. So maybe you should tell us more about your needs to get a good answer.

As to your second question: you should not expect Solr to be a database. It is an index server. A database keeps your data safe; if something goes wrong - which is always possible - Solr gives no such guarantees. Maybe someone else can tell you more about this topic.

- Mitch

Srikanth B wrote:
> Hello
> We are in the process of researching Solr features. I am looking for two things:
> 1. Features not available in Solr but present in other products like Endeca
> 2. What one shouldn't expect from Solr
> Any thoughts?
> Thanks in advance
> Srikanth

-- View this message in context: http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Facetting with Synonyms
Hi Otis, thank you for responding. Hmm, since I am not omniscient, this seems to be no way for me, because it would mean I have to know everything about the artist at index time. But your response gives me an idea: a synonym mapper. The synonym mapper would work on the facets returned for the query. It is not important to map S&P to Snaga & Pillath and force Solr to combine both result sets; the same goes for HP and Hewlett Packard. Returning only one of those terms to the user is enough, since I can translate "HP" to "Hewlett Packard" with the help of a SynonymFilter at query time, if the user is interested in such a facet. What do you think about this?

If I want to make such changes to Solr, I think I need to customize whatever directly computes the results for the responseWriter. Do you know which classes are responsible for that? If that would be too complicated, because one has to make changes in too many classes, maybe I will contribute a tool which does this on an already-built response. Another way would be to create only a new responseWriter; am I right? If you think this would be a good idea, I will go on to ask some architectural questions, to save memory and time. Maybe I will open an issue for that... Any other ideas are welcome :-)!

Kind regards - Mitch

Otis Gospodnetic wrote:
>
> Hi Mitch,
>
> You asked how others would solve this problem. I would try to normalize
> the data before indexing it. In other words, I'd clean it up myself to
> avoid a GIGO situation.
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/

-- View this message in context: http://old.nabble.com/Facetting-with-Synonyms-tp27976997p27982710.html Sent from the Solr - User mailing list archive at Nabble.com.
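Mitch's synonym-mapper could indeed be prototyped as post-processing on an already-built facet response. A minimal client-side sketch (hypothetical names, not a Solr API); note that simply adding the counts assumes each document was indexed under only one variant, otherwise documents matching both labels would be counted twice:

```python
# Assumed synonym map: variant label -> canonical label.
SYNONYMS = {
    "S&P": "Snaga & Pillath",
    "HP": "Hewlett Packard",
}

def merge_facets(facet_counts):
    """Collapse synonym variants in a {label: count} facet result,
    keeping a single canonical label per synonym group."""
    merged = {}
    for label, count in facet_counts.items():
        canonical = SYNONYMS.get(label, label)
        merged[canonical] = merged.get(canonical, 0) + count
    return merged
```

For example, `merge_facets({"S&P": 3, "Snaga & Pillath": 5})` would yield a single "Snaga & Pillath" facet with count 8, under the disjoint-documents assumption above.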
Chunked streaming upload to Solr
I would like to upload data to Solr for indexing, in chunks, in one HTTP POST request. Is this possible? What exactly should I set as the client socket parameters?

What I'm getting is that with the default parameters, the first write adds a Content-Length matching the size of the first chunk, and Solr reads that as the entire upload. Apparently the right way to handle this is to set the HTTP header "Transfer-Encoding" to "chunked". (I don't know the total size of the upload.) This results in the HTTP parser blowing up. Here is the stack trace:

Mar 21, 2010 8:35:18 PM sun.reflect.NativeMethodAccessorImpl invoke0
WARNING: handle failed
java.io.IOException: bad chunk char: 115
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:687)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
 at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
 at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
log4j:WARN Detected problem with connection: java.net.SocketException: Unexpected end of file from server
log4j:WARN Detected problem with connection: java.net.SocketException: Unexpected end of file from server
log4j:WARN Detected problem with connection: java.net.SocketException: Unexpected end of file from server

Has anyone made this work? Thanks,

-- Lance Norskog goks...@gmail.com
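For what it's worth, "bad chunk char: 115" is the ASCII code for 's', which suggests Jetty saw ordinary body bytes where it expected a hexadecimal chunk-size line: when the client declares Transfer-Encoding: chunked, it must also frame every chunk itself (or use an HTTP library that does). A sketch of the wire format defined by HTTP/1.1 (hypothetical helper names):

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one HTTP/1.1 chunk: hex length, CRLF, the data, CRLF."""
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"

def last_chunk() -> bytes:
    """The zero-length chunk that terminates a chunked body."""
    return b"0\r\n\r\n"
```

Each write to the socket would go through `encode_chunk`, and the upload ends with `last_chunk()`; sending the raw data without this framing produces exactly the "bad chunk char" failure above.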
Re: Facetting with Synonyms
Hi Mitch,

You asked how others would solve this problem. I would try to normalize the data before indexing it. In other words, I'd clean it up myself to avoid a GIGO situation.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

- Original Message
> From: MitchK
> To: solr-user@lucene.apache.org
> Sent: Sun, March 21, 2010 12:28:16 PM
> Subject: Facetting with Synonyms
>
> Hello out there, I got a little problem: users take care of what will be
> indexed and what not. Sometimes there is a little problem. For example:
> the artists "Snaga & Pillath" also appear as "S & P". When I index the
> document, I can solve this problem with the help of a SynonymFilter.
> However, if I want to retrieve some facets over a result response, there
> is a problem: "S&P" and "Snaga & Pillath" will both be returned. Is there
> a possibility to return only "S&P" OR "Snaga & Pillath"? I think another
> example of this is "HP" and "Hewlett Packard". If one user calls the
> manufacturer of his printer "HP" and another one says "Hewlett Packard"
> and you want to do some faceting, there will be two returned terms. But
> the truth is: every HP and every Hewlett Packard facet, as well as every
> Snaga & Pillath/S&P facet, should facet the same documents. How would you
> solve this problem? Kind regards - Mitch
> -- View this message in context:
> http://old.nabble.com/Facetting-with-Synonyms-tp27976997p27976997.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Features not present in Solr
On Mon, 22 Mar 2010 05:12:06 +0530 Srikanth B wrote:
> Thanks, but I'm looking for answers on the functional and technical
> front. [...]

Yours is a very broad question, and the details of the answers probably depend on the domain in which you are trying to use Solr. Solr is extensively documented on the Wiki, and the Solr 1.4 book is available. Why not look for yourself, and see if Solr meets your needs?

Regards, Gora
Re: Weired behaviour for certain search terms
The search is now working for those terms. I made the following change: in the schema file, I replaced with .

Ahmet Arslan wrote:
>
>> I tried adding &hl.maxAnalyzedChars=-1 to my search query but it didn't
>> help. Just wanted to know if there are limitations on certain search
>> terms. It's a bit strange that Solr is not behaving properly for certain
>> terms (especially returning the excerpts in the highlighting dictionary).
>> The terms which I have found so far are:
>> 1. co-ownership
>> 2. "co ownership"
>> 3. co-employees
>
> Can you paste your field type definition and declaration? Are you storing
> term vectors? Also, can you give us a query and document pair (that returns
> the document but no highlighting)? I will try to reproduce the problem.
>
> Also, what happens when you use &hl.usePhraseHighlighter=false?

-- View this message in context: http://old.nabble.com/Weired-behaviour-for-certain-search-terms-tp27927995p27981626.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Features not present in Solr
Thanks, but I'm looking for answers on the functional and technical front.

On 3/20/10, Israel Ekpo wrote:
>
> One feature that is not available in Solr is any licensing fees and fine print.
>
> Also, you should not expect to pay in order to use Solr.
>
> On Fri, Mar 19, 2010 at 11:16 PM, Srikanth B wrote:
>
> > Hello
> >
> > We are in the process of researching Solr features. I am looking for two things:
> > 1. Features not available in Solr but present in other products like Endeca
> > 2. What one shouldn't expect from Solr
> >
> > Any thoughts?
> >
> > Thanks in advance
> > Srikanth
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
dismax and q.op
Hi, I am using the dismax handler. I have it set up in my solrconfig.xml. I have *not* used default="true" while setting it up (the standard handler still has default="true"), and *I haven't set a value for mm*. In my schema.xml I have set the default operator to AND. When I query, I use the following in my query URL, where my query is, say, *international monetary fund*:

.../select?*q.alt*=international+monetary+fund&*qt=dismax*

My result: no results; but each of the terms individually gave me results! I'd appreciate any help on the following questions:

1. Will the query look for documents that have *international* AND *monetary* AND *fund*, or is it some other behavior based on the settings mentioned above?
2. Does the default operator specified in schema.xml take effect when we use dismax, or is it only for the *standard* request handler? If we specify a value for mm, say 90%, will it override the schema.xml default operator?
3. How do q.alt and q differ in behavior in the above case? I found q.alt to give me the results I also got with the standard RH, hence I used it.
4. When I make a change to the dismax setup in solrconfig.xml, I believe I just have to bounce the Solr server. Do I need to re-index for the change to take effect?
5. If I use dismax, how do I use the ANALYSIS feature on the admin console that I otherwise use with the *standard* RH?

Thanks for your patience. Best Rgds, Mark.
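On the mm question: with dismax, mm controls how many of the optional query clauses must match, and a percentage is rounded down to a whole number of clauses. A simplified sketch of that arithmetic covering only the two simplest mm forms (a plain positive integer and a positive percentage; Solr's full mm grammar also allows negative and conditional expressions, which are not modeled here):

```python
def min_should_match(mm: str, num_clauses: int) -> int:
    """Simplified reading of the dismax mm parameter for two common forms:
    a positive integer like "2", or a positive percentage like "90%"."""
    if mm.endswith("%"):
        # Solr rounds the percentage down to a whole number of clauses.
        return num_clauses * int(mm[:-1]) // 100
    return int(mm)
```

So for the three-term query *international monetary fund*, an mm of "100%" would require all three terms, while "90%" rounds down to requiring only 2 of the 3.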
Re: trimfilterfactory on string fieldtype?
> Can the trim filter factory work on
> string fieldtypes?

No. The string field type (solr.StrField) is not analyzed. You can use a charfilter, tokenizer, or tokenfilter with solr.TextField. You can use this (TrimmedString) field definition:
Re: related search
On Sun, Mar 21, 2010 at 4:30 AM, Suram wrote:
>
> Thanks a lot, Ahmet Arslan.
>
> How can I make the query return the synonym value? Any suggestions?

You need to apply the SynonymFilterFactory to your queries as well.
use termscomponent like spellComponent ?!
hello. i play with solr but i didn't find the perfect solution for me yet. my goal is a search like the amazon search in the iPhone app ;)

is it possible to use the TermsComponent like the SpellComponent? that is, does the TermsComponent work with more than one single term?!

i have these 3 docs with the name in my index:
- nikon one
- nikon two
- nikon three

so when i search for "nik", the TermsComponent suggests "nikon". that's exactly what i want. but when i type "nikon on" i want solr to suggest "nikon one". how is that realizable??? pleeease, somebody help me ;) a merge of the TermsComponent and the SpellComponent would be the best solution, i think.

this is my search field. did i use the correct type?

-- View this message in context: http://old.nabble.com/use-termscomponent-like-spellComponent--%21-tp27977008p27977008.html Sent from the Solr - User mailing list archive at Nabble.com.
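The multi-term suggestion behavior described above isn't something the TermsComponent does out of the box (it enumerates single indexed terms). Purely as a client-side illustration of the *desired* matching logic (not a Solr API; inside Solr a common approach is an edge-n-gram analyzed field over the whole name):

```python
def suggest(names, typed):
    """Suggest stored names whose word sequence starts with what was typed:
    every complete word must match exactly, and the last (possibly
    partial) word must be a prefix of the corresponding word."""
    words = typed.lower().split()
    if not words:
        return []
    out = []
    for name in names:
        parts = name.lower().split()
        if len(parts) < len(words):
            continue
        if parts[:len(words) - 1] == words[:-1] and parts[len(words) - 1].startswith(words[-1]):
            out.append(name)
    return out
```

With the three example docs, typing "nikon on" matches only "nikon one", while "nik" still matches all three, which is exactly the behavior the poster asks for.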
Solr crashing while extracting from very simple text file
Hi all,

I'm trying to import some text files. I'm mostly following Avi Rappoport's tutorial. Some of my files cause Solr to crash while indexing. I've narrowed it down to a very simple example: I have a file named test.txt with one line, and that line is the word XXBLE and nothing else. This is the command I'm using:

curl "http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true"; -F "myfi...@test.txt"

The result is pasted below. Other files work just fine. The problem seems to be related to the letters B and E: if I change them to something else or make them lower case, then it works. In my real files the XX is something else, but the result is the same; it's a common word in the files. I guess for this "quick and dirty" job I could do a bulk replace in the files to make it lower case. Is there any workaround for this?

Thanks, Ross

Apache Tomcat/6.0.20 - Error report
HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
 ... 18 more
Caused by: java.lang.NullPointerException
 at java.io.Reader.<init>(Reader.java:78)
 at java.io.BufferedReader.<init>(BufferedReader.java:93)
 at java.io.BufferedReader.<init>(BufferedReader.java:108)
 at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
 ... 20 more

type Status report
message org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handl
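The "quick and dirty" bulk-lowercase workaround Ross mentions could look like this (a hypothetical pre-processing step applied to the files before posting them to Solr, not a Solr feature, and it doesn't address the underlying Tika parser failure):

```python
from pathlib import Path

def lowercase_copy(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Write a lower-cased copy of a plain-text file, as a workaround
    for content that trips the extraction parser."""
    text = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(text.lower(), encoding=encoding)
```

Run over the input directory, this turns a file containing "XXBLE" into one containing "xxble", which per the report indexes fine.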
Encoding problem with ExtractRequestHandler for HTML indexing
Hi, I'm trying to index HTML documents with different encodings. My HTML files are in win-12XX, ISO-8859-X or UTF-8 encoding. The handler correctly parses all the HTML in their respective encodings and indexes them. However, on the web interface I'm developing, I enter query terms in UTF-8, which naturally does not match content in different encodings. Also, the results I see in my web app are not UTF-8 encoded as expected. My question: is there any filter I can use to convert all content extracted by the handler to UTF-8 prior to indexing? Does it make sense to write a filter which would convert tokens to UTF-8, and is that even possible with multiple encodings? Thanks in advance. Ukyo
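One way to think about this: the re-encoding has to happen before the bytes reach the analyzer, since token filters operate on already-decoded strings. A minimal pre-indexing sketch, assuming the source encoding is known from the HTTP header or the HTML meta tag (encoding *detection* is a separate problem):

```python
def to_utf8_text(raw: bytes, source_encoding: str) -> str:
    """Decode raw bytes from their declared source encoding into Unicode
    text, so everything indexed is consistent; errors='replace' avoids
    hard failures on stray bytes."""
    return raw.decode(source_encoding, errors="replace")
```

Once every document is decoded this way before it is sent for indexing, UTF-8 query terms from the web interface match regardless of what encoding the original HTML used.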
Facetting with Synonyms
Hello out there, I got a little problem: users take care of what will be indexed and what not. Sometimes there is a little problem. For example: the artists "Snaga & Pillath" also appear as "S & P". When I index the document, I can solve this problem with the help of a SynonymFilter. However, if I want to retrieve some facets over a result response, there is a problem: "S&P" and "Snaga & Pillath" will both be returned. Is there a possibility to return only "S&P" OR "Snaga & Pillath"? I think another example of this is "HP" and "Hewlett Packard". If one user calls the manufacturer of his printer "HP" and another one says "Hewlett Packard" and you want to do some faceting, there will be two returned terms. But the truth is: every HP and every Hewlett Packard facet, as well as every Snaga & Pillath/S&P facet, should facet the same documents. How would you solve this problem? Kind regards - Mitch -- View this message in context: http://old.nabble.com/Facetting-with-Synonyms-tp27976997p27976997.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boundary match as part of query language?
By the way, you'll probably want to shingle or use CommonGrams (with _BEGIN & _END being "common") for acceptable performance. I'm wondering if Lucene's new payload features might provide an alternative mechanism to mark the first and last term.

~ David Smiley

hossman wrote:
>
> : Now, I know how to work-around this, by appending some unique character
> : sequence at each end of the field and then include this in my search in
> : the front end. However, I wonder if any of you have been planning a
> : patch to add a native boundary match feature to Solr that would
> : automagically add tokens (also for multi-value fields!), and expand the
> : query language to allow querying for starts-with(), ends-with() and
> : equals()
>
> Well, if you *always* want boundary rules to be applied, that can be done
> as simply as adding your boundary tokens automatically in both the index
> and query time analyzers ... then a search for q="New York" can
> automatically be translated into a PhraseQuery for "_BEGIN New York _END".
>
> If you want special QueryParser markup to specify when you want specific
> boundary conditions, that can also be done with a custom QParser,
> automatically applying the boundary tokens in your indexing analyzer (but
> not the query analyzer -- the QParser would take care of that part). In
> general though it's hard to see how something like q=begin(New York) is
> easier syntax than q="_BEGIN New York".
>
> The point is it's relatively easy to implement something like this when
> meeting specific needs, but I don't know of anyone working on a truly
> generalized QParser that deals with this -- largely because most people
> who care about this sort of thing either have really complicated use cases
> (ie: not just begin/end boundary markers, but also want sentence,
> paragraph, page, chapter, section, etc...) or want extremely specific
> query syntax (ie: they're trying to recreate the syntax of an existing
> system they are replacing), so a general solution doesn't work well.
>
> The closest I've ever seen is Mark Miller's QSolr parser, which actually
> went in a completely different direction, using a home-grown syntax to
> generate Span queries ... if that slacker ever gets off his butt and
> starts running his webserver again, you could download it and try it out,
> and probably find that it would be trivial to turn it into a QParser.
>
> -Hoss

-- View this message in context: http://old.nabble.com/Boundary-match-as-part-of-query-language--tp27851560p27976989.html Sent from the Solr - User mailing list archive at Nabble.com.
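The sentinel-token approach from this thread can be illustrated outside Solr. A sketch of the idea only, using the _BEGIN/_END markers discussed above (the real version would add the tokens in the index-time analyzer, e.g. via a custom filter or a QParser, rather than in client code):

```python
BEGIN, END = "_BEGIN", "_END"

def index_form(field_value: str) -> str:
    """What the index-time analyzer would effectively produce:
    sentinel tokens at both ends of the field."""
    return f"{BEGIN} {field_value} {END}"

def starts_with_query(terms: str) -> str:
    """Phrase query anchored to the start of the field."""
    return f'"{BEGIN} {terms}"'

def ends_with_query(terms: str) -> str:
    """Phrase query anchored to the end of the field."""
    return f'"{terms} {END}"'

def equals_query(terms: str) -> str:
    """Whole-field exact match expressed as a phrase."""
    return f'"{BEGIN} {terms} {END}"'
```

So a field indexed as `_BEGIN New York _END` matches the phrase query `"_BEGIN New York _END"` only when the field equals "New York" exactly, which is the equals() semantics the original poster asked for.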
Re: related search
Thanks a lot, Ahmet Arslan. How can I make the query return the synonym value? Any suggestions? -- View this message in context: http://old.nabble.com/related-search-tp27933778p27974649.html Sent from the Solr - User mailing list archive at Nabble.com.