Re: Question by solr queries optimization
Hi Alex,

Indeed, Solr automatically rewrites this query to `id:%key^3` since versions 7.1 / 8.0. This happens via BooleanQuery#rewrite; you can check out the JIRA issue where this was implemented: https://issues.apache.org/jira/browse/LUCENE-7925.

On Wed, Dec 23, 2020 at 3:13 PM Alex Bulygin wrote:
> Good day to all! Perhaps a stupid question; I'm not very experienced in
> using Solr. If I send a request like id:(%key OR %key OR %key) to Solr
> and the keys are equal, will there be any optimization of such a request?
> Or can you point me to the place in the code where such an optimization
> takes place? Hope for help.
>
> --
> Bulygin Alex

--
Adrien
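[Editor's note] The rewrite referenced above collapses duplicate SHOULD clauses into a single clause whose boost equals the number of occurrences, so the term is evaluated once but scores the same. A minimal, self-contained sketch of that idea in plain Java (this is an illustration, not the actual Lucene implementation; the string-based "clauses" are stand-ins for real query objects):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupClauses {
    // Collapse duplicate SHOULD clauses: N copies of the same clause
    // become one entry carrying a boost of N.
    static Map<String, Integer> collapse(List<String> shouldClauses) {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        for (String clause : shouldClauses) {
            boosts.merge(clause, 1, Integer::sum); // count occurrences
        }
        return boosts;
    }

    public static void main(String[] args) {
        // id:(key OR key OR key)  ->  id:key^3
        Map<String, Integer> rewritten =
                collapse(List.of("id:key", "id:key", "id:key"));
        System.out.println(rewritten); // {id:key=3}
    }
}
```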
Question by solr queries optimization
Good day to all! Perhaps a stupid question; I'm not very experienced in using Solr. If I send a request like id:(%key OR %key OR %key) to Solr and the keys are equal, will there be any optimization of such a request? Or can you point me to the place in the code where such an optimization takes place? Hope for help.

--
Bulygin Alex
Re: Question on Solr
Hello Prathib,

This is how I would approach it: index these XMLs as flat records/plain data in Solr, and then search those records at query time. Converting the XMLs to plain data in the form of key/value pairs is done at ingestion time; if you then have to present the results in XML format at query time, you can apply the XML transformation again. Basically, searching XML snippets is more or less a text search, which is what Solr is about. You can also utilise nested documents in Solr to fit your need.

Thanks,
Susheel

On Tue, Feb 14, 2017 at 7:39 PM, Prathib Kumar wrote:
> Hi,
>
> We are evaluating Solr to see if it can help us search for XML snippets
> within a whole XML doc.
>
> For example:
> Document-1: Prathib, Java, san jose, CA
> Document-2: Joe, C++, chennai, TN
> Document-3: Ramu, Python, LosAngeles, CA
>
> My search string is another XML doc, which could be something like:
> Query-1: san jose
> Query-2: CA
>
> I have broken this down for simplicity; in reality our XMLs are nested and
> have many attributes on each tag.
>
> To continue the evaluation of Solr, can you please help me with where I
> could start the analysis?
>
> Note: currently our XML documents don't adhere to any schema, but we could
> create a schema if required.
>
> Regards
> Prathib Kumar.
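[Editor's note] The ingestion-time flattening Susheel describes can be sketched with only the JDK's DOM parser. The element names and sample XML below are made up for illustration, since the original tag names did not survive in the thread; the resulting key/value pairs are what you would send to Solr as a flat document:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlFlattener {
    // Walk the DOM and emit one field per leaf element, keyed by its
    // slash-separated path from the root.
    static void flatten(Node node, String path, Map<String, String> out) {
        NodeList children = node.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasElementChild = true;
                flatten(child, path + "/" + child.getNodeName(), out);
            }
        }
        if (!hasElementChild) { // leaf element: record its text as a field
            String text = node.getTextContent().trim();
            if (!text.isEmpty()) out.put(path, text);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<person><name>Prathib</name><skill>Java</skill>"
                   + "<address><city>san jose</city><state>CA</state></address></person>";
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        Map<String, String> fields = new LinkedHashMap<>();
        flatten(root, "/person", fields);
        System.out.println(fields);
        // {/person/name=Prathib, /person/skill=Java,
        //  /person/address/city=san jose, /person/address/state=CA}
    }
}
```

Searching "city:san jose" or "state:CA" then becomes an ordinary field query against the flattened document.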
Question on Solr
Hi,

We are evaluating Solr to see if it can help us search for XML snippets within a whole XML doc.

For example:
Document-1: Prathib, Java, san jose, CA
Document-2: Joe, C++, chennai, TN
Document-3: Ramu, Python, LosAngeles, CA

My search string is another XML doc, which could be something like:
Query-1: san jose
Query-2: CA

I have broken this down for simplicity; in reality our XMLs are nested and have many attributes on each tag.

To continue the evaluation of Solr, can you please help me with where I could start the analysis?

Note: currently our XML documents don't adhere to any schema, but we could create a schema if required.

Regards
Prathib Kumar.
Question regarding Solr 4.7 solr joining across multiple cores and sorting
Hi,

I'm running into an issue attempting to sort; here is the scenario. I have my mainIndex, which looks something like this:

id  description   name
1   description1  name1
2   description2  name2

I also have a subIndex, which looks something like this:

id  metric
1   4
2   5

What I am trying to do is join the two indexes on the id column and have my results sorted on a column from the subIndex, with a query like the following:

testServer/solr/MainIndex/select?defType=edismax&q=*&fq={!join from=id to=id fromIndex=subIndex}id:*&sort=metric desc

Desired result:

2 description2 name2
1 description1 name1

I'm aware that you lose all information from the subIndex the moment the parser sees the {!join}. What are my options if I want to join two indexes and sort on a column not present in the main index?

Thanks in advance.
Parnit
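[Editor's note] One hypothetical workaround, not proposed in this thread, is to fetch the id-to-metric mapping from the subIndex in a separate query and sort the joined results client-side. A minimal sketch in plain Java, with in-memory collections standing in for the two Solr responses:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ClientSideJoinSort {
    record MainDoc(String id, String description, String name) {}

    // Sort main-index docs by a metric that lives only in the subIndex,
    // using a previously fetched id -> metric map.
    static List<MainDoc> sortByMetricDesc(List<MainDoc> mainDocs,
                                          Map<String, Integer> metricById) {
        List<MainDoc> sorted = new ArrayList<>(mainDocs);
        sorted.sort(Comparator.comparing(
                (MainDoc d) -> metricById.getOrDefault(d.id(), 0)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<MainDoc> main = List.of(
                new MainDoc("1", "description1", "name1"),
                new MainDoc("2", "description2", "name2"));
        Map<String, Integer> metrics = Map.of("1", 4, "2", 5); // from subIndex
        // metric desc: id 2 (metric 5) sorts before id 1 (metric 4)
        for (MainDoc d : sortByMetricDesc(main, metrics)) {
            System.out.println(d.id() + " " + d.description() + " " + d.name());
        }
    }
}
```

This only works for result sets small enough to sort in the client, and it costs an extra round trip; it is a sketch of the idea rather than a general solution.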
Question about SOLR soft commit
Hello there,

I'm confused about soft commit. There is very little explanation about this on the wiki; I hope to learn some more details. Thanks in advance.

Best Regards,
Illu
RE: Question about solr config files encoding.
Config files are XML, and I changed them to be handled by the XML parser (InputStreams), so the XML parser reads the encoding from the header. But JSON is defined to be UTF-8, so we must supply the encoding (IOUtils.UTF8_CHARSET).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Dawid Weiss [mailto:dawid.we...@gmail.com]
Sent: Thursday, July 05, 2012 5:00 PM
To: dev@lucene.apache.org
Subject: Question about solr config files encoding.

Guys, should the encoding of config files really be platform-dependent? Currently Solr tests fail massively on setup because of things like this:

public OpenExchangeRates(InputStream ratesStream) throws IOException {
  parser = new JSONParser(new InputStreamReader(ratesStream));

This reader, when confronted with UTF-16 as file.encoding, results in funky exceptions like:

Caused by: org.apache.noggit.JSONParser$ParseException: JSON Parse Error: char=笊,position=0 BEFORE='笊' AFTER='†≤楳捬慩浥爢㨠≔桩猠摡瑡猠捯汬散瑥搠晲潭⁶慲楯畳⁰牯癩摥牳 湤⁰牯癩摥搠晲'
  at org.apache.noggit.JSONParser.err(JSONParser.java:221)
  at org.apache.noggit.JSONParser.next(JSONParser.java:620)
  at org.apache.noggit.JSONParser.nextEvent(JSONParser.java:661)
  at org.apache.solr.schema.OpenExchangeRatesOrgProvider$OpenExchangeRates.init(OpenExchangeRatesOrgProvider.java:189)
  at org.apache.solr.schema.OpenExchangeRatesOrgProvider.reload(OpenExchangeRatesOrgProvider.java:129)

Can we fix the encoding of these input files to UTF-8 or something? According to the JSON RFC: http://tools.ietf.org/html/rfc4627#section-3

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8. Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8

We could just enforce/require UTF-8? Alternatively, auto-detect this from a binary stream with a custom Reader class.

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
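[Editor's note] The null-pattern heuristic from RFC 4627 quoted above is straightforward to implement in plain Java; a minimal sketch (the method name is made up, and the heuristic assumes the stream has at least four octets):

```java
import java.nio.charset.StandardCharsets;

public class JsonEncodingSniffer {
    // Detect the Unicode encoding of a JSON octet stream from the pattern
    // of null bytes in its first four octets, per RFC 4627 section 3.
    static String detect(byte[] b) {
        if (b.length < 4) return "UTF-8"; // too short to apply the heuristic
        boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
        if (z0 && z1 && z2 && !z3)  return "UTF-32BE"; // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return "UTF-16BE"; // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3)  return "UTF-32LE"; // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return "UTF-16LE"; // xx 00 xx 00
        return "UTF-8";                                // xx xx xx xx
    }

    public static void main(String[] args) {
        System.out.println(detect("[10]".getBytes(StandardCharsets.UTF_8)));   // UTF-8
        System.out.println(detect("[1]".getBytes(StandardCharsets.UTF_16BE))); // UTF-16BE
        System.out.println(detect("[1]".getBytes(StandardCharsets.UTF_16LE))); // UTF-16LE
    }
}
```

This is the "auto-detect from a binary stream" alternative Dawid mentions; wrapping it in a custom Reader would mean buffering the first four octets before handing the stream to the JSON parser.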
Re: Question about solr config files encoding.
> But JSON is defined to be UTF-8, so we must supply the encoding
> (IOUtils.UTF8_CHARSET).

That RFC says it can be any Unicode... this said, I agree with you that we can probably assume it's UTF-8 and not worry about anything else.

Dawid
RE: Question about solr config files encoding.
> 3. Encoding
>
> JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
> Since the first two characters of a JSON text will always be ASCII
> characters [RFC0020], it is possible to determine whether an octet stream
> is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the
> pattern of nulls in the first four octets.
>
> 00 00 00 xx  UTF-32BE
> 00 xx 00 xx  UTF-16BE
> xx 00 00 00  UTF-32LE
> xx 00 xx 00  UTF-16LE
> xx xx xx xx  UTF-8

:-) I think we can safely assume it is UTF-8; otherwise we must do the same shit as the XML parsers, with mark() on a BufferedInputStream. Most libraries out there can only read UTF-8, and Solr itself produces only UTF-8 JSON, right? Those tests only check the response from Solr.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Re: Question about solr config files encoding.
On Thu, Jul 5, 2012 at 10:59 AM, Dawid Weiss dawid.we...@gmail.com wrote:
> According to the JSON RFC: http://tools.ietf.org/html/rfc4627#section-3
> JSON text SHALL be encoded in Unicode.

One of my little pet peeves with the RFC - I think this was a bad requirement. JSON should have been text, and then there should have been an optional way to detect the encoding if other mechanisms don't cover it (like HTTP headers, etc). This effectively means that something like [hi] is not valid JSON for many of you reading this email (if your email client is internally representing it as something other than Unicode-encoded, for example).

> We could just enforce/require UTF-8?

Yes, Solr has normally always required/assumed UTF-8 for config files. It's simply an oversight in any places that don't.

-Yonik
http://lucidimagination.com
RE: Question about solr config files encoding.
I'll just add: Solr's XML files are parsed according to the XML spec, so you can choose any charset, you only have to declare it according to the XML spec! Also, an XML POST to the update handler can be any encoding (it does not need to be declared in the header anymore; the <?xml ...?> header is fine). There is already a test! I fixed all this in endless sessions, but I was happy to do it, as my favourite data format is XML :-) [I refuse to fix this for DIH, but that's another story, SOLR-2347].

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
RE: Question about solr config files encoding.
> update handler can be any encoding (it does not need to be declared in
> header

...HTTP header..., sorry.
Re: Question about solr config files encoding.
Sure, I don't have a problem with XML. I'll assume UTF-8 for JSON and go through the issues later today.

Dawid

On Thu, Jul 5, 2012 at 5:47 PM, Uwe Schindler u...@thetaphi.de wrote:
> I just add: Solr's XML files are parsed according to XML spec, so you can
> choose any charset, you only have to define it according to XML spec!
have a question on solr query
I have a field DestinationId, and it can take values '123 123' or '456'. I need the rows whose values do not contain a space; that is, I need the row which has '456' alone to be returned. Can you help?

Thanks
Premila
Re: have a question on solr query
Your problem statement is kinda sparse on details. Have you looked at the KeywordAnalyzer? If you don't see that as relevant, can you provide some more examples of the kinds of data you expect to put in the field, and queries that should and should not match?

Best
Erick

On Tue, Apr 12, 2011 at 11:24 AM, Ramamurthy, Premila premila.ramamur...@travelocity.com wrote:
> I have a field DestinationId, and it can take values '123 123' or '456'.
> I need the rows whose values do not contain a space; that is, I need the
> row which has '456' alone to be returned. Can you help?
>
> Thanks
> Premila
Doc Question for Solr Cell
I was refreshing my mind on the newly updated parameters for Solr Cell, and noticed that the Configuration section on http://wiki.apache.org/solr/ExtractingRequestHandler is out of date. Before I fixed it, I wanted to confirm that

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="ext.map.Last-Modified">last_modified</str>
    <bool name="ext.ignore.und.fl">true</bool>
  </lst>
</requestHandler>

should be changed to map.Last-Modified only, and that the ignore.und.fl capability is now implemented via uprefix:

uprefix=<prefix> - Prefix all fields that are not defined in the schema with the given prefix. This is very useful when combined with dynamic field definitions. Example: uprefix=ignored_ would effectively ignore all unknown fields generated by Tika, given that the example schema contains

<dynamicField name="ignored_*" type="ignored"/>

Eric

-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal
Re: Doc Question for Solr Cell
On Aug 10, 2009, at 5:28 AM, Eric Pugh wrote:
> I was refreshing my mind on the newly updated parameters for Solr Cell,
> and noticed that the Configuration section on
> http://wiki.apache.org/solr/ExtractingRequestHandler is out of date.
> Before I fixed it, I wanted to confirm that
>
> <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>     <str name="ext.map.Last-Modified">last_modified</str>
>     <bool name="ext.ignore.und.fl">true</bool>
>   </lst>
> </requestHandler>
>
> should be changed to map.Last-Modified only, and that the ignore.und.fl
> capability is now implemented via uprefix:
>
> uprefix=<prefix> - Prefix all fields that are not defined in the schema
> with the given prefix. This is very useful when combined with dynamic
> field definitions. Example: uprefix=ignored_ would effectively ignore all
> unknown fields generated by Tika, given that the example schema contains
> <dynamicField name="ignored_*" type="ignored"/>

That is my understanding, yes.
Re: Doc Question for Solr Cell
On Mon, Aug 10, 2009 at 5:28 AM, Eric Pugh ep...@opensourceconnections.com wrote:
> I was refreshing my mind on the newly updated parameters for Solr Cell,
> and noticed that the Configuration section on
> http://wiki.apache.org/solr/ExtractingRequestHandler is out of date.
> Before I fixed it, I wanted to confirm that
>
> <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>     <str name="ext.map.Last-Modified">last_modified</str>
>     <bool name="ext.ignore.und.fl">true</bool>
>   </lst>
> </requestHandler>
>
> should be changed to map.Last-Modified only, and that the ignore.und.fl
> capability is now implemented via uprefix:

Yep. Before 1.4 is released, I had wanted to add good default mappings for common document types, along with the fields in the example schema, and then just cut-n-paste the config from the example schema. It would be great if you had any recommendations for such default mappings.

-Yonik
http://www.lucidimagination.com