Re: Fw: highlighting on hl.alternateField (copyField target) doesn't highlight
Answer to myself: using solr.KeywordTokenizerFactory and solr.WordDelimiterFilterFactory preserves the original phone number and also adds a token without spaces.

input: 12345 67890
tokens: 12345 67890, 12345, 67890, 1234567890

Two advantages: I don't need another field, and the highlighter works as expected. Best Regards.

Sent: Thursday, 05 June 2014 at 09:14
From: jay list jay.l...@web.de
To: solr-user@lucene.apache.org
Subject: Fw: highlighting on hl.alternateField (copyField target) doesn't highlight

Anybody knowing this issue?

Sent: Tuesday, 03 June 2014 at 09:11
From: jay list jay.l...@web.de
To: solr-user@lucene.apache.org
Subject: highlighting on hl.alternateField (copyField target) doesn't highlight

Hello,

I'm trying to implement a user-friendly search for phone numbers. These numbers consist of two digit tokens like 12345 67890. Ultimately I want the highlighting on the phone number in the search result, without any concern about whether the result was hit via field tel or copyField tel2. The field tel is split by a StandardTokenizer into two tokens: 12345 AND 67890. And I want to catch those people who enter 1234567890 without any space. I use copyField tel2 with a solr.PatternReplaceCharFilterFactory to eliminate non-digits, followed by a solr.KeywordTokenizerFactory. In both cases the search hits as expected. The highlighter works well for tel or tel2, but I want the highlight always on field tel! Using f.tel.hl.alternateField=tel2 returns the field value without any highlighting.

<lst name="params">
  <str name="q">tel2:1234567890</str>
  <str name="f.tel.hl.alternateField">tel2</str>
  <str name="hl">true</str>
  <str name="hl.requireFieldMatch">true</str>
  <str name="hl.simple.pre">&lt;em&gt;</str>
  <str name="hl.simple.post">&lt;/em&gt;</str>
  <str name="hl.fl">tel,tel2</str>
  <str name="fl">tel,tel2</str>
  <str name="wt">xml</str>
  <str name="fq">typ:person</str>
</lst>
...
<result name="response" numFound="1" start="0">
  <doc>
    <str name="uid">user1</str>
    <str name="tel">12345 67890</str>
    <str name="tels">12345 67890</str>
  </doc>
</result>
...
<lst name="highlighting">
  <lst name="user1">
    <arr name="tel">
      <str>123456 67890</str> <!-- here should be a highlight -->
    </arr>
    <arr name="tels">
      <str><em>123456 67890</em></str>
    </arr>
  </lst>
</lst>

Any idea? Or do I have to change my Velocity macros to always look for a different highlighted field? Best Regards
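For reference, the analysis chain described in the follow-up above might look roughly like this in schema.xml (the fieldType name and attribute values are my guesses at a configuration producing that token output, not taken from the original message):

```xml
<!-- KeywordTokenizer keeps "12345 67890" as a single token; the
     WordDelimiterFilter then splits it on the space, keeps the
     original token (preserveOriginal) so highlighting matches the
     stored value, and also emits the concatenated form
     "1234567890" (catenateNumbers). -->
<fieldType name="phone" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateNumberParts="1"
            catenateNumbers="1"
            preserveOriginal="1"/>
  </analyzer>
</fieldType>
```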
Can we do conditional boosting using edismax?
Hi, I'm using the edismax parser to perform runtime boosting. Here's my sample request handler entry:

<str name="qf">text^2 title^3</str>
<str name="bq">Source:Blog^3 Source2:Videos^2</str>
<str name="bf">recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0</str>

As you can see, I'm adding weights to text and title, as well as boosting on source. What I'm trying to see is if there's a way to change the weights based on Source. E.g. for source Blog I would like to have the boost text^3 title^2, while for source Videos I prefer text^2 title^3. Any pointers will be appreciated. Thanks, Shamik
Re: How Can I modify the DocList and DocSet in solr
Thanks for the reply. I found one solution to modify the DocList and DocSet after searching. Look at the following code snippet:

private void sortByRecordIDNew(SolrIndexSearcher.QueryResult result, ResponseBuilder rb) throws IOException {
    DocList docList = result.getDocListAndSet().docList;
    SortedMap<Integer, Integer> sortedMap;
    if (projectSort == 0) {
        sortedMap = new TreeMap<Integer, Integer>(Collections.reverseOrder());
    } else {
        sortedMap = new TreeMap<Integer, Integer>();
    }
    Iterator iterator = docList.iterator();
    while (iterator.hasNext()) {
        int docId = (int) iterator.next();
        Document d = rb.req.getSearcher().doc(docId);
        // dbData is a map containing the recordId from the database,
        // keyed by the unique key in schema.xml
        Integer val = dbData.get(d.get(ID));
        sortedMap.put(val, docId);
    }
    float[] scores = new float[docList.size()];
    int[] docs = new int[docList.size()];
    int docCounter = 0;
    int maxScore = 0;
    Iterator<Integer> it = sortedMap.keySet().iterator();
    while (it.hasNext()) {
        int recordID = (int) it.next();
        int docId = sortedMap.get(recordID);
        scores[docCounter] = 1.0f;
        docs[docCounter] = docId;
        docCounter++;
    }
    docList = new DocSlice(0, docCounter, docs, scores, 0, maxScore);
    result.setDocList(docList);
}

Call this method from QueryComponent's process method after the search. In the above code I sort the DocList in ascending or descending order depending on the user requirement. It works for me.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-Can-I-modify-the-DocList-and-DocSet-in-solr-tp4140754p4141132.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can we do conditional boosting using edismax?
Hi Shamik, yes, it is possible with the map and query functions. Please see Jan's example: http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

On Wednesday, June 11, 2014 9:34 AM, Shamik Bandopadhyay sham...@gmail.com wrote:

Hi, I'm using the edismax parser to perform runtime boosting. Here's my sample request handler entry:

<str name="qf">text^2 title^3</str>
<str name="bq">Source:Blog^3 Source2:Videos^2</str>
<str name="bf">recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0</str>

As you can see, I'm adding weights to text and title, as well as boosting on source. What I'm trying to see is if there's a way to change the weights based on Source. E.g. for source Blog I would like to have the boost text^3 title^2, while for source Videos I prefer text^2 title^3. Any pointers will be appreciated. Thanks, Shamik
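To make the map/query pattern concrete, something along these lines should express "boost Blog documents by a different factor" (all parameter names, the field value, and the weights below are invented for illustration; check the exact syntax against Jan's article and the Solr function query documentation):

```
q={!edismax qf='text^2 title^3' boost=$srcboost v=$userq}
userq=some user input
srcboost=map(query($isblog,0),0,0,1,1.5)
isblog={!field f=Source}Blog
```

Here query($isblog,0) yields 0 for documents whose Source is not Blog, and map(x,0,0,1,1.5) maps that 0 to a neutral factor of 1 while any non-zero score becomes 1.5, so only Blog documents receive the extra multiplicative boost. A second such function for Videos could then encode the other qf weighting.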
Re: Performance/scaling with custom function queries
Would Solr use multithreading to process the records of a function query as described above? In my scenario concurrent searches are not the issue; rather, the speed of one query is the optimization target. Or will I have to set up distributed search to achieve that? Thanks, Robert

On Tue, Jun 10, 2014 at 10:11 AM, Robert Krüger krue...@lesspain.de wrote:
Great, I was hoping for that. In my case I will have to deal with the worst-case scenario, i.e. all documents matching the query, because the only criterion is the fingerprint and the result of the distance/similarity function, which will have to be executed for every document. However, I am dealing with a scenario where there will not be many concurrent users. Thank you.

On Mon, Jun 9, 2014 at 1:57 AM, Joel Bernstein joels...@gmail.com wrote:
You only need to have fast access to the fingerprint field, so only that field needs to be in memory. You'll want to review how Lucene DocValues and FieldCache work. Sorting is done with a PriorityQueue, so only the top N docs are kept in memory. You'll only need to access the fingerprint field values for documents that match the query, so it won't be a full table scan unless all the docs match the query. Sounds like an interesting project. Please keep us posted. Joel Bernstein, Search Engineer at Heliosearch

On Sun, Jun 8, 2014 at 6:17 AM, Robert Krüger krue...@lesspain.de wrote:
Hi, let's say I have an index that contains a field of type BinaryField called fingerprint that stores a few (let's say 100) bytes that are some kind of digital fingerprint-like thing. Let's say I want to perform queries on that field to achieve sorting or filtering based on a kind of custom distance function customDistance, i.e. I input a reference fingerprint and Solr returns either all documents sorted by customDistance(referenceFingerprint,documentFingerprint) or uses that in a frange expression for filtering.
I have read http://wiki.apache.org/solr/SolrPerformanceFactors and I do understand that using function queries with a custom function is definitely an expensive thing, as it will result in what the SQL world calls a full table scan, i.e. data from all documents needs to be touched to select the correct documents or sort by a function's result. Given all that, and provided I have to use a custom function for my needs, I would like to know a few more details about the Solr architecture to understand what I have to look out for. I will have potentially millions of records. Does the data contained in other index fields play a role for RAM usage when I only use the fingerprint field for sorting and searching? I am hoping that my RAM only needs to accommodate the fingerprint data of all available documents for the queries to be fast, not the fingerprint data plus all other indexed or stored data. Example: my fingerprint data needs 100 bytes per document, my other indexed field data needs 900 bytes per document. Will I need 100 MB or 1 GB to fit all data that is needed to process one query in memory? Are there other things to be aware of? Thanks, Robert

--
Robert Krüger
Managing Partner
Lesspain GmbH & Co. KG
www.lesspain-software.com
Re: Documents Added Not Available After Commit (Both Soft and Hard)
Thanks for the input!

Erick - To clarify, we see the No Uncommitted Changes message repeatedly for a number of commits (not a consistent number each time this happens) and then eventually we see a commit that successfully finds changes, at which point the documents are available.

Shalin - That bug looks like it could be related to our case. Did you notice any impact of the bug in situations where there were not just pending deletes by term? In our case we are adding documents; we do have some deletes, but the bulk are adds. We can see the logging of the adds in the Solr log prior to seeing the No Uncommitted Changes message. Either way, it may be useful for us to upgrade and see if it fixes the issue. I'll let you know if that works out once we get a chance to do that. Thanks, Justin

On Mon, Jun 9, 2014 at 3:02 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
I think this may be the same bug as LUCENE-5289 which was fixed in 4.5.1. Can you upgrade to 4.5.1 and see if that solves the problem?

On Fri, Jun 6, 2014 at 7:17 PM, Justin Sweeney justin.sweene...@gmail.com wrote:
Hi, an application I am working on indexes documents to a Solr index. This Solr index is set up as a single node, without any replication. This index is running Solr 4.5.0. We have noticed an issue lately that is causing some problems for our application. The problem is that we add/update a number of documents in the Solr index and we have the index set up to autoCommit (hard) once every 30 minutes. In the Solr logs, I am able to see the add commands to Solr and I can also see Solr start the hard commit. When this hard commit occurs, we see the following message:

INFO - 2014-06-04 20:13:55.135; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.

This only happens sometimes, but Solr will go hours (we have seen 6-12 hours of this behavior) before it does a hard commit where it finds changes.
After the hard commit where the changes are found, we are then able to search for and find the documents that were added hours ago, but up until that point the documents are not searchable. We tried enabling autoSoftCommit every 5 minutes in the hope that this would help, but we are seeing the same behavior. Here is a sampling of the logs showing this occurring (I've trimmed it down to just show what is happening):

INFO - 2014-06-05 20:00:41.300; org.apache.solr.update.processor.LogUpdateProcessor; [zoomCollection] webapp=/solr path=/update params={wt=javabin&version=2} {add=[359453225]} 0 0
INFO - 2014-06-05 20:00:41.376; org.apache.solr.update.processor.LogUpdateProcessor; [zoomCollection] webapp=/solr path=/update params={wt=javabin&version=2} {add=[347170717]} 0 1
INFO - 2014-06-05 20:00:51.527; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
INFO - 2014-06-05 20:00:51.533; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@257c43d main
INFO - 2014-06-05 20:00:51.533; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2014-06-05 20:00:51.545; org.apache.solr.core.QuerySenderListener; QuerySenderListener sending requests to Searcher@257c43d main{StandardDirectoryReader(segments_acl:1367002775953 _2f28(4.5):C13583563/4081507 _2gl6(4.5):C2754573/193533 _2g21(4.5):C1046256/296354 _2ge2(4.5):C835858/206139 _2gqd(4.5):C383500/31051 _2gmu(4.5):C125197/32491 _2grl(4.5):C46906/1255 _2gpj(4.5):C66480/16562 _2gra(4.5):C364/22 _2gr1(4.5):C36064/2556 _2gqg(4.5):C42504/21515 _2gqm(4.5):C26821/12659 _2gqu(4.5):C24172/10240 _2gqy(4.5):C697/215 _2gr2(4.5):C878/352 _2gr7(4.5):C28135/11775 _2gr9(4.5):C3276/1341 _2grb(4.5):C5/1 _2grc(4.5):C3247/1219 _2grd(4.5):C6/1 _2grf(4.5):C5/2 _2grg(4.5):C23659/10967 _2grh(4.5):C1 _2grj(4.5):C1 _2grk(4.5):C5160/1482 _2grm(4.5):C1210/351 _2grn(4.5):C3957/1372 _2gro(4.5):C7734/2207 _2grp(4.5):C220/36)}
INFO - 2014-06-05 20:00:51.546; org.apache.solr.core.SolrCore; [zoomCollection] webapp=null path=null params={event=newSearcher&q=d_name:ibm&distrib=false} hits=38 status=0 QTime=0
INFO - 2014-06-05 20:00:51.546; org.apache.solr.core.QuerySenderListener; QuerySenderListener done.
INFO - 2014-06-05 20:00:51.547; org.apache.solr.core.SolrCore; [zoomCollection] Registered new searcher Searcher@257c43d main{StandardDirectoryReader(segments_acl:1367002775953 _2f28(4.5):C13583563/4081507 _2gl6(4.5):C2754573/193533 _2g21(4.5):C1046256/296354 _2ge2(4.5):C835858/206139 _2gqd(4.5):C383500/31051 _2gmu(4.5):C125197/32491 _2grl(4.5):C46906/1255 _2gpj(4.5):C66480/16562 _2gra(4.5):C364/22 _2gr1(4.5):C36064/2556 _2gqg(4.5):C42504/21515 _2gqm(4.5):C26821/12659 _2gqu(4.5):C24172/10240 _2gqy(4.5):C697/215
Hunspell inaccuracies with Solr 4.8.1 and French dictionaries
Hello, I just moved from Solr 4.6 to Solr 4.8.1 and I notice differences in the way Hunspell works. Some changes are fixes (due to https://issues.apache.org/jira/browse/LUCENE-5483, I assume) but other changes look like regressions. To check this, I compared the results obtained in the Analysis tab of the Solr admin UI with the results obtained using the hunspell -m command with the same dictionaries.

Command line results:

$ hunspell -m -d /DATA/solr-adscope-fr/adscope-fr/conf/fr-moderne
bricolait
bricolait st:bricoler po:v1it is:iimp is:3sg
instituteur
instituteur st:institutrice po:nom is:mas is:sg

Solr Analysis tab results (I'm using HunspellStemFilterFactory):

bricolait - bricolait
instituteur - instituteur

The dictionary and affix file are available at this address: http://www.dicollecte.org/download.php?prj=fr

As shown above, the words bricolait and instituteur are correctly stemmed on the command line but not with the Solr filter. These examples were working correctly with Solr 4.6. Is this something I should open a JIRA issue about? Thanks, Benoît.

This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, the Worldline liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted.
Non-Heap OOM Error with Small Index Size
While running a Solr-based web application on Tomcat 6, we have been repeatedly running into out-of-memory issues. However, these OOM errors are not related to the Java heap. A snapshot of our Solr dashboard just before the OOM error reported:

Physical memory: 7.13/7.29 GB
JVM-Memory: 57.90 MB - 3.05 GB - 3.56 GB

In addition, the top command displays:

  PID USER   PR NI VIRT  RES  SHR S %CPU %MEM TIME+     COMMAND
25382 tomcat 20  0 9716m 6.8g 175m S 47.9 92.6 196:13.70 java

We're unsure as to why the physical memory usage is so much higher than the JVM usage, especially given that the size of our index is roughly 500 MB. We were originally using OpenJDK, and we tried switching to Oracle JDK with no luck. Is it normal for physical memory usage to be this high? We do not want to upgrade our RAM if the problem is really just an error in the configuration. I've attached environment info below, as well as an excerpt of the latest OOM error report. Thank you very much in advance. Kind regards, Michael

Additional info about our application: we index documents from a remote location by retrieving them via a REST API. The entire remote repository is crawled at regular intervals by our application. Twenty-five documents are loaded at a time (the page size provided by the API), and we manually commit each set of twenty-five documents. We do have auto-commit (but not auto-soft-commit) enabled with a time of 60s, but an auto-commit has never actually occurred.
Solr Info:
- Solr 4.8.0
- 524 MB index size
- 31 fields
- Just under 3000 documents
- Directory factory is MMapDirectory
- Caches enabled with default settings/size limits

Selected JVM Arguments: -XX:MaxPermSize=128m -Dorg.apache.pdfbox.baseParser.pushBackSize=524288 -Xmx4096m -Xms1024m

Environment:
- 64-bit AWS EC2 running CentOS 6.5
- Tomcat 6.0.24
- 7.5 GB RAM
- Tried both Oracle JDK 1.7.0_60 and OpenJDK

OOM Log Entry:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000773c8, 366477312, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 366477312 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/jvm-18372/hs_error.log

OOM Error Report Snippet:

# Native memory allocation (malloc) failed to allocate 366477312 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (os_linux.cpp:2769), pid=18372, tid=140031038150400
#
# JRE version: OpenJDK Runtime Environment (7.0_55-b13) (build 1.7.0_55-mockbuild_2014_04_16_12_11-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.51-b03 mixed mode linux-amd64 compressed oops)

--
View this message in context: http://lucene.472066.n3.nabble.com/Non-Heap-OOM-Error-with-Small-Index-Size-tp4141175.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance/scaling with custom function queries
On Wed, Jun 11, 2014 at 7:46 AM, Robert Krüger krue...@lesspain.de wrote: Or will I have to set up distributed search to achieve that? Yes — you have to shard it to achieve that. The shards could be on the same node. There were some discussions this year in JIRA about being able to do thread-per-segment but it’s not quite there yet. FWIW I think it would be a nice option for some use-cases (like yours). ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley
Re: Performance/scaling with custom function queries
In Solr 4.9 there is a feature called RankQueries that allows you to plug in your own ranking collector. So, if you wanted to write a ranking/sorting collector that used a thread per segment, you could cleanly plug it in. Joel Bernstein, Search Engineer at Heliosearch

On Wed, Jun 11, 2014 at 9:39 AM, david.w.smi...@gmail.com david.w.smi...@gmail.com wrote:
On Wed, Jun 11, 2014 at 7:46 AM, Robert Krüger krue...@lesspain.de wrote:
Or will I have to set up distributed search to achieve that?

Yes — you have to shard it to achieve that. The shards could be on the same node. There were some discussions this year in JIRA about being able to do thread-per-segment but it’s not quite there yet. FWIW I think it would be a nice option for some use-cases (like yours). ~ David Smiley, Freelance Apache Lucene/Solr Search Consultant/Developer, http://www.linkedin.com/in/davidwsmiley
Solr search
Hi, any suggestion for a tokenizer / filter / other solution that supports the following kinds of search in Solr?

Use Case             | Input     | Solr should return
All Results          | *         | All results
Prefix Search        | Text*     | All data starting with Text (prefix search)
Exact Search         | Auto Text | Exact match: only Auto Text
Partial (substring)  | *Text*    | All strings containing the text

Now I'm using KeywordTokenizerFactory and WordDelimiterFilterFactory. My issue is with exact search: when I have a document named hello_world and I do an exact search for hello, I get hello_world as a result (I want to get only documents named hello). Thanks in advance, Shay.
Re: Solr search
Hi, any suggestion for a tokenizer / filter / other solution that supports the following kinds of search in Solr?

Use Case             | Input     | Solr should return
All Results          | *         | All results
Prefix Search        | Text*     | All data starting with Text (prefix search)
Exact Search         | Auto Text | Exact match: only Auto Text
Partial (substring)  | *Text*    | All strings containing the text

Now I'm using KeywordTokenizerFactory and WordDelimiterFilterFactory. My issue is with exact search: when I have a document named hello_world and I do an exact search for hello, I get hello_world as a result (I want to get only documents named hello).

The WordDelimiterFilter will split on the underscore, which means that the term hello is in the index for that document. Leave that filter out if you really do want an exact match.

Searching for * by itself is not how you match all documents. It may work, but it is a wildcard search, which means under the covers that it's a search for every term in the index for that field. It's SLOW. The special shortcut *:* (this must be the entire query with no field name, and I'm assuming the standard query parser here) is what you want for all documents. In terms of user input, this is what you want to use when the user leaves the search box empty. If you're using dismax or edismax, then you would send an empty q parameter or leave it off entirely, and define a default q.alt parameter in solrconfig.xml, set to *:* for all docs. Thanks, Shawn
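The q.alt fallback Shawn describes might be wired up like this in solrconfig.xml (the handler name and the qf field here are illustrative, not from the original messages):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">name</str>
    <!-- Used when the q parameter is absent or empty: match all docs -->
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
```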
How to retrieve entire field value (text_general) in custom function?
I have a text_general field and want to use its value in a custom function. I'm unable to do so. It seems that the tokenizer messes this up and only a fraction of the entire value is retrieved. See below for more details.

<doc>
  <str name="id">1</str>
  <str name="field_t">term1 term2 term3</str>
  <long name="_version_">1470628088879513600</long>
</doc>
<doc>
  <str name="id">2</str>
  <str name="field_t">x1 x2 x3</str>
  <long name="_version_">1470628088907825152</long>
</doc>

public class MyFunction extends ValueSource {
    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
        final FunctionValues values = valueSource.getValues(context, readerContext);
        return new StrDocValues(this) {
            @Override
            public String strVal(int doc) {
                return values.strVal(doc);
            }
        };
    }
}

Tried with Solr 4.8.1. The function returns:
- term3 (for the first document)
- null (for the second document)

I want the function to return:
- term1 term2 term3 (for the first document)
- x1 x2 x3 (for the second document)

How can I achieve this? I tried to google it but no luck. I also looked through the Solr code but could not find anything similar. Thanks! Costi
Problem faceting
Hello everyone. I'm having problems with the performance of queries with facets; the time spent resolving a query is very high. The index has 10 million documents, each one with 100 fields. The server has 8 cores and 56 GB of RAM, running with Jetty with this memory configuration: -Xms24096m -Xmx44576m

When I do a query with 20 facets, the time spent is 4-5 seconds, and it barely improves if the same request is made another time.

Debug query, first execution:
<double name="time">6037.0</double>
<lst name="query"><double name="time">265.0</double></lst>
<lst name="facet"><double name="time">5772.0</double></lst>

Debug query, second execution:
<double name="time">6037.0</double>
<lst name="query"><double name="time">1.0</double></lst>
<lst name="facet"><double name="time">4872.0</double></lst>

What can I do? Why are the facets not cached? Thank you, Marcos
Re: How to retrieve entire field value (text_general) in custom function?
On 6/11/2014 9:30 AM, Costi Muraru wrote: I have a text_general field and want to use its value in a custom function. I'm unable to do so. It seems that the tokenizer messes this up and only a fraction of the entire value is being retrieved. See below for more details. Low-level Lucene details are where my knowledge falls extremely short ... but if you are accessing data in the index itself, you're going to get terms, not the original value. You need to access the stored data or docValues to see the full original text. I can't answer the question of whether or not this is something that is accessible (or even makes sense) at the level where your custom code lives, because I simply don't understand those details. Thanks, Shawn
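As a concrete but untested sketch of what Shawn suggests, a custom ValueSource could read the stored value instead of the indexed terms (this is against the Lucene/Solr 4.x API, uses the field_t name from the original message, requires the field to be stored, and does a stored-field lookup per document, which is slow if many documents are scored):

```java
@Override
public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
    // Keep a reference to the segment reader so we can fetch stored fields.
    final AtomicReader reader = readerContext.reader();
    return new StrDocValues(this) {
        @Override
        public String strVal(int doc) {
            try {
                // Stored-field lookup: returns the full original value,
                // e.g. "term1 term2 term3", provided field_t is stored="true".
                return reader.document(doc).get("field_t");
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    };
}
```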
moving to new core.properties setup
I have configured many Tomcat+SolrCloud setups, but now I'm trying to research the new core.properties (core discovery) configuration. I have a functioning ZooKeeper to which I manually loaded a configuration using:

zkcli.sh -cmd upconfig \
  -zkhost xx.xx.xx.xx:2181 \
  -d /test/conf \
  -n test

My solr.xml looks like:

<solr>
  <str name="coreRootDirectory">/test/data</str>
  <bool name="sharedSchema">true</bool>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">8080</int>
    <str name="hostContext">${hostContext:/test}</str>
    <int name="zkClientTimeout">${zkClientTimeout:3}</int>
    <str name="zkhost">xx.xx.xx.xx:2181</str>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

... all fine. I start Tomcat and I see:

Loading container configuration from /test/solr.xml
[...]
Looking for core definitions underneath /test/data
Found 0 core definitions

which is anticipated, as I have not created any cores or collections. Then, trying to create a collection:

wget -O- 'http://xx.xx.xx.xx/test/admin/collections?action=CREATE&name=testCollection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.config=test&property.dataDir=/test/data/testCollection&property.instanceDir=/test'

I get:

org.apache.solr.common.SolrException: Solr instance is not running in SolrCloud mode.

Hrmmm, here I am confused. I have a working ZooKeeper, I have a loaded configuration, I have an empty data directory (no collections, cores, core.properties etc.) and I have specified the zkHost configuration parameter in my solr.xml (yes, IP:port is correct). What exactly am I missing? Thanks for the help. David
Re: Can we do conditional boosting using edismax?
Thanks Ahmet, I'll give it a shot. -- View this message in context: http://lucene.472066.n3.nabble.com/Can-we-do-conditional-boosting-using-edismax-tp4141131p4141268.html Sent from the Solr - User mailing list archive at Nabble.com.
Implementing Hive query in Solr
Hi, my requirement is to execute this Hive query in Solr:

select SUM(Primary_cause_vaR), collect_set(skuType), RiskType, market, collect_set(primary_cause)
from bil_tos
where skuType='Product'
group by RiskType, market;

I can implement the sum and group-by operations in Solr using the StatsComponent, but I have no idea how to implement collect_set() in Solr. collect_set() is used in Hive queries. Please provide an equivalent function for collect_set in Solr, or links, or how to achieve it. It'd be a great help. Thanks, Vivek