Re: SolrCloud setup - any advice?
Good point. I'd seen docValues and wondered whether they might be of use in this situation. However, as I understand it they require a value to be set for all documents until Solr 4.5. Is that true or was I imagining reading that?

On 25 September 2013 11:36, Erick Erickson erickerick...@gmail.com wrote: Hmmm, I confess I haven't had a chance to play with this yet, but have you considered docValues for some of your fields? See: http://wiki.apache.org/solr/DocValues And just to tantalize you: since Solr 4.2 you can build a forward index for a field, for purposes of sorting, faceting, grouping, function queries, etc. You can specify a different docValuesFormat on the fieldType (docValuesFormat="Disk") to only load minimal data on the heap, keeping other data structures on disk. Do note, though: "Not a huge improvement for a static index". This latter isn't a problem though, since you don't have a static index. Erick

On Tue, Sep 24, 2013 at 4:13 AM, Neil Prosser neil.pros...@gmail.com wrote: Shawn: unfortunately the current problems are with facet.method=enum! Erick: we already round our date queries so they're the same for at least an hour, so thankfully our fq entries will be reusable. However, I'll take a look at reducing the cache and autowarming counts and see what the effect on hit ratios and performance is. For SolrCloud our soft commit (openSearcher=false) interval is 15 seconds and our hard commit is 15 minutes. You're right about those sorted fields having a lot of unique values. They can be any number between 0 and 10,000,000 (it's sparsely populated across the documents) and could appear in several variants across multiple documents. This is probably a good area for seeing what we can bend with regard to our requirements for sorting/boosting. I've just looked at two shards and they've each got upwards of 1000 terms showing in the schema browser for one (potentially out of 60) fields.

On 21 September 2013 20:07, Erick Erickson erickerick...@gmail.com wrote: About caches.
The queryResultCache is only useful when you expect there to be a number of _identical_ queries. Think of this cache as a map where the key is the query and the value is just a list of N document IDs (internal) where N is your window size. Paging is often the place where this is used. Take a look at your admin page for this cache, you can see the hit rates. But, the take-away is that this is a very small cache memory-wise, varying it is probably not a great predictor of memory usage. The filterCache is more intense memory wise, it's another map where the key is the fq clause and the value is bounded by maxDoc/8. Take a close look at this in the admin screen and see what the hit ratio is. It may be that you can make it much smaller and still get a lot of benefit. _Especially_ considering it could occupy about 44G of memory. (43,000,000 / 8) * 8192 And the autowarm count is excessive in most cases from what I've seen. Cutting the autowarm down to, say, 16 may not make a noticeable difference in your response time. And if you're using NOW in your fq clauses, it's almost totally useless, see: http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/ Also, read Uwe's excellent blog about MMapDirectory here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for some problems with over-allocating memory to the JVM. Of course if you're hitting OOMs, well. bq: order them by one of their fields. This is one place I'd look first. How many unique values are in each field that you sort on? This is one of the major memory consumers. You can get a sense of this by looking at admin/schema-browser and selecting the fields you sort on. There's a text box with the number of terms returned, then a / ### where ### is the total count of unique terms in the field. NOTE: in 4.4 this will be -1 for multiValued fields, but you shouldn't be sorting on those anyway. How many fields are you sorting on anyway, and of what types? 
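Erick's back-of-the-envelope filterCache arithmetic ((43,000,000 / 8) * 8192) can be sanity-checked directly. A minimal sketch, assuming the worst case he describes where every cache entry is a full bitset over the index:

```python
# Worst-case filterCache memory: each cached fq entry can be a bitset
# of maxDoc bits, i.e. maxDoc / 8 bytes; multiply by the cache size.
def filter_cache_worst_case_bytes(max_doc: int, cache_size: int) -> int:
    bytes_per_entry = max_doc // 8   # one bit per document
    return bytes_per_entry * cache_size

# Erick's example: 43M docs, 8192 cache entries
total = filter_cache_worst_case_bytes(43_000_000, 8192)
print(total)               # -> 44032000000 bytes
print(round(total / 1e9))  # -> 44, i.e. ~44 GB
```

This is why shrinking the filterCache (and its autowarm count) is usually the first lever to try on a large index.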
For your SolrCloud experiments, what are your soft and hard commit intervals? Because something is really screwy here. Your sharding moving the number of docs down this low per shard should be fast. Back to the point above, the only good explanation I can come up with from this remove is that the fields you sort on have a LOT of unique values. It's possible that the total number of unique values isn't scaling with sharding. That is, each shard may have, say, 90% of all unique terms (number from thin air). Worth checking anyway, but a stretch. This is definitely unusual... Best, Erick On Thu, Sep 19, 2013 at 8:20 AM, Neil Prosser neil.pros...@gmail.com wrote: Apologies for the giant email. Hopefully it makes sense. We've been trying out SolrCloud to solve some scalability issues with our current
Re: cold searcher
Thanks Shawn, the master-slave setup is something that requires separate study, as our update rate is more of a bulk type than small incremental bits (at least at this point). But thanks, this background information is always useful.

On Thu, Sep 26, 2013 at 10:52 PM, Shawn Heisey s...@elyograg.org wrote: On 9/26/2013 10:56 AM, Dmitry Kan wrote: Btw, related to the master-slave setup: what makes the read-only slave not come across the same issue? Would it not pull data from the master and warm up searchers? Or does it do updates in a more controlled fashion that makes it avoid these issues? Most people have the slave pollInterval configured on an interval that's pretty long, like 15 seconds to several minutes -- much longer than a typical searcher warming time. For a slave, new searchers are only created when there is a change copied over from the master. There may be several master-side commits that happen during the pollInterval, but the slave won't see all of those. Thanks, Shawn
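Shawn's point, that a polling slave only pulls the newest master state at each poll rather than every intermediate commit, can be illustrated with a toy simulation (the numbers are hypothetical, this is not Solr code):

```python
# Toy model: master commits produce increasing index generations; a
# replication slave polling every poll_interval seconds only pulls the
# newest generation visible at each poll, skipping intermediate ones.
def generations_pulled(commit_times, poll_interval, horizon):
    pulled, last_gen = [], 0
    for t in range(poll_interval, horizon + 1, poll_interval):
        gen = sum(1 for c in commit_times if c <= t)  # newest generation at time t
        if gen > last_gen:
            pulled.append(gen)
            last_gen = gen
    return pulled

# Master commits every 5s for a minute; slave polls every 15s:
commits = list(range(5, 61, 5))             # 12 master-side commits
print(generations_pulled(commits, 15, 60))  # -> [3, 6, 9, 12]: only 4 slave warmups
```

Twelve master commits trigger only four new searchers on the slave, which is why the slave rarely hits overlapping-warming-searcher problems.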
Re: cold searcher
Erick, I actually agree, and we are looking into bundling commits into a batch-type update, with soft commits serving the batches and a hard commit kicking in at larger periods of time. In practice, we have already noticed the periodic slowdowns in search for exactly the same queries before and after commit points. To describe it briefly: the queries that used to take lots of time to execute on Solr 3.4 now execute super-fast, whereas during the periodic slowdowns they execute as slowly as on Solr 3.4. I bet there is a dependency, as you said, between warming several searchers and flushing the caches. Thanks, Dmitry

On Fri, Sep 27, 2013 at 3:44 AM, Erick Erickson erickerick...@gmail.com wrote: Upping the number of concurrent warming searchers is almost always the wrong thing to do. I'd lengthen the polling interval or the commit interval. Throwing away warming searchers is uselessly consuming resources. And if you're trying to do any filter queries, your caches will almost never be used since you're throwing them away so often. Best, Erick
Re: ALIAS feature, can be used for what?
I need to delete the alias for the old collection before pointing it to the new one, right? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Friday, September 27, 2013 at 2:25 AM, Otis Gospodnetic wrote: Hi, Imagine you have an index and you need to reindex your data into a new index, but don't want to have to reconfigure or restart client apps when you want to point them to the new index. This is where aliases come in handy. If you created an alias for the first index and made your apps hit that alias, then you can just repoint the same alias to your new index and avoid having to touch client apps. No, I don't think you can write to multiple collections through a single alias. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm

On Thu, Sep 26, 2013 at 6:34 AM, yriveiro yago.rive...@gmail.com wrote: Today I was thinking about the ALIAS feature and its utility in Solr. Can anyone explain to me, with an example, where this feature may be useful? Is it possible to have an ALIAS of multiple collections? If I do a write to the alias, is this write replicated to all collections? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/ALIAS-feature-can-be-used-for-what-tp4092095.html Sent from the Solr - User mailing list archive at Nabble.com.
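Repointing is done with the Collections API's CREATEALIAS action; as far as I know, re-issuing CREATEALIAS with the same alias name simply repoints it, so no explicit delete is needed first. A sketch of building the request URL (the host and collection names below are made up):

```python
from urllib.parse import urlencode

# Build a Collections API CREATEALIAS request. Re-issuing the same
# action with the same alias name repoints it to the new collection.
def create_alias_url(base, alias, collections):
    params = urlencode({"action": "CREATEALIAS",
                        "name": alias,
                        "collections": ",".join(collections)})
    return f"{base}/admin/collections?{params}"

# Hypothetical host and collection names:
url = create_alias_url("http://localhost:8983/solr", "products", ["products_v2"])
print(url)
# -> http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v2
```

Client apps keep querying the alias name; only this one request changes which collection they hit.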
Can i trust the order of how documents are received in solrcloud?
Hi, I am a new user of SolrCloud, and I am wondering if this scenario could happen: in a shard, I have three machines: leader, replica1, replica2. replica1 receives a document D, and right after that, replica2 receives an updated version of D, let's call it D'. They all try to forward their documents to the leader, who will generate version numbers for the documents and then distribute them to the replicas. Is it possible that the leader could receive D' prior to D, so that D' gets overridden? Thanks a lot! -- View this message in context: http://lucene.472066.n3.nabble.com/Can-i-trust-the-order-of-how-documents-are-received-in-solrcloud-tp4092322.html Sent from the Solr - User mailing list archive at Nabble.com.
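The key point in SolrCloud is that update ordering is decided at the leader, not by which replica first received the request: the leader assigns each update a monotonically increasing version in arrival order, so the update that reaches the leader last wins. A toy model of that last-writer-wins behavior (not actual Solr internals):

```python
# Toy leader: assigns increasing versions in arrival order and keeps,
# per document id, only the latest update (last writer at the leader wins).
class ToyLeader:
    def __init__(self):
        self.clock = 0
        self.docs = {}          # doc id -> (version, body)

    def receive(self, doc_id, body):
        self.clock += 1         # monotonically increasing version
        self.docs[doc_id] = (self.clock, body)
        return self.clock

leader = ToyLeader()
leader.receive("D", "original")   # forwarded by replica1
leader.receive("D", "updated")    # forwarded by replica2, arrives at the leader later
print(leader.docs["D"])           # -> (2, 'updated'): arrival order at the leader decides
```

So yes, if D' happens to reach the leader before D, D would win; if you need to reject stale writes, Solr's optimistic concurrency (sending an expected _version_ with the update) is the usual answer.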
Re: Doing time sensitive search in solr
If your different strings have different semantics (date, etc.), you may need to split your entries based on that semantics. Either have the 'entity' represent one 'string-date' structure, or have an additional field that represents content searchable during that specific period and only have one field with all the strings as stored (if you absolutely need it). Search for Gilt's presentation on Solr; they deal with some similar issues (flash sales). Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Fri, Sep 27, 2013 at 6:52 AM, Darniz rnizamud...@edmunds.com wrote: Hello users, I have a requirement where my content should be searched based upon time. For example, below is our content in our CMS:

<entry start-date="1-sept-2013">Sept content: Honda is releasing the car this month</entry>
<entry start-date="1-dec-2013">Dec content: Toyota is releasing the car this month</entry>

On the website we display the content based upon time. On the Solr side, until now we were indexing all entry elements into a text field. Now that we have introduced time-sensitive information in our CMS, I need to know: if someone queries for the word Toyota, it should NOT come up in my search results, since that content is going live in Dec. The Solr text field looks something like:

<arr name="text">
  <str>Honda is releasing the car this month</str>
  <str>Toyota is releasing this month</str>
</arr>

Is there a way we can search the text field, or append any metadata to the text field, based on date? I hope I have made the issue clear. I kind of don't agree with this kind of practice, but our requirement is pretty peculiar since we don't want to reindex data again and again.
-- View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273.html Sent from the Solr - User mailing list archive at Nabble.com.
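One common pattern for this requirement, in line with Alex's suggestion, is to index each entry as its own document with its go-live date in a field, then filter at query time (e.g. an fq like start_date:[* TO NOW]) instead of re-indexing. A minimal sketch of the filtering logic, using made-up entries:

```python
from datetime import date

# Each entry carries its own go-live date; at query time we only
# search entries whose start date is not in the future.
entries = [
    {"text": "Honda is releasing the car this month", "start": date(2013, 9, 1)},
    {"text": "Toyota is releasing the car this month", "start": date(2013, 12, 1)},
]

def searchable(entries, today):
    """Return only the texts that are live on the given date."""
    return [e["text"] for e in entries if e["start"] <= today]

print(searchable(entries, date(2013, 9, 27)))
# -> ['Honda is releasing the car this month']: the Toyota entry is not live yet
```

The same query run after 1 Dec would return both entries, with no re-indexing in between.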
Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception
I removed the FieldReaderDataSource and dataSource="fld" but it didn't help. I get the following for each document: DataImportHandlerException: Exception in invoking url null Processing Document # 9 nullpointerexception

On 26. Sep 2013, at 8:39 PM, P Williams wrote: Hi, I haven't tried this myself, but maybe try leaving out the FieldReaderDataSource entirely. From my quick searching it looks like it's tied to SQL. Did you try copying the http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing example exactly? What happens when you leave out FieldReaderDataSource? Cheers, Tricia

On Thu, Sep 26, 2013 at 4:17 AM, Andreas Owen a...@conx.ch wrote: I'm using Solr 4.3.1 and the dataimporter. I am trying to use XPathEntityProcessor within the TikaEntityProcessor for indexing HTML pages, but I'm getting this error for each document. I have also tried dataField="tika.text" and dataField="text" to no avail. The nested XPathEntityProcessor "detail" creates the error; the rest works fine. What am I doing wrong?
error:
ERROR - 2013-09-26 12:08:49.006; org.apache.solr.handler.dataimport.SqlEntityProcessor; The query failed 'null'
java.lang.ClassCastException: java.io.StringReader cannot be cast to java.util.Iterator
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at
Pubmed XML indexing
Hi, I'm a newbie trying to index PubMed texts obtained as XML with a structure similar to: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=23864173,22073418 The nodes I need to extract, expressed as XPaths, would be:
//PubmedArticle/MedlineCitation/PMID
//PubmedArticle/MedlineCitation/DateCreated/Year
//PubmedArticle/MedlineCitation/Article/ArticleTitle
//PubmedArticle/MedlineCitation/Article/Abstract/AbstractText
//PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading

I think a way to index them in Solr is to create another XML structure similar to:

<add>
  <doc>
    <field name="id">PMID</field>
    <field name="year_i">Year</field>
    <field name="name">ArticleTitle</field>
    <field name="abstract_s">AbstractText</field>
    <field name="cat">MeshHeading1</field>
    <field name="cat">MeshHeading2</field>
  </doc>
</add>

Being PMID = '23864173' and ArticleTitle = 'Cost-effectiveness of low-molecular-weight heparin compared with aspirin for prophylaxis against venous thromboembolism after total joint arthroplasty' and so on. With that structure I would post it to Solr using the following statement over the documents folder: java -jar post.jar *.xml I'm wondering if there is a more direct way to perform the same task that does not imply an 'iterate-parsing-restructure-write to disk-post' cycle. Many thanks, Francisco
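The client-side half of that cycle can at least be collapsed into a short script (DataImportHandler's XPathEntityProcessor, mentioned later in the thread, can do similar extraction server-side). A sketch of the extraction with the standard library; the sample XML below is a trimmed, hypothetical fragment, not real PubMed output:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation>
    <PMID>23864173</PMID>
    <DateCreated><Year>2013</Year></DateCreated>
    <Article>
      <ArticleTitle>Cost-effectiveness of low-molecular-weight heparin</ArticleTitle>
      <Abstract><AbstractText>Sample abstract.</AbstractText></Abstract>
    </Article>
    <MeshHeadingList><MeshHeading>Heparin</MeshHeading></MeshHeadingList>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

def to_solr_doc(article):
    """Map one PubmedArticle element to a flat Solr document dict."""
    cit = article.find("MedlineCitation")
    return {
        "id": cit.findtext("PMID"),
        "year_i": cit.findtext("DateCreated/Year"),
        "name": cit.findtext("Article/ArticleTitle"),
        "abstract_s": cit.findtext("Article/Abstract/AbstractText"),
        "cat": [m.text for m in cit.findall("MeshHeadingList/MeshHeading")],
    }

docs = [to_solr_doc(a) for a in ET.fromstring(SAMPLE).findall("PubmedArticle")]
print(docs[0]["id"])  # -> 23864173
```

From these dicts you can emit Solr add/doc XML (or JSON) and post it directly over HTTP, skipping the write-to-disk step.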
Re: Sum function causing error in solr
Yes Jack, I have tried this, but it gives the same error. -- View this message in context: http://lucene.472066.n3.nabble.com/Sum-function-causing-error-in-solr-tp4091901p4092307.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sum function causing error in solr
I tried this as well, but it's not working. -- View this message in context: http://lucene.472066.n3.nabble.com/Sum-function-causing-error-in-solr-tp4091901p4092306.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: autocomplete_edge type split words
Thanks for your answer. So I guess if someone wants to search on two fields, one with a phrase query and one with a normal query (split into words), one has to find a way to send the query twice: once with quotes and once without... Best regards, Elisabeth

2013/9/27 Erick Erickson erickerick...@gmail.com: This is a classic issue where there's confusion between the query parser and field analysis. Early in the process the query parser has to take the input and break it up. That's how, for instance, a query like text:term1 term2 gets parsed as text:term1 defaultfield:term2. This happens long before the terms get to the analysis chain for the field. So your only options are to either quote the string or escape the spaces. Best, Erick

On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Hello, I am using Solr 4.2.1 and I have an autocomplete_edge type defined in schema.xml:

<fieldType name="autocomplete_edge" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=" " replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="30" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=" " replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>

When I have a request with more than one word, for instance "rue de la", my request doesn't match my autocomplete_edge field unless I use quotes around the query. In other words, q=rue de la doesn't work and q="rue de la" works.
I've checked the request with debugQuery=on, and I can see that in the first case the query is split into words, and I don't understand why, since my field type uses KeywordTokenizerFactory. Does anyone have a clue on how I can query my field without using quotes? Thanks, Elisabeth
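The mismatch can be reproduced outside Solr: the index side stores edge n-grams of the whole keyword-tokenized string (with whitespace collapsed), so only the full query string, not its individual words, is a prefix match. A rough simulation of the analysis chain described above (this approximates the filters, it is not Solr's actual implementation):

```python
import re

def analyze_index(text, max_gram=30):
    """Approximate KeywordTokenizer + lowercase + whitespace-collapse + EdgeNGram."""
    s = re.sub(r"\s+", " ", text.lower())
    return {s[:n] for n in range(1, min(len(s), max_gram) + 1)}

grams = analyze_index("Rue de la Paix")

# The whole query string, run through the same chain, is one of the stored grams:
print("rue de la" in grams)           # -> True: the query as a single token matches

# But if the query parser splits on whitespace first, each word is analyzed
# separately, and most words are not prefixes of the field value:
print("de" in grams, "la" in grams)   # -> False False
```

This is why quoting (or escaping the spaces) fixes it: it keeps the query parser from splitting the input before the field's analyzer ever sees it.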
Re: Pubmed XML indexing
Did you look at DataImportHandler? There is also Flume, I think. Regards, Alex
Solr Commit Time
Hi, What would be the maximum commit time for indexing 1 lakh documents in solr on a 32 gb machine. Thanks, Prasi
Re: Sum function causing error in solr
On Fri, Sep 27, 2013 at 2:28 AM, Tanu Garg tanugarg2...@gmail.com wrote: tried this as well. but its not working. It's working fine for me. What version of Solr are you using? What does your complete request look like? -Yonik http://lucidworks.com
Re: Pubmed XML indexing
You might be interested in Lux (http://luxdb.org), which is designed for indexing and querying XML using Solr and Lucene. It can run index-supported XPath/XQuery over your documents, and you can define arbitrary XPath indexes. -Mike
Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.
Hi All. I have a Solr 3.5 multicore installation. It has ~250 documents and ~1.5GB of index data. When Solr is fed with new documents, I see timeouts ('Timeout was reached') on clients for a few seconds. Is this normal behaviour of Solr during insertion of new documents? Best regards, Rafał Radecki.
Re: ContributorsGroup
Stefan is more thorough than me, I'd have added the wrong name :) Thanks for volunteering! Erick On Thu, Sep 26, 2013 at 9:17 PM, JavaOne javaone...@yahoo.com wrote: Yes - that is me. mikelabib is my Jira user. Thanks for asking. Sent from my iPhone On Sep 26, 2013, at 7:32 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, did Stefan add you correctly? I see MichaelLabib as a contributor, but not mikelabib... Best Erick On Thu, Sep 26, 2013 at 1:20 PM, Mike L. javaone...@yahoo.com wrote: ah sorry! its: mikelabib thanks! From: Stefan Matheis matheis.ste...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, September 26, 2013 12:05 PM Subject: Re: ContributorsGroup Mike To add you as Contributor i'd need to know your Username? :) Stefan On Thursday, September 26, 2013 at 6:50 PM, Mike L. wrote: Solr Admins, I've been using Solr for the last couple years and would like to contribute to this awesome project. Can I be added to the Contributorsgroup with also access to update the Wiki? Thanks in advance. Mike L.
Re: SolrCloud setup - any advice?
I think you're right, but you can specify a default value in your schema.xml to at least see if this is a good path to follow. Best, Erick

On Fri, Sep 27, 2013 at 3:46 AM, Neil Prosser neil.pros...@gmail.com wrote: Good point. I'd seen docValues and wondered whether they might be of use in this situation. However, as I understand it they require a value to be set for all documents until Solr 4.5. Is that true or was I imagining reading that?
Re: autocomplete_edge type split words
Have you looked at autoGeneratePhraseQueries? That might help. If that doesn't work, you can always do something like add an OR clause like OR "original query" and optionally boost it high. But I'd start with the autoGenerate bits. Best, Erick On Fri, Sep 27, 2013 at 7:37 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Thanks for your answer. So I guess if someone wants to search on two fields, one with a phrase query and one with a normal query (split into words), one has to find a way to send the query twice: once with quotes and once without... Best regards, Elisabeth 2013/9/27 Erick Erickson erickerick...@gmail.com This is a classic issue where there's confusion between the query parser and field analysis. Early in the process the query parser has to take the input and break it up. That's how, for instance, a query like text:term1 term2 gets parsed as text:term1 defaultfield:term2. This happens long before the terms get to the analysis chain for the field. So your only options are to either quote the string or escape the spaces. Best, Erick On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Hello, I am using solr 4.2.1 and I have an autocomplete_edge type defined in schema.xml:

<fieldType name="autocomplete_edge" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="30" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>

When I have a request with more than one word, for instance rue de la, my request doesn't match my autocomplete_edge field unless I use quotes around the query. In other words q=rue de la doesn't work and q="rue de la" works. I've checked the request with debugQuery=on, and I can see that in the first case the query is split into words, and I don't understand why, since my field type uses KeywordTokenizerFactory. Does anyone have a clue on how I can request my field without using quotes? Thanks, Elisabeth
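Erick's two workarounds above (quote the string or escape the spaces) can be sketched generically; this is an illustration of how the client-side query string would be built, not Solr API code, and the field name "autocomplete" is made up for the example.

```python
# Sketch: two ways to keep "rue de la" from being split by the query
# parser before it ever reaches a KeywordTokenizerFactory field.
# The field name "autocomplete" is hypothetical.

def quote_phrase(text):
    # Surround the whole input with double quotes: autocomplete:"rue de la"
    return '"%s"' % text

def escape_spaces(text):
    # Backslash-escape each space: autocomplete:rue\ de\ la
    return text.replace(" ", "\\ ")

query = "rue de la"
print("autocomplete:" + quote_phrase(query))
print("autocomplete:" + escape_spaces(query))
```

Either form makes the query parser hand the whole phrase to the field's analysis chain as a single unit.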
Re: Solr Commit Time
No way to say. How have you configured your autowarming parameters for instance? Why do you care? What problem are you trying to solve? Solr automatically handles warming up searchers and switching to the new one after a commit. Best, Erick On Fri, Sep 27, 2013 at 7:56 AM, Prasi S prasi1...@gmail.com wrote: Hi, What would be the maximum commit time for indexing 1 lakh documents in solr on a 32 gb machine. Thanks, Prasi
Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.
No, this isn't normal. Your servlet container or your clients probably have a too-short timeout. How long are we talking about here anyway? Best, Erick On Fri, Sep 27, 2013 at 8:57 AM, Rafał Radecki r.rade...@polskapresse.pl wrote: Hi All. I have a solr 3.5 multicore installation. It has ~250 of documents, ~1.5GB of index data. When solr is fed with new documents I see, for a few seconds, 'Timeout was reached' timeouts on clients. Is this normal behaviour of solr during the insertion of new documents? Best regards, Rafał Radecki.
Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.
On the client side the timeout is set to 5s, but when I look in the solr log I see QTime less than 5000 (in ms). We use jetty to start the solr process; where should I look for directives connected with timeouts?
Re: Pubmed XML indexing
Many thanks both Mike and Alexandre. I'll peek at those tools. Lux seems a good option. Thanks again, Francisco On 27/09/2013, at 09:33, Michael Sokolov wrote: You might be interested in Lux (http://luxdb.org), which is designed for indexing and querying XML using Solr and Lucene. It can run index-supported XPath/XQuery over your documents, and you can define arbitrary XPath indexes. -Mike On 9/27/13 6:28 AM, Francisco Fernandez wrote: Hi, I'm a newbie trying to index PubMed texts obtained as XML with a structure similar to: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=23864173,22073418 The nodes I need to extract, expressed as XPaths, would be: //PubmedArticle/MedlineCitation/PMID //PubmedArticle/MedlineCitation/DateCreated/Year //PubmedArticle/MedlineCitation/Article/ArticleTitle //PubmedArticle/MedlineCitation/Article/Abstract/AbstractText //PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading I think a way to index them in Solr is to create another XML structure similar to:

<add>
  <doc>
    <field name="id">PMID</field>
    <field name="year_i">Year</field>
    <field name="name">ArticleTitle</field>
    <field name="abstract_s">AbstractText</field>
    <field name="cat">MeshHeading1</field>
    <field name="cat">MeshHeading2</field>
  </doc>
</add>

Being PMID = '23864173' and ArticleTitle = 'Cost-effectiveness of low-molecular-weight heparin compared with aspirin for prophylaxis against venous thromboembolism after total joint arthroplasty' and so on. With that structure I would post it to Solr using the following statement over the documents folder: java -jar post.jar *.xml I'm wondering if there is a more direct way to perform the same task that does not imply an 'iterate-parsing-restructure-write to disk-post' cycle. Many thanks, Francisco
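One way to shorten Francisco's iterate-parse-restructure-write-post cycle is to parse the efetch result in memory and build the Solr documents directly (e.g. for posting as JSON to /update), rather than writing intermediate XML files. A minimal sketch with the stdlib ElementTree follows; the sample XML and the field names (id, year_i, name, abstract_s, cat) mirror the ones in the mail, and the posting step itself is omitted.

```python
import xml.etree.ElementTree as ET

# Minimal sketch: extract the XPaths from the mail out of a PubMed
# efetch result, producing one dict per article. These dicts could then
# be posted straight to Solr's /update handler instead of being written
# to disk first. SAMPLE is a tiny made-up fragment in the efetch shape.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>23864173</PMID>
      <DateCreated><Year>2013</Year></DateCreated>
      <Article>
        <ArticleTitle>Cost-effectiveness of low-molecular-weight heparin</ArticleTitle>
        <Abstract><AbstractText>Some abstract text.</AbstractText></Abstract>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Heparin</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

def extract_docs(xml_text):
    docs = []
    for art in ET.fromstring(xml_text).iter("PubmedArticle"):
        cit = art.find("MedlineCitation")
        docs.append({
            "id": cit.findtext("PMID"),
            "year_i": cit.findtext("DateCreated/Year"),
            "name": cit.findtext("Article/ArticleTitle"),
            "abstract_s": cit.findtext("Article/Abstract/AbstractText"),
            "cat": [h.findtext("DescriptorName")
                    for h in cit.iter("MeshHeading")],
        })
    return docs

print(extract_docs(SAMPLE))
```

For large efetch batches, ET.iterparse would let this stream instead of loading the whole response.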
DIH - delta query and delta import query executes transformer twice
Hi It looks like when a DIH entity has a delta and delta import query plus a transformer defined, the execution of both queries calls the transformer. I was expecting it to only be called on the import query. Sure, we can check for a null value or something and just return the row during the delta query execution, but is there a better way of doing this, that is, not calling the transformer in the first place? Cheers Lee C
Re: autocomplete_edge type split words
Yes! What I've done is set autoGeneratePhraseQueries to true for my field, then give it a boost (bq=myAutompleteEdgeNGramField:"my query with spaces"^50). This only worked with autoGeneratePhraseQueries=true, for a reason I didn't understand, since when I did q=myAutompleteEdgeNGramField:"my query with spaces" I didn't need autoGeneratePhraseQueries set to true. And another thing: when I tried q=myAutocompleteNGramField:(my query with spaces) OR myAutompleteEdgeNGramField:"my query with spaces" (with a request handler with edismax and default operator AND), the request on myAutocompleteNGramField would OR the grams, so I had to put an AND (myAutocompleteNGramField:(my AND query AND with AND spaces)), which was pretty ugly. I don't always understand what exactly is going on. If you have a pointer to some text I could read to get more insights about this, please let me know. Thanks again, Best regards, Elisabeth 2013/9/27 Erick Erickson erickerick...@gmail.com Have you looked at autoGeneratePhraseQueries? That might help. If that doesn't work, you can always do something like add an OR clause like OR "original query" and optionally boost it high. But I'd start with the autoGenerate bits. Best, Erick
Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.
On 9/27/2013 7:41 AM, Rafał Radecki wrote: On client side timeout is set to 5s but when I look in solr log I see QTime less than 5000 (in ms). We use jetty to start solr process, where should I look for directives connected with timeouts? Five seconds is WAY too short a timeout for the entire HTTP conversation. Generally a timeout is not required, but if you feel you need to set one, set it in terms of minutes, with one minute as an absolute minimum. Updates generally take longer than queries. The amount of time taken for the update itself is usually fairly small, but after a commit there is usually cache warming, which depending on your configuration can take quite a while. I'm pretty sure that you won't see the QTime of update requests in the log, at least not listed as QTime like it is on queries. Here are two entries from my log, one for the doc insert, the other for the commit. I believe the last number is the QTime, but it doesn't *say* QTime.

INFO - 2013-09-27 08:27:00.806; org.apache.solr.update.processor.LogUpdateProcessor; [inclive] webapp=/solr path=/update params={wt=javabin&version=2} {add=[notimexpix438424 (144734108581888), notimexpix438425 (1447341085825171456), notimexpix438426 (1447341085826220032), notimexpix438427 (1447341085826220033), notimexpix438428 (1447341085827268608), notimexpix438429 (1447341085828317184), notimexpix438430 (1447341085829365760), notimexpix438431 (1447341085830414336), notimexpix438432 (1447341085831462912), notimexpix438433 (1447341085831462913), ... (66 adds)]} 0 181

INFO - 2013-09-27 08:27:01.975; org.apache.solr.update.processor.LogUpdateProcessor; [inclive] webapp=/solr path=/update params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=false} {commit=} 0 1065

Note that the QTime doesn't represent the total amount of time for the request, because it only measures the part that's under the control of the specific class that's generating the log - in this case LogUpdateProcessor. It can't measure the time the servlet container takes to handle the HTTP conversation, or any part of the request that takes place in Solr classes called before or after LogUpdateProcessor. Thanks, Shawn
Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.
On 9/27/2013 8:37 AM, Shawn Heisey wrote: Note that the QTime doesn't represent the total amount of time for the request, because it only measures the part that's under the control of the specific class that's generating the log - in this case LogUpdateProcessor. It can't measure the time the servlet container takes to handle the HTTP conversation, or any part of the request that takes place in Solr classes called before or after LogUpdateProcessor. I can illustrate the difference between QTime and the actual transaction time by showing you the log entries from the application that correspond exactly to the Solr log entries I shared:

INFO - 2013-09-27 08:27:00.815; chain.c: Insert done, 66, time = 315
INFO - 2013-09-27 08:27:01.976; chain.c: Commit done, time = 1161

The add request with 66 documents had a QTime of 181, but took 315 milliseconds. The commit had a QTime of 1065, but actually took 1161 milliseconds. Thanks, Shawn
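The gap Shawn describes can be made concrete with a small sketch: pull the trailing QTime out of a LogUpdateProcessor line (here the commit entry from his log, with the ampersands in the params written out explicitly) and subtract it from the client-measured elapsed time. The parsing by "last token on the line" is an assumption about the log layout, not a Solr API.

```python
# Sketch: the last number on a LogUpdateProcessor log line is the QTime
# in milliseconds; a client-side stopwatch around the same request will
# normally read higher, because it also covers the HTTP conversation
# and any work done outside LogUpdateProcessor.
LOG_LINE = ("INFO - 2013-09-27 08:27:01.975; "
            "org.apache.solr.update.processor.LogUpdateProcessor; [inclive] "
            "webapp=/solr path=/update "
            "params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=false} "
            "{commit=} 0 1065")

def qtime_ms(line):
    # Take the last whitespace-separated token as the QTime.
    return int(line.rsplit(" ", 1)[1])

client_elapsed_ms = 1161  # what the client app measured in Shawn's example
overhead_ms = client_elapsed_ms - qtime_ms(LOG_LINE)
print(overhead_ms)
```

For the commit in the example that overhead is 1161 - 1065 = 96 ms, which is why a 5-second client timeout around commits is cutting it close once warming is involved.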
Re: Solr Commit Time
Right, it could be minutes or hours. Are the documents five words of plain text or 500 pages of PDF? Is there one simple field or are you running multiple fields for different languages, plus entity extraction? And so on. Also, some people on this list don't know the term lakh; it is better to use 100,000. wunder On Sep 27, 2013, at 6:10 AM, Erick Erickson wrote: No way to say. How have you configured your autowarming parameters, for instance? Why do you care? What problem are you trying to solve? Solr automatically handles warming up searchers and switching to the new one after a commit. Best, Erick On Fri, Sep 27, 2013 at 7:56 AM, Prasi S prasi1...@gmail.com wrote: Hi, What would be the maximum commit time for indexing 1 lakh documents in solr on a 32 gb machine. Thanks, Prasi
Hello and help :)
Hello everyone, I'm having a problem regarding how to make a solr query, I've posted it on stackoverflow. Can someone help me? http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter Thanks in advance! -- -- // Matheus Salvia Desenvolvedor Mobile Celular: +55 11 9-6446-2332 Skype: meta.faraday
Re: Sum function causing error in solr
solr-4.3.1 -- View this message in context: http://lucene.472066.n3.nabble.com/Sum-function-causing-error-in-solr-tp4091901p4092342.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr MailEntityProcessor not indexing Content-Type: multipart/mixed; emails
Hi, Trying to use DIH and MailEntityProcessor but unable to index emails that have a Content-Type: multipart/mixed; or Content-Type: multipart/related; header. Solr logs show the correct number of emails in the inbox when the IMAP connection is established, but only emails that are of Content-Type: text/plain; or Content-Type: text/html; are indexed. No exceptions thrown. I am using the out-of-the-box example config that ships with solr-4.4.0 with the following data-config.xml:

<dataConfig>
  <document>
    <!-- Note - In order to index attachments, set processAttachement=true and
         drop Tika and its dependencies to example-DIH/solr/mail/lib directory -->
    <entity processor="MailEntityProcessor"
            user="our_email@address"
            password="password"
            host="imap.gmail.com"
            protocol="imaps"
            folders="Inbox"
            name="sample_entity"
            fetchSize="1000"
            processAttachement="true"/>
  </document>
</dataConfig>

Is this a known bug? Thanks.
Solr doesn't return TermVectors
I followed http://wiki.apache.org/solr/TermVectorComponent step by step, but with the following request I don't get any term vectors: http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true Just to be sure, I have this in my schema:

<field name="test_field" type="text_general" indexed="true" stored="true" required="false" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>

In my solrconfig, I have this:

<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>

Could anyone help me figure out what the problem could be? BTW the solr version is 4.4.0. Thanx -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr doesn't return TermVectors
You forgot the qt=custom-request-handler parameter, such as on the wiki: http://localhost:8983/solr/select/?qt=tvrh&q=includes:[* TO *]&fl=id And you need the custom request handler, such as on the wiki:

<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

You can add that last-components list to your default handler, if you wish. I have more detailed examples in my e-book. -- Jack Krupansky -----Original Message----- From: alibozorgkhan Sent: Friday, September 27, 2013 3:04 PM To: solr-user@lucene.apache.org Subject: Solr doesn't return TermVectors
Re: Solr doesn't return TermVectors
Thanks for your reply, I actually added that before and it didn't work. I tried it again and no luck. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397p4092403.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr doesn't return TermVectors
Show us the response you got. If you did have everything set up 100% properly and are still not seeing term vectors, then maybe you had indexed the data before setting up the full config. In which case, you would simply need to reindex the data. In that case the term vector section would have indicated which fl fields did not have term vectors. As a general proposition, "it didn't work" is an extremely unhelpful response - it gives us no clues as to what you are actually seeing. -- Jack Krupansky -----Original Message----- From: alibozorgkhan Sent: Friday, September 27, 2013 3:41 PM To: solr-user@lucene.apache.org Subject: Re: Solr doesn't return TermVectors
Re: Solr doesn't return TermVectors
Hi Jack, With this query: http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true&qt=tvrh I see all the fields associated with id:1211. I unloaded my collection using the Core Admin panel in solr, removed data and core.properties in my collection, added the core again and imported the data. By "didn't work", I mean it returns everything I expect except the term vectors. Cheers -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397p4092406.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr doesn't return TermVectors
: http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true&qt=tvrh
:
: I see all the fields associated with id:1211. I unloaded my collection using
: the Core Admin panel in solr, removed data and core.properties in my
: collection, added the core again and imported the data.
:
: By "didn't work", I mean it returns everything I expect except the term
: vectors.

You've shown us:
- a request url
- a field declaration

you have not shown us:
- your solrconfig.xml showing the request handler configuration (and requestDispatcher configuration)
- the response you get from that request url
- the log messages you get when you hit that request url

these are all things that are pretty much mandatory for us to even begin to guess what might be going wrong for you... https://wiki.apache.org/solr/UsingMailingLists -Hoss
Re: Solr doesn't return TermVectors
On 9/27/2013 1:35 PM, Jack Krupansky wrote: You forgot the qt=custom-request-handler parameter, such as on the wiki: http://localhost:8983/solr/select/?qt=tvrh&q=includes:[* TO *]&fl=id And you need the custom request handler, such as on the wiki:

<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

You can add that last-components list to your default handler, if you wish. I have more detailed examples in my e-book. That wiki page probably needs to be updated to have a /tvrh handler instead of tvrh, and with /tvrh instead of /select. The 'qt' route is the old way of doing things, before handleSelect=false became the accepted best practice. In order to help the sender, I've been trying to get this working on my dev server (running 4.4.0) and keep running into NullPointerException problems. I think there's something important that I'm missing about how to use the component. Here's an example of what my URL and request handler are using:

http://server:port/solr/core/tv?q=id:someId&tv.fl=catchall

<requestHandler name="/tv" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

java.lang.NullPointerException at org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:251) Thanks, Shawn
Re: Solr doesn't return TermVectors
Hi, - This is the part I added to the solrconfig.xml:

<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>
<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

- This is the result:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="q">*:*</str>
      <str name="tv">true</str>
      <str name="qt">tvrh</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
    <doc>
      <int name="id">1</int>
      <arr name="test_field"><str>iphone chair</str></arr>
      <long name="_version_">1447362558901092352</long>
    </doc>
    <doc>
      <int name="id">2</int>
      <arr name="test_field"><str>laptop macbook note</str></arr>
      <long name="_version_">1447362568761901056</long>
    </doc>
    <doc>
      <int name="id">3</int>
      <arr name="test_field"><str>iphone is an iphone !</str></arr>
      <long name="_version_">1447362579746783232</long>
    </doc>
  </result>
</response>

- I don't see any logs about this query Cheers -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397p4092409.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr doesn't return TermVectors
Shawn !! That is it ! That fixed my problem, I changed name=tvrh to name=/tvrh and used http://localhost:8983/solr/mycol/tvrh instead and now it is returning the term vectors ! Thanx man -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397p4092413.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
I am not sure about the value to use for the option popularity. Is there a method or do you just go with some arbitrary number? On Thursday, September 26, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Great!! I hadn't seen your message yet; perhaps you could create a PR to that Github repository, so it will be in sync with current versions of Solr. - Original message - From: JMill apprentice...@googlemail.com To: solr-user@lucene.apache.org Sent: Thursday, September 26, 2013 9:10:49 Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns) solved. On Thu, Sep 26, 2013 at 1:50 PM, JMill apprentice...@googlemail.com wrote: I managed to get rid of the jquery error by placing the jquery file in the velocity folder and adding the line: <script type="text/javascript" src="#{url_for_solr}/admin/file?file=/velocity/jquery.min.js&contentType=text/javascript"></script>. That has not solved the issue; the console is showing a new error - [13:42:55.181] TypeError: $.browser is undefined @ http://localhost:8983/solr/ac/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript:90 . Any ideas? On Thu, Sep 26, 2013 at 1:12 PM, JMill apprentice...@googlemail.com wrote: Do you know the directory that the #{url_root} in <script type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script> points to? And the same for #{url_for_solr} in <script type="text/javascript" src="#{url_for_solr}/js/lib/jquery-1.7.2.min.js"></script> On Wed, Sep 25, 2013 at 7:33 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Try querying the core where the data has been imported, something like: http://localhost:8983/solr/suggestions/select?q=uc In the previous URL suggestions is the name I give to the core, so this should change; if you get results, then the problem could be the jquery dependency.
I don't remember making any change; as far as I know that js file is bundled with solr (at least in the 3.x version). Perhaps you could change it to the correct jquery version on solr 4.4. If you go into the admin panel (in solr 3.6): http://localhost:8983/solr/admin/schema.jsp and inspect the loaded code, the required file (jquery-1.4.2.min.js) gets loaded; in solr 4.4 it should load a similar file, but perhaps a more recent version. Perhaps you could change that part to something like: <script type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script> which is used at least on a solr 4.1 that I have laying around here somewhere. In any case you can test the suggestions using the URL that I suggest at the top of this mail; in that case you should be able to see the possible results, of course in a less fancy way. - Original message - From: JMill apprentice...@googlemail.com To: solr-user@lucene.apache.org Sent: Wednesday, September 25, 2013 13:59:32 Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns) Could it be the jquery library that is the problem? I opened up solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to the jquery library but I can't seem to find the directory referenced, line: <script type="text/javascript" src="#{url_for_solr}/admin/jquery-1.4.3.min.js"></script>. Do you know where #{url_for_solr} points to? On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez
Re: Hello and help :)
Matheus, Given these mails form part of an archive and should themselves be self-contained, can you please post your actual question here? You're more likely to get answers that way. Thanks, Upayavira On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote: Hello everyone, I'm having a problem regarding how to make a solr query, I've posted it on stackoverflow. Can someone help me? http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter Thanks in advance! -- -- // Matheus Salvia Desenvolvedor Mobile Celular: +55 11 9-6446-2332 Skype: meta.faraday
Re: Cross index join query performance
Hi Joel, I tried this patch and it is quite a bit faster. Using the same query on a larger index (500K docs), the 'join' QTime was 1500 msec, and the 'hjoin' QTime was 100 msec! This was true for large and small result sets. A few notes: the patch didn't compile with 4.3 because of the SolrCore.getLatestSchema call (which I worked around), and the package name should be:

<queryParser name="hjoin" class="org.apache.solr.search.joins.HashSetJoinQParserPlugin"/>

Unfortunately, I just learned that our uniqueKey may have to be an alphanumeric string instead of an int, so I'm not out of the woods yet. Good stuff - thanks. Peter On Thu, Sep 26, 2013 at 6:49 PM, Joel Bernstein joels...@gmail.com wrote: It looks like you are using int join keys, so you may want to check out SOLR-4787, specifically the hjoin and bjoin. These perform well when you have a large number of results from the fromIndex. If you have a small number of results in the fromIndex, the standard join will be faster. On Wed, Sep 25, 2013 at 3:39 PM, Peter Keegan peterlkee...@gmail.com wrote: I forgot to mention - this is Solr 4.3 Peter On Wed, Sep 25, 2013 at 3:38 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm doing a cross-core join query and the join query is 30X slower than each of the 2 individual queries.
Here are the queries:

Main query: http://localhost:8983/solr/mainindex/select?q=title:java
  QTime: 5 msec, hit count: 1000
Sub query: http://localhost:8983/solr/subindex/select?q=+fld1:[0.1 TO 0.3]
  QTime: 4 msec, hit count: 25K
Join query: http://localhost:8983/solr/mainindex/select?q=title:java&fq={!join fromIndex=mainindex toIndex=subindex from=docid to=docid}fld1:[0.1 TO 0.3]
  QTime: 160 msec, hit count: 205

Here are the index spec's: mainindex size: 117K docs, 1 segment. mainindex schema:

<field name="docid" type="int" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="title" type="text_en_splitting" indexed="true" stored="true" multiValued="false"/>
<uniqueKey>docid</uniqueKey>

subindex size: 117K docs, 1 segment. subindex schema:

<field name="docid" type="int" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="fld1" type="float" indexed="true" stored="true" required="false" multiValued="false"/>
<uniqueKey>docid</uniqueKey>

With debugQuery=true I see:

"debug": {
  "join": {
    "{!join from=docid to=docid fromIndex=subindex}fld1:[0.1 TO 0.3]": {
      "time": 155,
      "fromSetSize": 24742,
      "toSetSize": 24742,
      "fromTermCount": 117810,
      "fromTermTotalDf": 117810,
      "fromTermDirectCount": 117810,
      "fromTermHits": 24742,
      "fromTermHitsTotalDf": 24742,
      "toTermHits": 24742,
      "toTermHitsTotalDf": 24742,
      "toTermDirectCount": 24627,
      "smallSetsDeferred": 115,
      "toSetDocsAdded": 24742
    }
  }
},

Via profiler and debugger, I see 150 msec spent in the outer 'while(term!=null)' loop in JoinQueryWeight.getDocSet(). This seems like a lot of time to join the bitsets. Does this seem right? Peter -- Joel Bernstein Professional Services LucidWorks
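The reason a hash-based join can beat the standard term-by-term join when the fromIndex result set is large can be illustrated generically: collect the join keys from the fromIndex results into a hash set, then filter the toIndex docs against it. This is only a sketch of the idea behind the hjoin, with made-up documents, not the actual SOLR-4787 implementation.

```python
# Generic hash-join illustration: O(|from| + |to|) membership checks
# instead of walking every term in the join field. The documents below
# are invented for the example.
from_results = [{"docid": 1, "fld1": 0.15},
                {"docid": 3, "fld1": 0.25},
                {"docid": 7, "fld1": 0.29}]
to_results = [{"docid": 1, "title": "java basics"},
              {"docid": 2, "title": "java joins"},
              {"docid": 3, "title": "java tuning"}]

# Build a hash set of join keys from the fromIndex side...
from_keys = {d["docid"] for d in from_results}
# ...then keep only the toIndex docs whose key is in that set.
joined = [d for d in to_results if d["docid"] in from_keys]
print([d["docid"] for d in joined])
```

With int keys the set can be a primitive hash set, which is part of why the hjoin does well with large fromIndex result sets.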
Re: Hello and help :)
Sure, sorry for the inconvenience. I'm having a little trouble trying to make a query in Solr. The problem is: I must be able to retrieve documents that have the same value for a specified field, but they should only be retrieved if this value appeared more than X times for a specified user. In pseudo-SQL it would be something like:

select user_id from documents where my_field=my_value and (select count(*) from documents where my_field=my_value and user_id=super.user_id) > X

I know that solr returns a 'numFound' for each query you make, but I don't know how to retrieve this value in a subquery. My Solr is organized in a way that a user is a document, and the properties of the user (such as name, age, etc) are grouped in another document with a 'root_id' field. So let's suppose the following query that gets all the root documents whose children have the prefix some_prefix: is_root:true AND _query_:"{!join from=root_id to=id}requests_prefix:\"some_prefix\"" Now, how can I get the root documents (users in some sense) that have more than X children matching 'requests_prefix:some_prefix' or any other condition? Is it possible? P.S. It must be done in a single query; fields can be added at will, but the root/children structure should be preserved (preferentially). 2013/9/27 Upayavira u...@odoko.co.uk -- -- // Matheus Salvia Desenvolvedor Mobile Celular: +55 11 9-6446-2332 Skype: meta.faraday
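Outside of Solr, the pseudo-SQL above boils down to counting the matching children per root_id and keeping the roots whose count exceeds X, which a quick sketch makes concrete (the documents and threshold here are invented for illustration; how to express the same thing in a single Solr query is exactly the open question of the thread).

```python
from collections import Counter

# Sketch of the pseudo-SQL: count matching child documents per root_id,
# then keep the roots whose count is greater than X. Documents and the
# threshold X are made up for this example.
children = [{"root_id": "u1", "requests_prefix": "some_prefix"},
            {"root_id": "u1", "requests_prefix": "some_prefix"},
            {"root_id": "u2", "requests_prefix": "some_prefix"},
            {"root_id": "u1", "requests_prefix": "other"}]

X = 1
counts = Counter(c["root_id"] for c in children
                 if c["requests_prefix"] == "some_prefix")
matching_roots = sorted(r for r, n in counts.items() if n > X)
print(matching_roots)
```

In Solr terms this per-key counting is the shape of a facet on root_id over the child query, which hints at why a plain subquery-count isn't directly expressible.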
Re: Solr doesn't return TermVectors
You are using components instead of last-components, so you have to list all search components, including the QueryComponent. Better to use last-components. -- Jack Krupansky -Original Message- From: Shawn Heisey Sent: Friday, September 27, 2013 4:02 PM To: solr-user@lucene.apache.org Subject: Re: Solr doesn't return TermVectors On 9/27/2013 1:35 PM, Jack Krupansky wrote: You forgot the qt=custom-request-handler parameter, such as on the wiki: http://localhost:8983/solr/select/?qt=tvrh&q=includes:[* TO *]&fl=id And you need the custom request handler, such as on the wiki: <requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler"> <lst name="defaults"> <bool name="tv">true</bool> </lst> <arr name="last-components"> <str>tvComponent</str> </arr> </requestHandler> You can add that last-components list to your default handler, if you wish. I have more detailed examples in my e-book. That wiki page probably needs to be updated to have a /tvrh handler instead of tvrh, and with /tvrh instead of /select. The 'qt' route is the old way of doing things, before handleSelect=false became the accepted best practice. In order to help the sender, I've been trying to get this working on my dev server (running 4.4.0) and keep running into NullPointerException problems. I think there's something important that I'm missing about how to use the component. Here's an example of the URL and request handler I'm using: http://server:port/solr/core/tv?q=id:someId&tv.fl=catchall <requestHandler name="/tv" class="solr.SearchHandler" startup="lazy"> <lst name="defaults"> <bool name="tv">true</bool> </lst> <arr name="components"> <str>tvComponent</str> </arr> </requestHandler> java.lang.NullPointerException at org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:251) Thanks, Shawn
Re: Solr doesn't return TermVectors
On 9/27/2013 4:02 PM, Jack Krupansky wrote: You are using components instead of last-components, so you have to list all search components, including the QueryComponent. Better to use last-components. That did it. Thank you! I didn't understand why this was a problem even with your note, until I read the last part of this page, which says that using components will entirely replace the default component list with whatever you specify: http://wiki.apache.org/solr/SearchComponent I copied and modified the handler from one I've already got that's using TermsComponent, which was using components instead of last-components. That handler works, so I figured it would for /tv as well. :) Thanks, Shawn
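For anyone finding this thread later, the fix above amounts to a handler like the following sketch; the handler name and component name are the ones from the exchange, and you would adjust them for your own solrconfig.xml:

```xml
<!-- TermVectorComponent must be registered once in solrconfig.xml -->
<searchComponent name="tvComponent"
                 class="solr.TermVectorComponent"/>

<!-- last-components appends tvComponent AFTER the default chain
     (QueryComponent etc.); plain components would replace it -->
<requestHandler name="/tv" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

A request such as /tv?q=id:someId&tv.fl=catchall should then return term vectors without the NullPointerException, since QueryComponent runs first and populates the document list that TermVectorComponent reads.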
Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception
I spent some more time thinking about this. Do you really need to use the TikaEntityProcessor? It doesn't offer anything new to the document you are building that couldn't be accomplished by the XPathEntityProcessor alone, from what I can tell. I also tried to get the Advanced Parsing example (http://wiki.apache.org/solr/TikaEntityProcessor) to work, without success. There are some obvious typos (<document> instead of </document>) and an odd order to the pieces (dataSources is enclosed by document). It also looks like FieldStreamDataSource (http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html) is the one that is meant to work in this context. If Koji is still around maybe he could offer some help? Otherwise this bit of erroneous instruction should probably be removed from the wiki. Cheers, Tricia $ svn diff Index: solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java === --- solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java (revision 1526990) +++ solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java (working copy) @@ -99,13 +99,13 @@ runFullImport(getConfigHTML(identity)); assertQ(req(*:*), testsHTMLIdentity); } - + private String getConfigHTML(String htmlMapper) { return dataConfig + dataSource type='BinFileDataSource'/ + document + -entity name='Tika' format='xml' processor='TikaEntityProcessor' + +entity name='Tika' format='html' processor='TikaEntityProcessor' + url=' + getFile(dihextras/structured.html).getAbsolutePath() + ' + ((htmlMapper == null) ? 
: ( htmlMapper=' + htmlMapper + ')) + + field column='text'/ + @@ -114,4 +114,36 @@ /dataConfig; } + private String[] testsHTMLH1 = { + //*[@numFound='1'] + , //str[@name='h1'][contains(.,'H1 Header')] + }; + + @Test + public void testTikaHTMLMapperSubEntity() throws Exception { +runFullImport(getConfigSubEntity(identity)); +assertQ(req(*:*), testsHTMLH1); + } + + private String getConfigSubEntity(String htmlMapper) { +return +dataConfig + +dataSource type='BinFileDataSource' name='bin'/ + +dataSource type='FieldStreamDataSource' name='fld'/ + +document + +entity name='tika' processor='TikaEntityProcessor' url=' + getFile(dihextras/structured.html).getAbsolutePath() + ' dataSource='bin' format='html' rootEntity='false' + +!--Do appropriate mapping here meta=\true\ means it is a metadata field -- + +field column='Author' meta='true' name='author'/ + +field column='title' meta='true' name='title'/ + +!--'text' is an implicit field emited by TikaEntityProcessor . Map it appropriately-- + +field name='text' column='text'/ + +entity name='detail' type='XPathEntityProcessor' forEach='/html' dataSource='fld' dataField='tika.text' rootEntity='true' + +field xpath='//div' column='foo'/ + +field xpath='//h1' column='h1' / + +/entity + +/entity + +/document + +/dataConfig; + } + } Index: solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml === --- solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml (revision 1526990) +++ solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml (working copy) @@ -194,6 +194,8 @@ field name=title type=string indexed=true stored=true/ field name=author type=string indexed=true stored=true / field name=text type=text indexed=true stored=true / + field name=h1 type=text indexed=true stored=true / + field name=foo type=text indexed=true stored=true / 
</fields> <!-- field for the QueryParser to use when an explicit fieldname is absent --> I find the SqlEntityProcessor part particularly odd. That's the default, right?: 2405 T12 C1 oashd.SqlEntityProcessor.initQuery ERROR The query failed 'null' java.lang.RuntimeException: unsupported type : class java.lang.String at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:89) at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:1) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at
Re: Hello and help :)
If I understand your question right, Result Grouping in Solr might help you. Refer here https://cwiki.apache.org/confluence/display/solr/Result+Grouping . -- View this message in context: http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html Sent from the Solr - User mailing list archive at Nabble.com.
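As a concrete starting point, a grouping request along the lines of that page might look like this; the core name is a placeholder and the field names are taken from the earlier messages:

```
http://localhost:8983/solr/select?q=my_field:my_value
    &group=true
    &group.field=user_id
    &group.ngroups=true
    &group.limit=10
```

group.ngroups=true adds the total number of groups to the response; note, though, that grouping by itself does not filter groups by their document count.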
Re: Hello and help :)
Yes, but how can I use result grouping inside a join/subquery? 2013/9/27 ssami ss...@outlook.com If I understand your question right, Result Grouping in Solr might help you. Refer here https://cwiki.apache.org/confluence/display/solr/Result+Grouping . -- View this message in context: http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html Sent from the Solr - User mailing list archive at Nabble.com. -- // Matheus Salvia Desenvolvedor Mobile Celular: +55 11 9-6446-2332 Skype: meta.faraday
RE: Hello and help :)
Sorry, I take it back. I overlooked that you have two different collections. Thanks, — Socratees. Date: Fri, 27 Sep 2013 20:03:46 -0300 Subject: Re: Hello and help :) From: matheus2...@gmail.com To: solr-user@lucene.apache.org Yes, but how to use result grouping inside a join/subquery? 2013/9/27 ssami ss...@outlook.com If I understand your question right, Result Grouping in Solr might help you. Refer here https://cwiki.apache.org/confluence/display/solr/Result+Grouping . -- View this message in context: http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html Sent from the Solr - User mailing list archive at Nabble.com. -- -- // Matheus Salvia Desenvolvedor Mobile Celular: +55 11 9-6446-2332 Skype: meta.faraday
Re: Hello and help :)
Ssami, I work with Matheus and I am helping him take a look at this problem. We took a look at result grouping, thinking it could help us, but it has two drawbacks: - We cannot have multivalued fields, if I understood it correctly. But ok, we could manage that... - Suppose a query like this: - select count(*) NUMBER group by FIELD where CONDITION AND NUMBER > 5 - In this case, we are not just taking the count for each group as a result. The count is actually part of the where clause. - AFAIK, result grouping doesn't allow that, although I would really love to be proven wrong :D We really need this, so I am trying to figure out what I could change in Solr to make this work... Any hint on that? Would we need to write a custom facet / search handler / search component? Of course we prefer a solution that works with current Solr features, but we could consider writing some custom code to do that. Thanks in advance! Best regards, Marcelo Valle. 2013/9/27 ssami ss...@outlook.com If I understand your question right, Result Grouping in Solr might help you. Refer here https://cwiki.apache.org/confluence/display/solr/Result+Grouping . -- View this message in context: http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception
Ok, I see what you're getting at, but why doesn't the following work? <field xpath="//h:h1" column="h_1"/> <field column="text" xpath="/xhtml:html/xhtml:body"/> I removed the Tika processor. What am I missing? I haven't found anything in the wiki. On 28. Sep 2013, at 12:28 AM, P Williams wrote: I spent some more time thinking about this. Do you really need to use the TikaEntityProcessor? It doesn't offer anything new to the document you are building that couldn't be accomplished by the XPathEntityProcessor alone, from what I can tell. I also tried to get the Advanced Parsing example (http://wiki.apache.org/solr/TikaEntityProcessor) to work, without success. There are some obvious typos (<document> instead of </document>) and an odd order to the pieces (dataSources is enclosed by document). It also looks like FieldStreamDataSource (http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html) is the one that is meant to work in this context. If Koji is still around maybe he could offer some help? Otherwise this bit of erroneous instruction should probably be removed from the wiki. Cheers, Tricia
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
Actually I don't use that field. It could be used to do some form of basic collaborative filtering, so you could use a high value for items in your collection that you want to come first, but in my case this was not a requirement and I don't use it at all. - Original message - From: JMill apprentice...@googlemail.com To: solr-user@lucene.apache.org Sent: Friday, September 27, 2013 16:19:40 Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns) I am not sure about the value to use for the popularity option. Is there a method, or do you just go with some arbitrary number? On Thursday, September 26, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Great!! I hadn't seen your message yet; perhaps you could create a PR to that GitHub repository, so it will be in sync with current versions of Solr. - Original message - From: JMill apprentice...@googlemail.com To: solr-user@lucene.apache.org Sent: Thursday, September 26, 2013 9:10:49 Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns) Solved. On Thu, Sep 26, 2013 at 1:50 PM, JMill apprentice...@googlemail.com wrote: I managed to get rid of the query error by placing the jquery file in the velocity folder and adding the line: <script type="text/javascript" src="#{url_for_solr}/admin/file?file=/velocity/jquery.min.js&contentType=text/javascript"></script>. That has not solved the issue; the console is showing a new error - [13:42:55.181] TypeError: $.browser is undefined @ http://localhost:8983/solr/ac/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript:90 . Any ideas? On Thu, Sep 26, 2013 at 1:12 PM, JMill apprentice...@googlemail.com wrote: Do you know the directory that #{url_root} in <script type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script> points to? And the same for #{url_for_solr} in <script type="text/javascript" src="#{url_for_solr}/js/lib/jquery-1.7.2.min.js"></script>? On Wed, Sep 25, 2013 at 7:33 PM, Ing. 
Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Try querying the core where the data has been imported, something like: http://localhost:8983/solr/suggestions/select?q=uc In the previous URL, suggestions is the name I gave to the core, so this should change. If you get results, then the problem could be the jquery dependency. I don't remember making any change; as far as I know that js file is bundled with Solr (at least in the 3.x version). Perhaps you could change it to the correct jquery version on Solr 4.4. If you go into the admin panel (in Solr 3.6): http://localhost:8983/solr/admin/schema.jsp and inspect the loaded code, the required file (jquery-1.4.2.min.js) gets loaded; in Solr 4.4 it should load a similar file, but perhaps a more recent version. Perhaps you could change that part to something like: <script type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script> which is used at least on a Solr 4.1 that I have lying around here somewhere. In any case you can test the suggestions using the URL that I suggested at the top of this mail; that way you should be able to see the possible results, of course in a less fancy way. - Original message - From: JMill apprentice...@googlemail.com To: solr-user@lucene.apache.org Sent: Wednesday, September 25, 2013 13:59:32 Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns) Could it be the jquery library that is the problem? I opened up solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to the jquery library, but I can't seem to find the directory referenced, line: <script type="text/javascript" src="#{url_for_solr}/admin/jquery-1.4.3.min.js">. Do you know where #{url_for_solr} points to? On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez III Escuela Internacional de Invierno at UCI, February 17 to 28, 2014. See www.uci.cu
RE: Hello and help :)
Hi Marcelo, I haven't faced this exact situation before, so I can only try posting my thoughts. Since Solr allows Result Grouping and Faceting at the same time, and since you can apply filters on these facets, can you take advantage of that? Or, what if you facet by the field, group by the field count, then apply facet filtering to exclude all facet values with a count less than 5? These links might be helpful. http://architects.dzone.com/articles/facet-over-same-field-multiple https://issues.apache.org/jira/browse/SOLR-2898 Thanks, — Socratees. Date: Fri, 27 Sep 2013 20:32:22 -0300 Subject: Re: Hello and help :) From: marc...@s1mbi0se.com.br To: solr-user@lucene.apache.org Ssami, I work with Matheus and I am helping him take a look at this problem. We took a look at result grouping, thinking it could help us, but it has two drawbacks: - We cannot have multivalued fields, if I understood it correctly. But ok, we could manage that... - Suppose a query like this: - select count(*) NUMBER group by FIELD where CONDITION AND NUMBER > 5 - In this case, we are not just taking the count for each group as a result. The count is actually part of the where clause. - AFAIK, result grouping doesn't allow that, although I would really love to be proven wrong :D We really need this, so I am trying to figure out what I could change in Solr to make this work... Any hint on that? Would we need to write a custom facet / search handler / search component? Of course we prefer a solution that works with current Solr features, but we could consider writing some custom code to do that. Thanks in advance! Best regards, Marcelo Valle. 2013/9/27 ssami ss...@outlook.com If I understand your question right, Result Grouping in Solr might help you. Refer here https://cwiki.apache.org/confluence/display/solr/Result+Grouping . -- View this message in context: http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html Sent from the Solr - User mailing list archive at Nabble.com.
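If the threshold only has to apply to the counts themselves, plain faceting with facet.mincount may already cover the "facet filtering" idea above; a sketch, with field names taken from the earlier pseudo-SQL and X=5 (so mincount is 6):

```
http://localhost:8983/solr/select?q=my_field:my_value
    &rows=0
    &facet=true
    &facet.field=user_id
    &facet.limit=-1
    &facet.mincount=6
```

This returns every user_id with more than 5 matching documents, but only the IDs and their counts, not the documents themselves; fetching the actual documents would still take a second query filtered on those IDs.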
RE: Hello and help :)
Also, try the #solr and #solr-dev IRC channels at Freenode http://webchat.freenode.net/ Thanks, — Socratees.
Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception
This is a rather complicated example to chew through, but try the following two things: *) dataField="${tika.text}" should be dataField="text" (or, less likely, dataField="tika.text"). You might be trying to read the content of the field rather than passing a reference to the field, which is what seems to be expected. This might explain the exception. *) It may help to be aware of https://issues.apache.org/jira/browse/SOLR-4530 . There is a new htmlMapper="identity" flag on Tika entries to ensure more of the HTML structure passes through. By default, Tika strips out most of the HTML tags. Regards, Alex. On Thu, Sep 26, 2013 at 5:17 PM, Andreas Owen a...@conx.ch wrote: <entity name="tika" processor="TikaEntityProcessor" url="${rec.urlParse}" dataSource="dataUrl" onError="skip" format="html"> <field column="text"/> <entity name="detail" type="XPathEntityProcessor" forEach="/html" dataSource="fld" dataField="${tika.text}" rootEntity="true" onError="skip"> <field xpath="//h1" column="h_1"/> </entity> </entity> Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
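Putting Alex's two suggestions together, Andreas's nested entity might end up looking like this untested sketch; the dataSource names are the ones from his config, and the "fld" source is assumed to be declared as a FieldStreamDataSource elsewhere in the same dataConfig:

```xml
<entity name="tika" processor="TikaEntityProcessor"
        url="${rec.urlParse}" dataSource="dataUrl"
        onError="skip" format="html" htmlMapper="identity"
        rootEntity="false">
  <field column="text"/>
  <!-- dataField names the parent entity's field ("tika.text");
       ${tika.text} would pass its String content instead, which
       FieldStreamDataSource rejects with "unsupported type" -->
  <entity name="detail" processor="XPathEntityProcessor"
          forEach="/html" dataSource="fld"
          dataField="tika.text" rootEntity="true" onError="skip">
    <field xpath="//h1" column="h_1"/>
  </entity>
</entity>
```

This mirrors the working test case in Tricia's patch earlier in the thread, which also uses dataField='tika.text' with a FieldStreamDataSource.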
Issue in parallel Indexing using multiple csv files
Using Solr 4.4. I'm trying to index a Solr core using a csv file of around 1 million records. To improve performance, I've split the csv file into smaller files and tried to use the csv update handler on each file, running in a separate thread. The outcome was weird: the total count of Solr documents doesn't match the total number of records in the csv files. But when I run these sequentially, the outcome is as expected. So, the question is whether it is a good option to run these csv files in parallel. Does it even work? -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-in-parallel-Indexing-using-multiple-csv-files-tp4092452.html Sent from the Solr - User mailing list archive at Nabble.com.
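Before blaming parallel posting, it is worth ruling out the splitting step itself; here is a small sketch (a hypothetical helper, not code from the original post) that splits a CSV while repeating the header row in every chunk, so each chunk is independently valid input for the CSV update handler:

```python
import csv
import io

def split_csv(text, chunk_size):
    """Split CSV text into chunks of at most chunk_size data rows,
    repeating the header row at the top of every chunk."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    chunks = []
    for i in range(0, len(data), chunk_size):
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(header)           # header repeated per chunk
        writer.writerows(data[i:i + chunk_size])
        chunks.append(out.getvalue())
    return chunks

sample = "id,name\n1,a\n2,b\n3,c\n"
parts = split_csv(sample, 2)
# each chunk begins with the header row; no data row is lost or duplicated
```

If the data-row totals across the chunks already disagree with the original file, the discrepancy has nothing to do with running the update handler in parallel.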
Re: Hello and help :)
To phrase your need more generically: * find all documents for userID=x, where userID=x has more than y documents in the index Is that correct? If it is, I'd probably do some work at index time. First guess: I'd keep a separate core which has a very small document per user, storing just: * userID * docCount Then, when you add/delete a document, you use atomic updates to either increase or decrease the docCount on that user doc. Then you can use a pseudo join between these two cores relatively easily: q=user_id:x {!join fromIndex=user from=user_id to=user_id}+user_id:x +doc_count:[y TO *] Worst case, if you don't want to mess with your indexing code, I wonder if you could use a ScriptUpdateProcessor to do this work - not sure if you can have one add an entirely new, additional document to the list, but it may be possible. Upayavira On Fri, Sep 27, 2013, at 09:50 PM, Matheus Salvia wrote: Sure, sorry for the inconvenience. I'm having a little trouble trying to make a query in Solr. The problem is: I must be able to retrieve documents that have the same value for a specified field, but they should only be retrieved if this value appeared more than X times for a specified user. In pseudo-SQL it would be something like: select user_id from documents where my_field=my_value and (select count(*) from documents where my_field=my_value and user_id=super.user_id) > X I know that Solr returns a 'numFound' for each query you make, but I don't know how to retrieve this value in a subquery. My Solr is organized in a way that a user is a document, and the properties of the user (such as name, age, etc.) are grouped in another document with a 'root_id' field. So let's suppose the following query that gets all the root documents whose children have the prefix some_prefix: 
is_root:true AND _query_:"{!join from=root_id to=id}requests_prefix:some_prefix" Now, how can I get the root documents (users, in some sense) that have more than X children matching 'requests_prefix:some_prefix' or any other condition? Is it possible? P.S. It must be done in a single query; fields can be added at will, but the root/children structure should be preserved (preferentially). 2013/9/27 Upayavira u...@odoko.co.uk Matheus, given these mails form part of an archive that is itself self-contained, can you please post your actual question here? You're more likely to get answers that way. Thanks, Upayavira On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote: Hello everyone, I'm having a problem regarding how to make a Solr query; I've posted it on Stack Overflow. Can someone help me? http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter Thanks in advance! -- // Matheus Salvia Desenvolvedor Mobile Celular: +55 11 9-6446-2332 Skype: meta.faraday
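Upayavira's counter-core idea relies on Solr 4.x atomic updates; as an illustration (field names follow his outline, and this is a sketch rather than code from the thread), the JSON body you would POST to the user core's /update handler on every add or delete could be built like this:

```python
import json

def doc_count_update(user_id, delta):
    """Build a Solr 4.x atomic-update body that adjusts one user
    document's doc_count by delta using the 'inc' operation.
    Assumes user_id is the core's uniqueKey field."""
    return json.dumps([{"user_id": user_id,
                        "doc_count": {"inc": delta}}])

# POST to http://host:8983/solr/user/update with Content-Type application/json
body = doc_count_update("u42", 1)   # +1 on add; use delta=-1 on delete
```

With the counts maintained this way, the pseudo-join query in the answer only has to test doc_count:[y TO *], so the threshold check costs nothing at query time.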