Saravanan Chinnadurai/Actionimages is out of the office.
I will be out of the office starting 05/12/2011 and will not return until 05/01/2012. Please email to itsta...@actionimages.com for any urgent issues.
Re: Possible to facet across two indices, or document types in single index?
Well, the JoinQParserPlugin is definitely there. Turning on debug reveals why I get zero results. Given the URL:

http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type:node&q={!join+from=conceptId+to=id+fromIndex=partner-tmo}brca1&debugQuery=true&rows=5&fl=id,n_type,n_name

I get (response flattened here):

0
1
true
id,n_type,n_name
{!join from=conceptId to=id fromIndex=partner-tmo}brca1
partner-tmo
type:node
5
{!join from=conceptId to=id fromIndex=partner-tmo}brca1
{!join from=conceptId to=id fromIndex=partner-tmo}brca1
JoinQuery({!join from=conceptId to=id fromIndex=partner-tmo}n_text:brca)
{!join from=conceptId to=id fromIndex=partner-tmo}n_text:brca
type:node
type:node
...

It looks like despite qt=partner-tmo, the edismax-based search handler is being bypassed for the default search handler, which is querying against the n_text field, the defaultSearchField for the ing-content core. But I don't want to use the default handler; I want my configured edismax handler, and any specified filter queries, to determine the document set in the ing-content core, and then join with the partner-tmo core. [Yes, the edismax handler in the ing-content core and the second core are both named partner-tmo.] Can the JoinQParserPlugin work in conjunction with edismax? Thanks, Jeff

On Dec 4, 2011, at 4:12 PM, Jeff Schmidt wrote: > Hello again: > > I'm looking at the newer join functionality > (http://wiki.apache.org/solr/Join) to see if that will help me out. While > there are signs it can go cross index/core > (https://issues.apache.org/jira/browse/SOLR-2272), I doubt I can specify > facet.field params for fields in a couple of different indexes. But perhaps > a single combined index might work. > > Anyway, the above Jira item indicates status: resolved, resolution: fixed, > and Fix version/s: 4.0. I've been working with 3.5.0, so I checked out 4.0 > from svn today: > > [imac:svn/dev/trunk] jas% svn info > Path: .
> URL: http://svn.apache.org/repos/asf/lucene/dev/trunk > Repository Root: http://svn.apache.org/repos/asf > Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 > Revision: 1210126 > ... > Last Changed Rev: 1210116 > Last Changed Date: 2011-12-04 07:35:46 -0700 (Sun, 04 Dec 2011) > > Issuing a join query, it looks like the local params syntax is being ignored and > treated as part of the search terms? I get zero results, when without the join I get 979. > > > >0 >1 > >id,n_type,n_name >{!join from=conceptId to=id > fromIndex=partner-tmo}brca1 >partner-tmo >type:node >5 > > > > > > I've not fully explored this yet, and I'm not all that familiar with the > Solr codebase, but is this functionality in 4.x trunk or not? I can see there > is the package org.apache.lucene.search.join. Is this the implementation of > SOLR-2272? > > I can see the commit was made earlier this year, and then it was reverted and > things went off the rails. I don't want to open any old wounds, but does the > join exist? If not, I'll know not to pursue it any further. If so, is there > some solrconfig.xml configuration needed to enable it? I don't see it in the > examples. > > Thanks, > > Jeff > > On Dec 1, 2011, at 9:47 PM, Jeff Schmidt wrote: > >> Hello: >> >> I'm trying to relate two different types of documents. Currently I >> have 'node' documents that reside in one index (core), and 'product mapping' >> documents that are in another index. The product mapping index is used to >> map tenant products to nodes. The nodes are canonical content that gets >> updated every quarter, whereas the product mappings can change at any time. >> >> I put them in two indexes because (1) canonical content changes rarely, and >> I don't want product mapping changes to affect it (commit, re-open searchers >> etc.), and (2) I would like to support multiple tenants mapping products to the >> same canonical content to avoid duplication (a few GB).
>> >> This arrangement has worked well thus far, but only in the sense that for each >> node result returned, I can query the product mapping index to determine the >> products mapped to the node. I combine this information within my >> application and return it to the client. This works okay in that there are >> only 5-20 results returned per page (start, rows). But now I'm being asked >> to facet on the product categories (a multi-valued field within a product mapping >> document) along with other facets defined in the canonical content. >> >> Can this be done with Solr 3.5.0? I've been looking into sub-queries, >> function queries etc. Also, I've seen various postings indicating that one >> nee
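On the join-plus-edismax question above, one pattern worth trying (a sketch, not a verified fix; whether it resolves the handler-bypass behavior is exactly the open question in this thread) is to spell out the inner edismax query explicitly with local-param dereferencing (v=$param) instead of relying on qt:

```python
from urllib.parse import urlencode

# Sketch only: "jq" is a made-up parameter name and the qf fields are
# assumptions. The point is the v=$jq dereference, which supplies an
# explicit {!edismax ...} sub-query rather than depending on the qt handler.
params = {
    "q": "{!join from=conceptId to=id fromIndex=partner-tmo v=$jq}",
    "jq": "{!edismax qf='n_name n_text'}brca1",
    "fq": "type:node",
    "fl": "id,n_type,n_name",
    "rows": 5,
}
query_string = urlencode(params)
print(query_string)
```

The resulting query string can then be appended to the core's select URL.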
Re: Distributed Solr: different number of results each time
It seems like the error was caused by a wrong list of shard URLs kept in ZooKeeper. One possible workaround is to specify the list of shards manually with shards=slave-node1,slave-node2,slave-node3,... (see the SolrCloud documentation for details) -- View this message in context: http://lucene.472066.n3.nabble.com/Distributed-Solr-different-number-of-results-each-time-tp3550284p3560310.html Sent from the Solr - User mailing list archive at Nabble.com.
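For reference, a manual shard list is just a comma-separated set of host/core entries passed as the shards parameter; a quick sketch (the host names and port are placeholders following the slave-node naming above):

```python
# Placeholder hosts: a real shards value lists host:port/path entries
# for each core that should participate in the distributed query.
shard_hosts = [
    "slave-node1:8983/solr",
    "slave-node2:8983/solr",
    "slave-node3:8983/solr",
]
params = {
    "q": "*:*",
    "shards": ",".join(shard_hosts),
}
print(params["shards"])
```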
Re: Possible to facet across two indices, or document types in single index?
Hello again: I'm looking at the newer join functionality (http://wiki.apache.org/solr/Join) to see if that will help me out. While there are signs it can go cross index/core (https://issues.apache.org/jira/browse/SOLR-2272), I doubt I can specify facet.field params for fields in a couple of different indexes. But perhaps a single combined index might work. Anyway, the above Jira item indicates status: resolved, resolution: fixed, and Fix version/s: 4.0. I've been working with 3.5.0, so I checked out 4.0 from svn today:

[imac:svn/dev/trunk] jas% svn info
Path: .
URL: http://svn.apache.org/repos/asf/lucene/dev/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1210126
...
Last Changed Rev: 1210116
Last Changed Date: 2011-12-04 07:35:46 -0700 (Sun, 04 Dec 2011)

Issuing a join query, it looks like the local params syntax is being ignored and treated as part of the search terms? I get zero results, when without the join I get 979. (Response flattened here: 0 1 id,n_type,n_name {!join from=conceptId to=id fromIndex=partner-tmo}brca1 partner-tmo type:node 5.) I've not fully explored this yet, and I'm not all that familiar with the Solr codebase, but is this functionality in 4.x trunk or not? I can see there is the package org.apache.lucene.search.join. Is this the implementation of SOLR-2272? I can see the commit was made earlier this year, and then it was reverted and things went off the rails. I don't want to open any old wounds, but does the join exist? If not, I'll know not to pursue it any further. If so, is there some solrconfig.xml configuration needed to enable it? I don't see it in the examples. Thanks, Jeff

On Dec 1, 2011, at 9:47 PM, Jeff Schmidt wrote: > Hello: > > I'm trying to relate two different types of documents. Currently I > have 'node' documents that reside in one index (core), and 'product mapping' > documents that are in another index. The product mapping index is used to > map tenant products to nodes.
The nodes are canonical content that gets > updated every quarter, whereas the product mappings can change at any time. > > I put them in two indexes because (1) canonical content changes rarely, and I > don't want product mapping changes to affect it (commit, re-open searchers > etc.), and (2) I would like to support multiple tenants mapping products to the > same canonical content to avoid duplication (a few GB). > > This arrangement has worked well thus far, but only in the sense that for each > node result returned, I can query the product mapping index to determine the > products mapped to the node. I combine this information within my > application and return it to the client. This works okay in that there are > only 5-20 results returned per page (start, rows). But now I'm being asked > to facet on the product categories (a multi-valued field within a product mapping > document) along with other facets defined in the canonical content. > > Can this be done with Solr 3.5.0? I've been looking into sub-queries, > function queries etc. Also, I've seen various postings indicating that one > needs to denormalize more. I don't want to add product information as fields > to the canonical content. Not only does that defeat my objective (1) above, > but Solr does not support incremental updates of document fields. > > So, one approach is to issue my query to the canonical index and get all of > the document IDs (could be 1000s), and then issue a filter query to the > product mapping index with all of these IDs and have Solr facet the product > categories. Is that efficient? I suppose I could use HTTP POST (via SolrJ) > to convey that payload of IDs? I could then take the facet results of that > query and combine them with the canonical index results and return them to > the client. > > That may be do-able, but then let's say the user clicks on a product category > facet value to narrow the node results to only those mapped to category XYZ.
> This will not affect the query issued against the canonical content index. > Instead, I think I'd have to go through the canonical results and eliminate > the nodes that are not associated with product category XYZ. Then, if the > current page of results is inadequate (rows=10, but 3 nodes were eliminated), > I'd have to go back to the canonical index to get more rows, eliminate > some again perhaps, get more etc. That sounds unappealing and low-performing. > > Is there a Solr way to do this? My Packt "Apache Solr 3 Enterprise Search > Server" book (page 34) states regarding separate indices: > > "If you do develop separate schemas and if you need to search across > your indices in one search then you must perform a distributed search, > described in the last chapter. A distributed search is usually a feature > employed fo
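The two-step approach described in this message (query the canonical index for node IDs, then facet the product-mapping index filtered to those IDs) can be sketched as follows. The field names (nodeId, category) are illustrative assumptions, not from the actual schema:

```python
# Step 1 (assumed already done): the canonical-content query returned
# these node IDs; in practice this could be thousands of them.
node_ids = ["n1", "n2", "n3"]

# Step 2: facet the product-mapping index over just those nodes.
# A large OR filter like this is exactly the payload one would POST.
fq = "nodeId:(" + " OR ".join(node_ids) + ")"
params = {
    "q": "*:*",
    "fq": fq,
    "rows": 0,               # only facet counts are wanted from this core
    "facet": "true",
    "facet.field": "category",
}
print(fq)
```

Whether this stays efficient with thousands of IDs per filter is the open performance question raised above.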
Re: SolR for time-series data
SAX is attractive, but I have found it lacking in practice. My primary issue is that in order to get sufficient recall for practical matching problems, I had to do enough query expansion that the speed advantage of inverted indexes went away. The OP was asking for blob storage, however, and I think that Solr is fine for that. There is also the question of access to time series based on annotations produced by other programs. If the annotations express your intent, then Solr wins again. If the annotations are SAX annotations and that works for you, great, but I wouldn't be optimistic that this would handle a wide range of time series problems.

On Sun, Dec 4, 2011 at 5:14 AM, Grant Ingersoll wrote: > Definitely should be possible. As an aside, I've also thought one could > do more time series stuff. Have a look at the iSAX stuff by Shieh and > Keogh: http://www.cs.ucr.edu/~eamonn/iSAX/iSAX.html > > > On Dec 3, 2011, at 12:10 PM, Alan Miller wrote: > > > Hi, > > > > I have a webapp that plots a bunch of time series data which > > is just a series of doubles coupled with a timestamp. > > > > Every chart in my webapp has a chart_id in my db and i am wondering if it > > would be > > effective to use solr to serve the data to my app instead of keeping the > > data in my rdbms. > > > > Currently I'm using hadoop to calc and generate the report data and then > > sticking it in my > > rdbms but I could use the solrj client to upload the data to a solr index > > directly. > > > > I know solr is for indexing text documents but would it be effective to use > > solr in this way? > > > > I want to query by chart_id and get back a series of timestamp:double > pairs. > > > > Regards > > Alan > > > Grant Ingersoll > http://www.lucidimagination.com > > > >
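For the blob-style storage Alan asked about, the document layout could be as simple as one Solr document per data point; the field names below are illustrative, not a recommendation from the thread:

```python
# One document per (chart_id, timestamp, value) point; retrieval is then
# q=chart_id:42 with sort=timestamp asc and fl=timestamp,value.
chart_id = 42
points = [(1325376000, 3.14), (1325376060, 2.71)]  # (epoch seconds, double)

docs = [
    {"id": f"{chart_id}_{ts}", "chart_id": chart_id, "timestamp": ts, "value": v}
    for ts, v in points
]
print(len(docs), docs[0]["id"])
```

The compound id keeps each point uniquely addressable so incremental uploads via SolrJ (or any client) simply overwrite duplicates.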
quoted query issue
My query for the terms road and show, body:(road show), returns 4 documents. The highlighting shows several instances where road immediately precedes show. However, a query for the phrase "road show", body:("road show"), returns no documents. I have similar results with "floor show" and "road house." I have verified that the indexed text field contains the phrases I'm searching. Here's the XML response (flattened in transit; the echoed params appear to be status 0, QTime 1, hl on, version 2.2, fl identifier,title,year,volume, and q body:("road show")):

01on2.2identifier,title,year,volumec7aab49e267body100body:("road show")

What do I do now? -- Carl
Re: Configuring the Distributed
On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller wrote: > You always want to use the distrib-update-chain. Eventually it will > probably be part of the default chain and auto turn on in zk mode. I'm working on this now... -Yonik http://www.lucidimagination.com
Re: Configuring the Distributed
On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller wrote: > On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson wrote: > >> I am currently looking at the latest solrcloud branch and was >> wondering if there was any documentation on configuring the >> DistributedUpdateProcessor? What specifically in solrconfig.xml needs >> to be added/modified to make distributed indexing work? >> > > > Hi Jamie - take a look at solrconfig-distrib-update.xml in > solr/core/src/test-files > > You need to enable the update log, add an empty replication handler def, > and an update chain with solr.DistributedUpdateProcessorFactory in it. One also needs an indexed _version_ field defined in schema.xml for versioning to work. -Yonik http://www.lucidimagination.com
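Pulling that checklist together, a minimal solrconfig.xml/schema.xml sketch might look like the following. Element and class names are my reading of the items listed above and may differ by revision; the solrconfig-distrib-update.xml test file mentioned in the message remains the authoritative example:

```xml
<!-- solrconfig.xml: enable the update log -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>

<!-- empty replication handler definition -->
<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy"/>

<!-- update chain with the distributed processor -->
<updateRequestProcessorChain name="distrib-update-chain">
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- schema.xml: indexed _version_ field for versioning -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```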
Penalize certain keywords but not completely forbid them
I have a situation where each doc is described by a tag field with multiple tags. Tags come in pairs, so when one tag is added to the field, it means that the opposite tag in the pair is rejected for the document. Tags are also optional, so two documents may be described by different sets of tags. When I match these documents, documents sharing the same tags should rank higher, and documents with opposite tags should rank lower, even lower than documents that share only a small number of common tags. An example of this:

Document 1: Red, Big, Heavy, ...
Document 2: Red, Heavy, ...
Document 3: Red, Small, ...

(Red/Green is a pair, Big/Small is a pair, Heavy/Light is a pair. There may be many more pairs of tags; this is just an example.)

Then when I match a new document with "Red, Big", Document 1 should be at the top, Document 2 in the middle, and Document 3 at the bottom. But I still want Document 3 to show up in the results because it still matches on Red. If I simply add opposite tags to the query with a <1 boost (searching for "Red Big Small^0.1", e.g.), they still contribute positively to the final score, and Document 3 will rank higher than Document 2. If I use "-" on the opposite terms (fieldName:(Red Big) -fieldName:Small), I'll lose Document 3 altogether. What is the best strategy for implementing this? If there is nothing out of the box supporting this, where should I go to modify the server itself? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Penalize-certain-keywords-but-not-completely-forbid-them-tp3559425p3559425.html Sent from the Solr - User mailing list archive at Nabble.com.
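One workaround worth sketching (my own suggestion, not an out-of-the-box feature): since a sub-1 boost still adds score, reward the absence of the opposite tag instead, with a (*:* -tags:X) clause that matches every document lacking X. Documents carrying the opposite tag then merely miss a bonus rather than being excluded. The field name tags and the boost values here are made up:

```python
# Tag pairs: adding the key tag implies rejecting the value tag.
pairs = {"Red": "Green", "Big": "Small", "Heavy": "Light"}
query_tags = ["Red", "Big"]  # tags of the new document being matched

# Positive clauses for shared tags, plus "absence" clauses that reward
# documents NOT carrying the opposite of each query tag.
positive = " ".join(f"tags:{t}^2" for t in query_tags)
absent = " ".join(f"(*:* -tags:{pairs[t]})" for t in query_tags)
q = f"{positive} {absent}"
print(q)
```

With this query, Document 3 (Red, Small) still matches on tags:Red so it stays in the results, but it misses the (*:* -tags:Small) bonus that Documents 1 and 2 collect.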
Re: SolR for time-series data
Definitely should be possible. As an aside, I've also thought one could do more time series stuff. Have a look at the iSAX stuff by Shieh and Keogh: http://www.cs.ucr.edu/~eamonn/iSAX/iSAX.html

On Dec 3, 2011, at 12:10 PM, Alan Miller wrote: > Hi, > > I have a webapp that plots a bunch of time series data which > is just a series of doubles coupled with a timestamp. > > Every chart in my webapp has a chart_id in my db and i am wondering if it > would be > effective to use solr to serve the data to my app instead of keeping the > data in my rdbms. > > Currently I'm using hadoop to calc and generate the report data and then > sticking it in my > rdbms but I could use the solrj client to upload the data to a solr index > directly. > > I know solr is for indexing text documents but would it be effective to use > solr in this way? > > I want to query by chart_id and get back a series of timestamp:double pairs. > > Regards > Alan

Grant Ingersoll http://www.lucidimagination.com
Re: Memory Leak in Solr?
Hi Chris, Thanks for your reply and sorry for the delay. Please find my replies below in the mail.

On Sat, Dec 3, 2011 at 5:56 AM, Chris Hostetter wrote: > > : Till 3 days ago, we were running a Solr 3.4 instance with the following java > : command line options > : java -server -Xms2048m -Xmx4096m -Dsolr.solr.home=etc -jar start.jar > : > : Then we increased the memory with the following options and restarted the > : server > : java -server -Xms4096m -Xmx10g -Dsolr.solr.home=etc -jar start.jar >... > : Since we restarted Solr, the memory usage of the application is continuously > : increasing. The swap usage goes from almost zero to as high as 4GB in every > : 6-8 hours. We kept restarting Solr to push it down to ~zero but the > : same memory usage trend kept repeating itself. > > do you really mean "swap" in that sentence, or do you mean the amount of > memory your OS says java is using? You said you have 16GB total > physical ram, how big is the index itself? do you have any other processes > running on that machine? (You should ideally leave at least enough ram > free to let the OS/filesystem cache the index in RAM) > > Yes, by "swap" I mean "swap", which we can see with "free -m" on Linux and in many other ways. So it is not the memory for Java. The index size is around 31G. We have this machine dedicated to Solr, so no other significant processes run here, except an incremental indexing script. I didn't think about the filesystem cache in RAM earlier, but since we have 16G of RAM, in my opinion that should be enough. > Since you've not only changed the Xmx (max heap size) param but also the > Xms param (min heap size) to 4GB, it doesn't seem out of the ordinary > at all for the memory usage to jump up to 4GB quickly. If the JVM did > exactly what the docs say it should, then on startup it would > immediately allocate 4GB of RAM, but I think in practice it allocates > as needed, and doesn't do any garbage collection if the memory used is > still below the "Xms" value.
> > : Then finally I reverted the least expected change, the command line memory > : options, back to min 2g, max 4g and I was surprised to see that the problem > : vanished. > : java -server -Xms2g -Xmx4g -Dsolr.solr.home=etc -jar start.jar > : > : Is this a memory leak or my lack of understanding of java/linux memory > : allocation? > > I think you're just misunderstanding the allocation ... if you tell java > to use at least 4GB, it's going to use at least 4GB w/o blinking. > > I accept I wrote the confusing word "min" for -Xms, but I promise I really do know its meaning. :-) > did you try "-Xms2g -Xmx10g" ? > > (again: don't set Xmx any higher than you actually have the RAM to > support given the filesystem cache and any other stuff you have running, > but you can increase mx w/o increasing ms if you are just worried about > how fast the heap grows on startup ... not sure why that would be > worrisome though > As I've written in the mail above that I really meant "swap", I am not really concerned about heap size at startup. > > -Hoss >

My concern is that when a single machine was able to serve n1+n2 queries earlier with -Xms2g -Xmx4g, why is the same machine not able to serve n2 queries with -Xms4g -Xmx10g? In fact I tried other combinations as well (2g-6g, 1g-6g, 2g-10g) but nothing replicated the issue.

Since yesterday I have been seeing another issue on the same machine. I saw "Too many open files" errors in the log, creating problems for incremental indexing. A lot of the lsof lines were like the following:

java 1232 solr 52u sock 0,5 1805813279 can't identify protocol
java 1232 solr 53u sock 0,5 1805813282 can't identify protocol
java 1232 solr 54u sock 0,5 1805813283 can't identify protocol

I searched for "can't identify protocol" and my case seemed related to a bug http://bugs.sun.com/view_bug.do?bug_id=6745052 but my java version ("1.6.0_22") did not match the bug description. I am not sure if this problem and the memory problem could be related.
I did not check the lsof earlier. Could this be a cause of the memory leak? -- Regards, Samar
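To quantify the descriptor leak over time, a saved lsof snapshot can simply be counted; a small sketch (the sample lines are reconstructed from the lsof output quoted above, with assumed column spacing):

```python
# Count descriptors stuck in the "can't identify protocol" state from a
# saved `lsof -p <pid>` snapshot; a count that grows between snapshots
# taken minutes apart would point at a socket leak.
sample = """\
java 1232 solr 52u sock 0,5 1805813279 can't identify protocol
java 1232 solr 53u sock 0,5 1805813282 can't identify protocol
java 1232 solr 54u sock 0,5 1805813283 can't identify protocol"""

leaked = [line for line in sample.splitlines()
          if "can't identify protocol" in line]
print(len(leaked))
```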
Re: Solr cache size information
Thanks a lot for these answers! Elisabeth

2011/12/4 Erick Erickson > See below: > > On Thu, Dec 1, 2011 at 10:57 AM, elisabeth benoit > wrote: > > Hello, > > > > If anybody can help, I'd like to confirm a few things about Solr's cache > > configuration. > > > > If I want to calculate cache size in memory relative to cache size in > > solrconfig.xml: > > > > For the document cache, > > > > size in memory = size in solrconfig.xml * average size of all fields > > defined in the fl parameter ??? > > pretty much. > > > > > For the filter cache, > > > > size in memory = size in solrconfig.xml * WHAT (the size of an id) ??? (I > > don't use the facet.enum method) > > > > It Depends(tm). Solr tries to do the best thing here, depending upon > how many docs match the filter query. One method puts in a bitset for each > entry, which is (maxDocs/8) bytes. maxDocs is reported on the admin/stats > page. > > If the filter cache only hits a few documents, the size is smaller than > that. > > You can think of this cache as a map where the key is the > filter query (which is how they're re-used and how autowarm > works) and the value for each key is the bitset or list. The > size of the map is bounded by the size in solrconfig.xml. > > > For the query result cache, > > > > size in memory = size in solrconfig.xml * the size of an id ??? > > > Pretty much. This is the maximum size, but each entry is > the query plus a list of IDs that's up to > long. This cache is, by and large, the least of your worries. > > > > > > I would also like to know the relation between Solr's cache sizes and JVM max > > size? > > Don't quite know what you're asking for here. There's nothing automatic > that's sensitive to whether the JVM memory limits are about to be exceeded. > If the caches get too big, OOMs happen. > > > > > If anyone has an answer or a link for further reading to suggest, it would > > be greatly appreciated.
> > > There's some information here: http://wiki.apache.org/solr/SolrCaching, > but > it often comes down to "try your app and monitor" > > Here's a work-in-progress that Grant is working on, be aware that it's > for trunk, not 3x. > http://java.dzone.com/news/estimating-memory-and-storage > > > Best > Erick > > > Thanks, > > Elisabeth >
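The maxDocs/8 figure above makes the worst case easy to bound; a back-of-the-envelope sketch with made-up illustration numbers (10M documents, size="512"):

```python
# Worst-case filterCache footprint: each cached entry can be a bitset of
# maxDocs/8 bytes (one bit per document in the index), and the cache holds
# up to `size` entries.
max_docs = 10_000_000        # reported on the admin/stats page (example value)
cache_size = 512             # size="512" in solrconfig.xml (example value)

bytes_per_entry = max_docs / 8               # 1,250,000 bytes per bitset
worst_case_mb = cache_size * bytes_per_entry / (1024 * 1024)
print(round(worst_case_mb))
```

As Erick notes, sparse filter results are stored more compactly, so this is an upper bound, not a typical footprint.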