Re: data/index naming format
The circumstance in which I've most typically seen the index.<timestamp> directory show up is when an update is sent to a slave server. Replication then appears to preserve the updated slave index in a separate folder while still respecting the correct data from the master. On Sep 5, 2013, at 8:03 PM, Shawn Heisey wrote: > On 9/5/2013 6:48 PM, Aditya Sakhuja wrote: >> I am running solr 4.1 for now, and am confused about the structure and >> naming of the contents of the data dir. I do not see the index.properties >> being generated on a fresh solr node start either. >> >> Can someone clarify when one should expect to see >> >> data/index vs. data/index.<timestamp>, and the index.properties along with >> the second version. > > I have never seen an index.properties file get created. I've used > versions from 1.4.0 through 4.4.0. > > Generally when you have an index.<timestamp> directory, it's because > you're doing replication. There may be other circumstances when it > appears, but I do not know what those are. > > As for the other files in the index directory, here's Lucene's file > format documentation: > > http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description > > Thanks, > Shawn >
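For readers hitting this thread later: when replication does create an index.<timestamp> directory, index.properties is just a small pointer file telling the core which directory is live. A minimal sketch of its contents, with a hypothetical timestamp:

    # data/index.properties (written by replication; not edited by hand)
    index=index.20130905210300000

If index.properties is absent, the core simply uses data/index.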
Re: data/index naming format
On 9/5/2013 6:48 PM, Aditya Sakhuja wrote: > I am running solr 4.1 for now, and am confused about the structure and > naming of the contents of the data dir. I do not see the index.properties > being generated on a fresh solr node start either. > > Can someone clarify when one should expect to see > > data/index vs. data/index.<timestamp>, and the index.properties along with > the second version. I have never seen an index.properties file get created. I've used versions from 1.4.0 through 4.4.0. Generally when you have an index.<timestamp> directory, it's because you're doing replication. There may be other circumstances when it appears, but I do not know what those are. As for the other files in the index directory, here's Lucene's file format documentation: http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description Thanks, Shawn
Re: subindex
Nope. You can do this if you've stored _all_ the fields (with the exception of _version_ and the destinations of copyField directives). But there's no way I know of to do what you want if you haven't. If you have, you'd be essentially spinning through all your docs and re-indexing just the fields you cared about. But if you still have access to your original docs this would be slower/more complicated than just re-indexing from scratch. Best Erick On Wed, Sep 4, 2013 at 1:51 PM, Peyman Faratin wrote: > Hi > > Is there a way to build a new (smaller) index from an existing (larger) > index where the smaller index contains a subset of the fields of the larger > index? > > thank you
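A rough SolrJ sketch of the approach Erick describes — page through the source index and re-add only the fields you want to keep. Core names, the field list, and the page size below are all hypothetical, and it assumes every kept field is stored:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;

    public class SubsetReindex {
      public static void main(String[] args) throws Exception {
        HttpSolrServer source = new HttpSolrServer("http://localhost:8983/solr/big");
        HttpSolrServer target = new HttpSolrServer("http://localhost:8983/solr/small");
        final int rows = 1000;
        for (int start = 0; ; start += rows) {
          SolrQuery q = new SolrQuery("*:*");
          q.setFields("id", "title", "price");     // only the fields the smaller index keeps
          q.setSort("id", SolrQuery.ORDER.asc);    // stable order so paging doesn't skip docs
          q.setStart(start);
          q.setRows(rows);
          QueryResponse rsp = source.query(q);
          if (rsp.getResults().isEmpty()) break;   // no more documents
          for (SolrDocument d : rsp.getResults()) {
            SolrInputDocument in = new SolrInputDocument();
            for (String f : d.getFieldNames()) {
              in.addField(f, d.getFieldValue(f));  // copy stored values as-is
            }
            target.add(in);
          }
        }
        target.commit();
      }
    }

Note that start-based paging gets slow deep into a large index, which is part of why re-indexing from the original source is usually the better deal.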
data/index naming format
Hello, I am running solr 4.1 for now, and am confused about the structure and naming of the contents of the data dir. I do not see the index.properties being generated on a fresh solr node start either. Can someone clarify when should one expect to see data/index vs. data/index., and the index.properties along with the second version. -- Regards, -Aditya Sakhuja
solrcloud shards backup/restoration
Hello, I was looking for a good backup/recovery solution for SolrCloud indexes. I am mainly looking at restoring the indexes from an index snapshot, which can be taken using the replicationHandler's backup command. I am looking for something that works with SolrCloud 4.3 eventually, but it is still relevant if you tested with a previous version. I haven't been successful in having the restored index replicate across the new replicas after I restart all the nodes, with one node having the restored index. Is restoring the indexes on all the nodes the best way to do it? -- Regards, -Aditya Sakhuja
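For reference, the replication-handler backup Aditya mentions is taken per core with something like the following (host, core name, and location are illustrative):

    curl 'http://localhost:8983/solr/collection1/replication?command=backup&location=/backups/solr'

Solr 4.x has no matching restore command, so restoring generally means stopping a node, copying the snapshot.* directory contents into the core's data/index, and restarting — which is why "do I have to do that on every node, or can the other replicas recover from the restored one" is exactly the question this thread is asking.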
Re: unknown _stream_source_info while indexing rich doc in solr
: yes sir i did restart the tomcat. When you look at the Schema Browser for your default solr core (i'm guessing it's collection1?), does it list ignored_* as a dynamic field? does this URL below show you that "ignored_*" is using type "ignored" ? ... http://localhost:8983/solr/#/collection1/schema-browser?dynamic-field=ignored_* ...if not, then you aren't using the schema.xml that you think you are. -Hoss
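If the dynamic field is missing, the relevant stock-example declarations look roughly like this in schema.xml (taken from the Solr 4.x example schema):

    <fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>
    <dynamicField name="ignored_*" type="ignored" multiValued="true"/>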
Re: SolrCloud 4.x hangs under high update volume
Update: It is a bit too soon to tell, but about 6 hours into testing there are no crashes with this patch. :) We are pushing 500 batches of 10 updates per second to the 3 node, 3 shard cluster I mentioned above. 5000 updates per second total. More tomorrow after a 24 hr soak! Tim On Wednesday, 4 September 2013, Tim Vaillancourt wrote: > Thanks so much for the explanation Mark, I owe you one (many)! > > We have this on our high TPS cluster and will run it through its paces > tomorrow. I'll provide any feedback I can, more soon! :D > > Cheers, > > Tim >
Re: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"
: I currently have Solr 4.3 set up with about 400 cores set to load upon : start up. When starting Solr with an empty index for each core, Solr is : able to load all of the cores and start up normally as expected. : However, after running a dataimport on all cores and restarting Solr, it : hangs at "org.apache.solr.core.CoreContainer; registering core: ..." : without any type of error message in the log. The process still exists : at this point, but doesn't make any progress even if left for a period : of time. Prior to the restart, Solr continues to function normally, and : is searchable. When solr gets into this state, can you generate a thread dump, wait 20-30 seconds, generate another thread dump, and then send both to the list so we can see what's going on at this point? The easiest way to generate a thread dump is with jstack on the same machine... jstack <pid> >> threaddumps.log : hang at the same spot. It does appear to be related to files to an : extent, since removing the index/"data" directory of half of the cores : does allow Solr to start up normally. wild shot in the dark -- is it possible you have really large transaction logs that are being replayed on startup, because you never did a hard commit after indexing? can you also include in your next email a listing of all the files in all the data dirs of the affected solr instance, including file sizes? something along the lines of this command output from your solr home dir... du -ab */data ? -Hoss
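On Hoss's transaction-log hunch: the usual guard is a hard-commit policy in solrconfig.xml so the tlog gets truncated regularly and startup never replays hours of updates. A sketch with illustrative values:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>60000</maxTime>            <!-- hard commit at least once a minute -->
        <openSearcher>false</openSearcher>  <!-- flush segments and roll the tlog without reopening searchers -->
      </autoCommit>
    </updateHandler>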
Solr substring search
Hello, I'm trying to find out how Solr runs a query for "*foo*". Google tells me that you need to use NGramFilterFactory for that kind of substring search, but I find that even with very simple fieldTypes, it just works. (Perhaps because I'm testing on very small data sets, Solr is willing to look through all the keywords.) e.g. This works on the tutorial. Can someone tell me exactly how this works and/or point me to the Lucene code that implements this? Thanks, Scott
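Leading-wildcard queries like *foo* work here because the wildcard is rewritten against the term dictionary, so every term gets scanned — fine on tutorial-sized indexes, painful at scale. If it ever gets too slow, the NGramFilterFactory route looks roughly like this (type name and gram sizes are illustrative; the filter normally runs at index time only, so a plain query term like "foo" matches the indexed grams directly):

    <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>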
Solr Cell Question
Is it possible to configure Solr Cell to extract and store only the body of a document when indexing? I'm currently doing the following, which I thought would work:

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("defaultField", "content");
    params.set("xpath", "/xhtml:html/xhtml:body/descendant::node()");
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.setParams(params);
    ContentStreamBase.FileStream f = new ContentStreamBase.FileStream(new File(".."));
    up.addContentStream(f);
    up.setAction(ACTION.COMMIT, true, true);
    solrServer.request(up);

But the resulting content field is as follows:

    null ISO-8859-1 text/plain; charset=ISO-8859-1 Just a little test

What I had hoped for was just:

    Just a little test
Re: More on topic of Meta-search/Federated Search with Solr
Hello list, A student of a friend of mine wrote his master's thesis on that topic, especially about federated ranking. I have copied his text here: http://direct.hoplahup.net/tmp/FederatedRanking-Koblischke-2009.pdf Feel free to contact me to get in touch with Robert Koblischke for questions. Paul On 28 August 2013, at 20:35, Dan Davis wrote: > On Mon, Aug 26, 2013 at 9:06 PM, Amit Jha wrote: > >> Would you like to create something like >> http://knimbus.com >> > > I work at the National Library of Medicine. We are moving our library > catalog to a newer platform, and we will probably include articles. The > articles' content and meta-data are available from a number of web-scale > discovery services such as PRIMO, Summon, EBSCO's EDS, and EBSCO's "traditional > API". Most libraries use open source solutions to avoid the cost of > purchasing an expensive enterprise search platform. We are big; we > already have a closed-source enterprise search engine (and our own home > grown Entrez search used for PubMed). Since we can already do Federated > Search with the above, I am evaluating the effort of adding such to Apache > Solr. Because NLM data is used in the open relevancy project, we actually > have the relevancy decisions to decide whether we have done a good job of > it. > > I obviously think it would be "Fun" to add Federated Search to Apache Solr. > > *Standard disclosure* - my opinions do not represent the opinions of NIH > or NLM. "Fun" is no reason to spend tax-payer money. Enhancing Apache > Solr would reduce the risk of "putting all our eggs in one basket," and > there may be some other relevant benefits. > > We do use Apache Solr here for more than one other project... so keep up > the good work even if my working group decides to go with the closed-source > solution.
Re: charfilter doesn't do anything
On 9/5/2013 10:03 AM, Andreas Owen wrote: > I would like to filter / replace a word during indexing but it doesn't do > anything and I don't get an error. > > in schema.xml I have the following: > > multiValued="true"/> > > > > > pattern="Zahlungsverkehr" replacement="ASDFGHJK" /> > > > > > my 2nd question is where can I say that the expression is multiline, like in > javascript where I can use /m at the end of the pattern? I don't know about your second question. I don't know if that will be possible, but I'll leave that to someone who's more expert than I. As for the first question, here's what I have. Did you reindex? That will be required. http://wiki.apache.org/solr/HowToReindex Assuming that you did reindex, are you trying to search for ASDFGHJK in a field that contains more than just "Zahlungsverkehr"? The keyword tokenizer might not do what you expect - it tokenizes the entire input string as a single token, which means that you won't be able to search for single words in a multi-word field without wildcards, which are pretty slow. Note that both the pattern and replacement are case sensitive. This is how regex works. You haven't used a lowercase filter, which means that you won't be able to search for asdfghjk. Use the analysis tab in the UI on your core to see what Solr does to your field text. Thanks, Shawn
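The schema fragments above were mangled by the list archiver. Pieced together from what survives (the pattern/replacement pair and Shawn's mention of the keyword tokenizer), the analysis chain under discussion was presumably something like this reconstruction — not the poster's exact config:

    <fieldType name="text_repl" class="solr.TextField">
      <analyzer>
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="Zahlungsverkehr" replacement="ASDFGHJK"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>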
Re: charfilter doesn't do anything
And show us an input string and a query that fail. -- Jack Krupansky -Original Message- From: Shawn Heisey Sent: Thursday, September 05, 2013 2:41 PM To: solr-user@lucene.apache.org Subject: Re: charfilter doesn't do anything On 9/5/2013 10:03 AM, Andreas Owen wrote: I would like to filter / replace a word during indexing but it doesn't do anything and I don't get an error. in schema.xml I have the following: multiValued="true"/> pattern="Zahlungsverkehr" replacement="ASDFGHJK" /> my 2nd question is where can I say that the expression is multiline, like in javascript where I can use /m at the end of the pattern? I don't know about your second question. I don't know if that will be possible, but I'll leave that to someone who's more expert than I. As for the first question, here's what I have. Did you reindex? That will be required. http://wiki.apache.org/solr/HowToReindex Assuming that you did reindex, are you trying to search for ASDFGHJK in a field that contains more than just "Zahlungsverkehr"? The keyword tokenizer might not do what you expect - it tokenizes the entire input string as a single token, which means that you won't be able to search for single words in a multi-word field without wildcards, which are pretty slow. Note that both the pattern and replacement are case sensitive. This is how regex works. You haven't used a lowercase filter, which means that you won't be able to search for asdfghjk. Use the analysis tab in the UI on your core to see what Solr does to your field text. Thanks, Shawn
Re: Numeric fields and payload
Peter: I don't quite get this. Formatting for display is trivial, as it's usually done for just a few docs anyway. You could also just store the original unaltered value and add an additional "normalized" field. Best Erick On Wed, Sep 4, 2013 at 2:02 PM, PETER LENAHAN wrote: > Chris Hostetter writes: > > > > > > : is it possible to store (text) payload to numeric fields (class > > : solr.TrieDoubleField)? My goal is to store measure units to numeric > > : features - e.g. '1.5 cm' - and to use faceted search with these fields. > > : But the field type doesn't allow analyzers to add the payload data. I > > : want to avoid database access to load the units. I'm using Solr 4.2 . > > > > I'm not sure if it's possible to add payloads to Trie fields, but even if > > there is I don't think you really want that for your usecase -- I think it > > would make a lot more sense to normalize your units so you do consistent > > sorting, range queries, and faceting on the values regardless of whether > > it's 100cm or 1000mm or 1m. > > > > -Hoss > > > > > > Hoss, What you suggest may be fine for specific units. But for monetary > values with formatting it is not realistic. $10,000.00 would require > formatting the number to display it. It would be much easier to store the > string as a payload with the formatted value. > > > Peter Lenahan > >
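A sketch of the store-both approach Erick suggests — one normalized numeric field for sorting, range queries, and faceting, plus a stored display string. Field names are hypothetical, and tdouble assumes the stock example type:

    <field name="price" type="tdouble" indexed="true" stored="false"/>         <!-- normalized value, e.g. 10000.00 in a single currency -->
    <field name="price_display" type="string" indexed="false" stored="true"/>  <!-- formatted "$10,000.00" exactly as entered -->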
Odd behavior after adding an additional core.
Using Solr 4.4, I used the collections admin API to create a collection with 4 shards and a replication factor of 1. I did this so I could index my data, then bring in replicas later by adding cores via the core admin API. I added a new core via the core admin API, and what I noticed shortly after adding it was that the existing leader of the shard where the new replica was placed was marked active, the new core was marked as the leader, and the routing was now set to implicit. I've replicated this on another Solr setup as well. Any ideas? Thanks msj
Solr documents update on index
Hi, I'm having a problem when Solr indexes: it is updating documents that are already indexed. Is this normal behavior? If a document with the same key already exists, is it supposed to be updated? I was thinking it was supposed to update only if the information in the RSS feed had changed. Appreciate your help -- Sent from Gmail Mobile
bucket count for facets
Is there a way to get the count of buckets (i.e. unique values) for a field facet? The rudimentary approach, of course, is to get back all the buckets, but in some cases this is a huge amount of data. thanks, steve
Loading a SpellCheck dynamically
I currently have multiple spellchecks configured in my solrconfig.xml to handle a variety of different spell suggestions in different languages. In the snippet below, I have a catch-all spellcheck as well as an English-only one for more accurate matching (i.e. my schema.xml is set up to copy English-only fields to an English-specific textSpell_en field, and I also copy to a generic textSpell field): ---solrconfig.xml--- textSpell_en default spell_en ./spellchecker_en true textSpell default spell ./spellchecker true My question is: when I query my Solr index, am I able to load just the spellcheck values from the spellcheck_en spellchecker rather than from both? This would be useful if I were to start implementing additional language spellchecks, e.g. spellcheck_ja, spellcheck_fr, etc. Thanks for any insights. Cheers Hayden
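The solrconfig snippet above was flattened by the archiver; from the surviving values it presumably looked something like the following reconstruction (two spellcheckers, an English one and a generic one):

    <searchComponent name="spellcheck_en" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell_en</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell_en</str>
        <str name="spellcheckIndexDir">./spellchecker_en</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="spellcheckIndexDir">./spellchecker</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

As for the question itself: when the dictionaries live in one component, a single one can be selected per request with spellcheck.dictionary=<name>; with separate components, it comes down to which component the request handler invokes.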
Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"
Hello, I currently have Solr 4.3 set up with about 400 cores set to load upon start up. When starting Solr with an empty index for each core, Solr is able to load all of the cores and start up normally as expected. However, after running a dataimport on all cores and restarting Solr, it hangs at "org.apache.solr.core.CoreContainer; registering core: ..." without any type of error message in the log. The process still exists at this point, but doesn't make any progress even if left for a period of time. Prior to the restart, Solr continues to function normally and is searchable. Solr is currently running in master-slave replication, and this same exact behavior occurs on the master and both slaves. I've checked all of the system log files and am also unable to find any errors or messages that would point to a particular problem. Originally, I had thought it may have been related to an open file limit, but I also tried raising the limit to 65k, and Solr continued to hang at the same spot. It does appear to be related to files to an extent, since removing the index/"data" directory of half of the cores does allow Solr to start up normally. Any help or suggestions are appreciated. Thanks!
charfilter doesn't do anything
I would like to filter / replace a word during indexing, but it doesn't do anything and I don't get an error. In schema.xml I have the following: [schema snippet stripped by the list archiver - see the reconstruction in the reply above] My 2nd question is: where can I say that the expression is multiline? In JavaScript I can use /m at the end of the pattern.
Re: JSON update request handler & commitWithin
Ya, looks like this is a bug in Datastax Enterprise 3.1.2. I'm using their enterprise cluster search product, which is built on Solr 4. :( On 9/5/13 11:24 AM, "Jack Krupansky" wrote: >I just tried commitWithin with the standard Solr example in Solr 4.4 and >it works fine. > >Can you reproduce your problem using the standard Solr example in Solr >4.4? > >-- Jack Krupansky > >From: Ryan, Brent >Sent: Thursday, September 05, 2013 10:39 AM >To: solr-user@lucene.apache.org >Subject: JSON update request handler & commitWithin > >I'm prototyping a search product for us and I was trying to use the >"commitWithin" parameter for posting updated JSON documents like so: > >curl -v >'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1' >--data-binary @rfp.json -H 'Content-type:application/json' > >However, the commit never seems to happen as you can see below there are >still 2 docsPending (even 1 hour later). Is there a trick to getting >this to work with submitting to the json update request handler? >
Re: JSON update request handler & commitWithin
They have modified the mechanisms for committing documents…Solr in DSE is not stock Solr...so you are likely encountering a boundary where stock Solr behavior is not fully supported. I would definitely reach out to them to find out if they support the request. On Sep 5, 2013, at 8:27 AM, "Ryan, Brent" wrote: > Ya, looks like this is a bug in Datastax Enterprise 3.1.2. I'm using > their enterprise cluster search product which is built on SOLR 4. > > :( > > > > On 9/5/13 11:24 AM, "Jack Krupansky" wrote: > >> I just tried commitWithin with the standard Solr example in Solr 4.4 and >> it works fine. >> >> Can you reproduce your problem using the standard Solr example in Solr >> 4.4? >> >> -- Jack Krupansky >> >> From: Ryan, Brent >> Sent: Thursday, September 05, 2013 10:39 AM >> To: solr-user@lucene.apache.org >> Subject: JSON update request handler & commitWithin >> >> I'm prototyping a search product for us and I was trying to use the >> "commitWithin" parameter for posting updated JSON documents like so: >> >> curl -v >> 'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1' >> --data-binary @rfp.json -H 'Content-type:application/json' >> >> However, the commit never seems to happen as you can see below there are >> still 2 docsPending (even 1 hour later). Is there a trick to getting >> this to work with submitting to the json update request handler? >> >
Re: JSON update request handler & commitWithin
I just tried commitWithin with the standard Solr example in Solr 4.4 and it works fine. Can you reproduce your problem using the standard Solr example in Solr 4.4? -- Jack Krupansky From: Ryan, Brent Sent: Thursday, September 05, 2013 10:39 AM To: solr-user@lucene.apache.org Subject: JSON update request handler & commitWithin I'm prototyping a search product for us and I was trying to use the "commitWithin" parameter for posting updated JSON documents like so: curl -v 'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1' --data-binary @rfp.json -H 'Content-type:application/json' However, the commit never seems to happen as you can see below there are still 2 docsPending (even 1 hour later). Is there a trick to getting this to work with submitting to the json update request handler?
JSON update request handler & commitWithin
I'm prototyping a search product for us and I was trying to use the "commitWithin" parameter for posting updated JSON documents like so: curl -v 'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1' --data-binary @rfp.json -H 'Content-type:application/json' However, the commit never seems to happen as you can see below there are still 2 docsPending (even 1 hour later). Is there a trick to getting this to work with submitting to the json update request handler?
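A quick sanity check while debugging something like this is to fire an explicit commit at the standard update handler and see whether docsPending drains (host and core as in the original command):

    curl 'http://localhost:8983/solr/proposal.solr/update?commit=true'

If that commits the pending docs, the documents themselves are fine and only the commitWithin handling is suspect.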
Re: Tweaking boosts for more search results variety
The grouping (field collapsing) feature somewhat addresses this - group by a "site" field, and then if more than one or a few top pages are from the same site they get grouped or collapsed, so that you can see more sites in a few results. See: http://wiki.apache.org/solr/FieldCollapsing https://cwiki.apache.org/confluence/display/solr/Result+Grouping -- Jack Krupansky -Original Message- From: Sai Gadde Sent: Thursday, September 05, 2013 2:27 AM To: solr-user@lucene.apache.org Subject: Tweaking boosts for more search results variety Our index is aggregated content from various sites on the web. We want good user experience by showing multiple sites in the search results. In our setup we are seeing most of the results from the same site at the top. Here is some information regarding queries and schema: site - String field. We have about 1000 sites in the index sitetype - String field. We have 3 site types omitNorms="true" for both fields Doc count varies largely based on site and sitetype, by a factor of 10 - 1000 times Total index size is about 5 million docs. Solr Version: 4.0 In our queries we have a fixed and preferential boost for certain sites. sitetype has different and fixed boosts for its 3 possible values. We turned off Inverse Document Frequency (IDF) for these boosts to work properly. Other text fields are boosted based on search keywords only. With this setup we often see a bunch of hits from a single site followed by the next, etc. Is there any solution to see results from a variety of sites and still keep the preferential boosts in place?
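Concretely, against the schema described above, Jack's suggestion would look something like this on the query string (group.limit caps how many docs any one site can contribute; group.main=true flattens the groups back into an ordinary result list):

    q=your+keywords&group=true&group.field=site&group.limit=2&group.main=true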
Re: Solr Cloud hangs when replicating updates
If you run into this again, try a jstack trace. You should see evidence of being stuck in SolrCmdDistributor on a variable called "semaphore"... On current 4x this is around line 420. If you're using SolrJ, then SOLR-4816 is another thing to try. But Mark's patch would be best of all to test. If that doesn't fix it then the jstack suggestion would at least tell us if it's the issue we think it is. FWIW, Erick On Wed, Sep 4, 2013 at 12:51 PM, Mark Miller wrote: > It would be great if you could give this patch a try: > http://pastebin.com/raw.php?i=aaRWwSGP > > - Mark > > > On Wed, Sep 4, 2013 at 8:31 AM, Kevin Osborn > wrote: > > > Thanks. If there is anything I can do to help you resolve this issue, let > > me know. > > > > -Kevin > > > > > > On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller > wrote: > > > > > I'll look at fixing the root issue for 4.5. I've been putting it off for > > > way too long. > > > > > > Mark > > > > > > Sent from my iPhone > > > > > > On Sep 3, 2013, at 2:15 PM, Kevin Osborn > wrote: > > > > > > > I was having problems updating SolrCloud with a large batch of > records. > > > The > > > > records are coming in bursts with lulls between updates. > > > > > > > > At first, I just tried large updates of 100,000 records at a time. > > > > Eventually, this caused Solr to hang. When hung, I can still query > > Solr. > > > > But I cannot do any deletes or other updates to the index. > > > > > > > > At first, my updates were going as SolrJ CSV posts. I have also tried > > > local > > > > file updates and had similar results. I finally slowed things down to > > > just > > > > use SolrJ's Update feature, which is basically just JavaBin. I am > also > > > > sending over just 100 at a time in 10 threads. Again, it eventually > > hung. > > > > > > > > Sometimes, Solr hangs in the first couple of chunks. Other times, it > > > hangs > > > > right away. > > > > > > > > These are my commit settings: > > > > > > > > > > > > 15000 > > > > 5000 > > > > false > > > > > > > > > > > > 3 > > > > > > > > > > > > I have tried quite a few variations with the same results. I also > tried > > > > various JVM settings with the same results. The only variable seems > to > > be > > > > that reducing the cluster size from 2 to 1 is the only thing that > > helps. > > > > > > > > I also did a jstack trace. I did not see any explicit deadlocks, but > I > > > did > > > > see quite a few threads in WAITING or TIMED_WAITING.
It is typically > > > > something like this: > > > > > > > > java.lang.Thread.State: WAITING (parking) > > > >at sun.misc.Unsafe.park(Native Method) > > > >- parking to wait for <0x00074039a450> (a > > > > java.util.concurrent.Semaphore$NonfairSync) > > > >at > > > java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > > > >at > > > > > > > > > > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > > > >at > > > > > > > > > > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994) > > > >at > > > > > > > > > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) > > > >at java.util.concurrent.Semaphore.acquire(Semaphore.java:317) > > > >at > > > > > > > > > > org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61) > > > >at > > > > > > > > > > org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418) > > > >at > > > > > > > > > > org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368) > > > >at > > > > > > > > > > org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300) > > > >at > > > > > > > > > > org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139) > > > >at > > > > > > > > > > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474) > > > >at > > > > > > > > > > org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395) > > > >at > > > > > > > > > > org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44) > > > >at > > > > > > org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364) > > > >at > > > org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31) > > > >at > > > > > > > > > > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) > > > >at > > > > > > > > > > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > > > >at > > > > > > > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > > > >at org.apache.solr.core.SolrCore.exec
Re: Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0
The very first thing I'd do is see if you can _not_ use joins. Especially if you're coming from an RDBMS background. Joins in Solr are somewhat specialized and are NOT equivalent to DB joins. First of all, there's no way to get fields from the "from" part of the join returned in the results. Secondly, there are a number of cases where the performance isn't stellar. Thirdly... The first approach is always to explore denormalizing the data so you can do straight searches rather than joins. Second is to think about your use case carefully and see if there are clever indexing schemes that allow you to not use joins. Only after those avenues are exhausted would I rely on joins. There's a reason they are sometimes referred to as "pseudo joins". Best, Erick On Wed, Sep 4, 2013 at 4:19 AM, Sukanta Dey wrote:
> Hi Team,
> In my project I am going to use Apache Solr 4.4.0 for searching. While doing that I need to join between multiple Solr documents within the same core, on one of the fields common across the documents. Though I can successfully join the documents using the Solr 4.4.0 join syntax and it returns the expected result, my next requirement is to sort the returned result on the basis of fields from the documents involved in the join condition's "from" clause, which I was not able to do. Let me explain the problem in detail along with the files I am using ...
> 1) Files being used:
> a. Picklist_1.xml -- t1324838 7 956 130712901 Draft Draoft
> b. Picklist_2.xml -- t1324837 7 87749 130712901 New Neuo
> c. AssetID_1.xml -- t1324837 a180894808 1 true 2013-09-02T09:28:18Z 130713716 130712901
> d. AssetID_2.xml -- t1324838 a171658357 1 130713716 2283961 2290309 7 7 13503796 15485964 38052 41133 130712901
> 2) Requirement:
> i. It needs to join the files using the "def14227_picklist" field from AssetID_1.xml and AssetID_2.xml and the "describedObjectId" field from Picklist_1.xml and Picklist_2.xml.
> ii. After joining we need all the fields from the AssetID_*.xml files and the "en" and "gr" fields from the Picklist_*.xml files.
> iii. While joining we also need to sort the result based on the "en" field value.
> 3) I was trying the "q={!join from=inner_id to=outer_id}zzz:vvv" syntax, but no luck.
> Any help/suggestion would be appreciated.
> Thanks,
> Sukanta Dey
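Wired to the field names in the question, the join would take roughly this shape (filter value hypothetical):

    q={!join from=describedObjectId to=def14227_picklist}en:Draft

This returns the AssetID-side documents that reference matching picklists — and, per the limitation Erick describes, fl and sort can only use fields of those returned "to"-side documents, so a sort on the picklist's "en" field is exactly what the syntax cannot express.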
Tweaking Edismax on the Phrase Fields
Hi, I have a question about the raw query that is parsed from an edismax query. For example, the query: _query_:"{!edismax mm=100% bf='log(div(9900,producttier))' pf='name_synonyms~100^3 name~100^6 heading~100^20' pf2='name_synonyms~100^3 name~100^6 heading~100^20' qf='name_synonyms^3 name^6 heading^20'}hotel centro lisboa" is transformed into (+((DisjunctionMaxQuery((name_synonyms:hotel^3.0 | heading:hotel^20.0 | name:hotel^6.0)) DisjunctionMaxQuery((((name_synonyms:semtr name_synonyms:centr)^3.0) | ((heading:semtr heading:centr)^20.0) | ((name:semtr name:centr)^6.0))) DisjunctionMaxQuery((((name_synonyms:lisbon name_synonyms:lisbo)^3.0) | ((heading:lisbon heading:lisbo)^20.0) | ((name:lisbon name:lisbo)^6.0))))~3) DisjunctionMaxQuery((name_synonyms:\"hotel (semtr centr) (lisbon lisbo)\"~100^3.0)) DisjunctionMaxQuery((name:\"hotel (semtr centr) (lisbon lisbo)\"~100^6.0)) DisjunctionMaxQuery((heading:\"hotel (semtr centr) (lisbon lisbo)\"~100^20.0)) (DisjunctionMaxQuery((name_synonyms:\"hotel (semtr centr)\"~100^3.0)) DisjunctionMaxQuery((name_synonyms:\"(semtr centr) (lisbon lisbo)\"~100^3.0))) (DisjunctionMaxQuery((name:\"hotel (semtr centr)\"~100^6.0)) DisjunctionMaxQuery((name:\"(semtr centr) (lisbon lisbo)\"~100^6.0))) (DisjunctionMaxQuery((heading:\"hotel (semtr centr)\"~100^20.0)) DisjunctionMaxQuery((heading:\"(semtr centr) (lisbon lisbo)\"~100^20.0))) FunctionQuery(log(div(const(9900),int(producttier)))))/no_coord As you can see, for each field in a phrase query a new DisjunctionMaxQuery is created. Why is the behaviour not the same as with qf? With qf only the most important field (the max) counts; in the phrase queries all fields participate in the final score. Is there any way to emulate the qf behaviour (one DisjunctionMaxQuery for each combination) on the pf? Like one DisjunctionMaxQuery for pf, another for pf2, etc. Regards Bruno -- Bruno René Santos Lisboa - Portugal
Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)
Dmitri, I've added you to the https://wiki.apache.org/solr/ContributorsGroup - feel free to improve the wiki :) - Stefan On Wednesday, September 4, 2013 at 11:46 PM, Dmitri Popov wrote: > Upayavira, > > I could edit that page myself, but need to be confirmed human according to > http://wiki.apache.org/solr/FrontPage#How_to_edit_this_Wiki > > My wiki account name is 'pin' just in case. > > On Wed, Sep 4, 2013 at 5:27 PM, Upayavira (mailto:u...@odoko.co.uk)> wrote: > > > It's a wiki. Can't you correct it? > > > > Upayavira > > > > On Wed, Sep 4, 2013, at 08:25 PM, Dmitri Popov wrote: > > > Hi, > > > > > > http://wiki.apache.org/solr/XsltResponseWriter (and the reference manual PDF > > > too) has become out of date: > > > > > > In the configuration section > > > > > > > > name="xslt" > > > class="org.apache.solr.request.XSLTResponseWriter"> > > > 5 > > > > > > > > > the class name > > > > > > org.apache.solr.request.XSLTResponseWriter > > > > > > should be replaced by > > > > > > org.apache.solr.response.XSLTResponseWriter > > > > > > Otherwise a ClassNotFoundException happens. The change is a result of > > > https://issues.apache.org/jira/browse/SOLR-1602 as far as I see. > > > > > > Apparently I can't update that page myself, please could someone else do > > > that? > > > > > > Thanks!
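For anyone landing here from a search, the corrected config (the flattened fragment quoted above, with the package fix applied) is:

    <queryResponseWriter name="xslt" class="org.apache.solr.response.XSLTResponseWriter">
      <int name="xsltCacheLifetimeSeconds">5</int>
    </queryResponseWriter>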
Re: Solr 4.3: Recovering from "Too many values for UnInvertedField faceting on field"
We had a similar case for multivalued fields with a lot of unique values per field. Using facet.method=enum instead of facet.method=fc fixed the problem. It can run slower, though. Dmitry On Tue, Sep 3, 2013 at 5:04 PM, Dennis Schafroth wrote: > We are harvesting and indexing bibliographic data, thus having many > distinct author names in our index. While testing Solr 4 I believe I had > pushed a single core to 100 million records (91GB of data) and everything > was working fine and fast. After adding a little more to the index, the > following started to happen: > > 17328668 [searcherExecutor-4-thread-1] WARN org.apache.solr.core.SolrCore > – Approaching too many values for UnInvertedField faceting on field > 'author_exact' : bucket size=16726546 > 17328701 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore > – UnInverted multi-valued field > {field=author_exact,memSize=336715415,tindexSize=5001903,time=31595,phase1=31465,nTerms=12048027,bigTerms=0,termInstances=57751332,uses=0} > 18103757 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore > – org.apache.solr.common.SolrException: Too many values for UnInvertedField > faceting on field author_exact > at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:181) > at > org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664) > > I can see that we reached a limit of bucket size. Is there a way to adjust > this? The index also seems to have exploded in size (217GB). > > Thinking that I had reached a limit for what a single core could handle in > terms of facets, I deleted records in the index, but even now at 1/3 (32 > million) it still fails with the above error. I have optimised with > expungeDeleted=true. The index is somewhat larger (76GB) than I would have > expected. > > While we can still use the index and get facets back using the enum method on > that field, I would still like a way to fix the index if possible. Any > suggestions? > > cheers, > :-Dennis
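In query form, Dmitry's workaround is just (field name from the thread):

    q=*:*&facet=true&facet.field=author_exact&facet.method=enum

facet.method=fc uninverts the whole field in memory (the UnInvertedField that overflowed here), while facet.method=enum walks the term dictionary and intersects per-term docsets - slower when there are many unique terms, but it sidesteps the bucket-size limit.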