DIH include Fieldset in query
Hello. I have many big entities in my data-config.xml, and many of them contain the same query fragment. The entities look like this:

  <entity name="name" transformer="DateFormatTransformer" pk="id"
          query="SELECT field AS fieldname, IF(bla IS NOT NULL, 1, 0) AS blob, fieldname, fieldname AS field, ...">

and so on, over and over. Is it possible to include text from a file, or something like that, in data-config.xml?
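As far as I know there is no literal include directive in data-config.xml, but one approach that may work is DIH's variable resolver, which substitutes ${dataimporter.request.*} placeholders from request parameters, including defaults configured on the handler. A sketch, assuming a handler named /dataimport and a made-up property named fieldset:

In solrconfig.xml:

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <!-- shared column list, referenced from several entities -->
      <str name="fieldset">field AS fieldname, IF(bla IS NOT NULL, 1, 0) AS blob</str>
    </lst>
  </requestHandler>

In data-config.xml:

  <entity name="name" transformer="DateFormatTransformer" pk="id"
          query="SELECT ${dataimporter.request.fieldset} FROM some_table"/>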
Computed fields - can I put a function in fl?
Hi, I have two fields, one containing a string (product) and another containing a boolean (show_product). Is there a way of returning the product field with a value of null when the show_product field is false? I can make another field (product_computed) and index it with null where I need it, but I would like to understand whether there is a better approach, like putting a function query in fl to make a computed field. Something like:

  q=*:*&start=0&rows=10&fl=product:if(show_product:true, product, )

which obviously doesn't work. Thanks for any help.

Maurizio
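One partial option, assuming you can move to Solr 4.0: its fl parameter accepts function queries as pseudo-fields. Functions produce numeric/boolean results rather than a conditionally-null string, so the closest sketch is returning a flag the client can use to blank out product itself (the pseudo-field name visible is made up):

  q=*:*&rows=10&fl=product,show_product,visible:if(show_product,1,0)

On Solr 3.x, the separately indexed product_computed field you describe is probably the practical answer.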
4.0-ALPHA for general development use?
Hi, we are considering a long-term project (likely lifecycle of several years) with an initial production release in approximately three months. We're intending to use Solr 3.6.0, with a view to upgrading to 4.0 upon stable release. However, http://lucene.apache.org/solr/ now has 4.0-ALPHA as the main download, implying this version is for general use. On the other hand, the release notes state "This is an alpha release for early adopters", and http://wiki.apache.org/solr/Solr4.0 gives a timescale of 60 days minimum before final release. We'd like to use 4.0 features such as near-real-time updates, but haven't identified these as must-haves for the initial release. Given that our first production release is likely to occur a month after those 60 days, is 4.0-ALPHA suitable for general product development, or is it recommended to stick with 3.6.0 and accept an upgrade cost when 4.0 is stable? (Perhaps this hinges on understanding why 4.0-ALPHA is now the main download option.) Thanks.
Multivalued attribute grouping in SOLR
I came across a problem where one of my columns is multivalued, e.g. the values can be (11,22), (11,33), (11,55), (22,44), (22,99). I want to perform a grouping operation that will yield:

* 11 : count 3
* 22 : count 3
* 33 : count 1
* 44 : count 1
* 55 : count 1
* 99 : count 1
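Assuming each value is indexed as a separate token in a multivalued field (the field name col below is made up), plain field faceting already yields these per-value counts, since every document contributes once to each value it holds:

  http://host:port/solr/select?q=*:*&rows=0&facet=true&facet.field=col&facet.mincount=1

This should return 11 and 22 with a count of 3 and the remaining values with a count of 1.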
DIH - incorrect datasource being picked up by XPathEntityProcessor
Hi, I am getting this error with DIH using a combination of SqlEntityProcessor and XPathEntityProcessor:

  Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: null Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
    ...
  Caused by: java.sql.SQLException: SQL statement to execute cannot be empty or null
    ...

When I debug the DataImport code, I see that the wrong datasource is being picked up by XPathEntityProcessor. It always picks up JdbcDataSource even though I have configured it to use FieldReaderDataSource. Below is what my data-config file looks like:

  <dataConfig>
    <dataSource name="globalDS" type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver"
                url="jdbc:oracle:thin:@//host:port/dbname" user="foo" password="foo"/>
    <datasource name="fieldSource" type="FieldReaderDataSource"/>
    <datasource name="filesource" type="FileDataSource"/>
    <document name="doc">
      <entity name="parent" dataSource="globalDS" query="select * from PARENT_TBL where PARENT_ID=1">
        <field column="PARENT_ID" name="id"/>
        <field column="PARENT_NAME" name="Parent_Name"/>
        <entity name="child" dataSource="globalDS" query="select XML from CHILD_TBL where PARENT_ID=1"
                transformer="ClobTransformer">
          <field column="XML" name="Message" clob="true"/>
          <entity name="child_xml" rootEntity="true" dataSource="fieldSource" dataField="child.Message"
                  processor="XPathEntityProcessor" forEach="/children/child">
            <field column="Child_Name" xpath="/children/child/Child_Name"/>
            <field column="Child_Age" xpath="/children/child/Child_Age"/>
          </entity>
        </entity>
      </entity>
    </document>
  </dataConfig>

Appreciate any help, thanks in advance.
Groups count in distributed grouping is wrong in some cases
Hi, I have a problem with facet counts in distributed grouping. It appears only when I make a query that returns almost all of the documents. My Solr setup has 4 shards and my queries look like:

  http://host:port/select?q=*:*&shards=shard1,shard2,shard3,shard4&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

With a query like the above I get strange counts for field category1. The counts for the first values are very big:

  <int name="val1">9659</int>
  <int name="val2">7015</int>
  <int name="val3">5676</int>
  <int name="val4">1180</int>
  <int name="val5">1105</int>
  <int name="val6">979</int>
  <int name="val7">770</int>
  <int name="val8">701</int>
  <int name="...">612</int>
  <int name="val9">422</int>
  <int name="val10">358</int>

When I narrow the results by adding fq=category1:val1, etc. to the query, I get different counts than the category1 facet shows for the first few values:

  fq=category1:val1 - count: 22
  fq=category1:val2 - count: 22
  fq=category1:val3 - count: 21
  fq=category1:val4 - count: 19
  fq=category1:val5 - count: 19
  fq=category1:val6 - count: 20
  fq=category1:val7 - count: 20
  fq=category1:val8 - count: 25
  fq=category1:val9 - count: 422
  fq=category1:val10 - count: 358

From val9 on the counts are OK. First I thought that for some values in the category1 facet the group count does not work and it returns the count of all documents rather than grouping by field id. But the number of all documents matching fq=category1:val1 is 45468, so the numbers are not the same.

I checked the queries on each shard for val1 and the results are:

shard1:
query: http://shard1/select?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
  <lst name="category1">
    <int name="val1">11</int>
query: http://shard1/select?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1

shard2:
query: http://shard2/select?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
there is no value val1 in the category1 facet.
query: http://shard2/select?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1
  <int name="ngroups">7</int>

shard3:
query: http://shard3/select?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
there is no value val1 in the category1 facet.
query: http://shard3/select?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1
  <int name="ngroups">4</int>

So it looks like the detail query with fq=category1:val1 returns the right results, but Solr has a problem with facet counts when one of the shards does not return a facet value (in this scenario val1) that exists on other shards. I checked the shards for val10 and got:

  shard1: count for val10 - 142
  shard2: count for val10 - 131
  shard3: count for val10 - 149

The sum of the counts is 422 - OK.

I'm not sure how to resolve this situation. The counts for val1 to val9 should certainly be different, and they should not be at the top of the category1 facet, because this is very confusing. Do you have any idea how to fix this problem?

Best regards
Agnieszka
Re: DataImport using last_indexed_id or getting max(id) quickly
You could also just keep a special document in your index with a known ID that contains meta-data fields. If this document had no fields in common with any other document, it wouldn't satisfy searches (except the *:* search). Or you could store this info somewhere else (file, DB, etc). Or you can commit with user data, although this isn't exposed through Solr yet; see: https://issues.apache.org/jira/browse/SOLR-2701

Best
Erick

On Thu, Jul 12, 2012 at 5:22 AM, karsten-s...@gmx.de wrote:
> Hi Avenka,
>
> you asked for a HowTo to add a field inverseID which allows calculating max(id) from its first term. If you do not use Solr, you have to calculate -1 * id and store it in an extra field inverseID. If you fill Solr with your own code, add a TrieLongField inverseID and fill it with the value -id. If you only want to change schema.xml (and add some classes):
>
> * You need a new FieldType inverseLongType and a field inverseID of type inverseLongType
> * You need a line <copyField source="id" dest="inverseID"/> (see http://wiki.apache.org/solr/SchemaXml#Copy_Fields)
>
> For inverseLongType I see two possibilities:
> a) use TextField and write your own filter to calculate -1 * id
> b) extend TrieLongField to a new FieldType InverseTrieLongField with:
>
>   @Override
>   public String readableToIndexed(String val) {
>     return super.readableToIndexed(Long.toString(-Long.parseLong(val)));
>   }
>
>   @Override
>   public Fieldable createField(SchemaField field, String externalVal, float boost) {
>     // negate the external value before handing it to TrieLongField
>     return super.createField(field, Long.toString(-Long.parseLong(externalVal)), boost);
>   }
>
>   @Override
>   public Object toObject(Fieldable f) {
>     Object result = super.toObject(f);
>     if (result instanceof Long) {
>       return new Long(-((Long) result).longValue());
>     }
>     return result;
>   }
>
> Best regards
> Karsten
>
> -------- Original-Nachricht --------
> Datum: Wed, 11 Jul 2012 20:59:10 -0700 (PDT)
> Von: avenka ave...@gmail.com
> An: solr-user@lucene.apache.org
> Betreff: Re: DataImport using last_indexed_id or getting max(id) quickly
>
> Thanks. Can you explain more the first TermsComponent option to obtain max(id)? Do I have to modify schema.xml to add a new field? How exactly do I query for the lowest value of -1 * id?
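A minimal sketch of Erick's special-document idea, assuming Solr 4.x JSON updates; the ID __meta__ and the field name last_indexed_id_l are made up, and the field should be one that no regular document uses:

  curl 'http://localhost:8983/solr/update?commit=true' \
       -H 'Content-Type: application/json' \
       -d '[{"id":"__meta__", "last_indexed_id_l": 1234567}]'

Each import run can re-post this document (same ID, so it overwrites) with the highest ID it indexed, and the next run reads it back with a simple query for id:__meta__.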
Re: Capacity Planning Guidance
This question, reasonable as it appears, is just unanswerable in the abstract. About all you can do is prototype and test. Take facet queries: the hardware requirements vary drastically based on the number of unique values in the field(s) you're faceting on, as well as whether they're multivalued or not.

However, your data set isn't very big as far as document count is concerned (although how big are those documents? 1K? 1G?). Assuming you're not talking about very large documents, I'd expect that you could fit this on any reasonable machine; test with jMeter or SolrMeter. I've put 11M documents on my 2009 laptop easily (Wikipedia English), admittedly with just 5M of them containing text. At any rate, set up jMeter or SolrMeter against a test machine and keep adding documents until you strain it, and go from there.

Best
Erick

On Thu, Jul 12, 2012 at 5:46 AM, Sohail Aboobaker sabooba...@gmail.com wrote:
> We will be using Solr 3.6
Re: shard connection timeout
Right, as Michael says, you've allocated a massive amount of memory. I suspect you're just hitting a full-pause garbage collection and it's taking a long time. Do you know whether you actually need all that memory, or was it allocated on the theory that more is better? We often recommend that you _start_ with about 1/2 the memory for the JVM and the rest for the OS. In your case, I'd back that off even more unless you have evidence that this much memory is required.

Best
Erick

On Thu, Jul 12, 2012 at 8:54 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote:
> Hi, Jason,
>
> That's a huge heap. Which Directory implementation are you using? It might make more sense to drastically reduce that heap and let the OS buffer the index at a lower level.
>
> Michael Della Bitta
> Appinions, Inc. -- Where Influence Isn't a Game.
> http://www.appinions.com
>
> On Wed, Jul 11, 2012 at 10:05 PM, Jason hialo...@gmail.com wrote:
>> Hi Erick,
>> Our physical memory is 128GB and our JVM options are -Xms110g -Xmx110g -XX:PermSize=512m -XX:MaxPermSize=512m. That is a very large memory allocation, but we run a patent search service that accepts very complex queries, which use a lot of memory.
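As an illustration of that starting point (the numbers are hypothetical; the right heap depends on your index and query mix), a first cut on a 128GB box might look like:

  # modest heap; leave the rest of RAM to the OS page cache for the index
  java -Xms16g -Xmx16g -XX:PermSize=512m -XX:MaxPermSize=512m -jar start.jar

and grow the heap only if you see OutOfMemoryErrors or constant full GCs under realistic load.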
Re: DIH - incorrect datasource being picked up by XPathEntityProcessor
On 13 July 2012 15:22, girishyes girish...@gmail.com wrote:
> Hi, I am getting this error with DIH using a combination of SqlEntityProcessor and XPathEntityProcessor:
> <entity name="child_xml" rootEntity="true" dataSource="fieldSource" dataField="child.Message"
>         processor="XPathEntityProcessor" forEach="/children/child">

There might be other issues, but you probably want dataField="child.XML".

Regards,
Gora
Re: Case-insensitive on facet prefix
You'll have to lowercase your facet.prefix. All the terms in your field are lowercased, as per your fieldType, so you'll have to specify the prefix that way too.

Best
Erick

On Thu, Jul 12, 2012 at 2:01 PM, Nestor Oviedo oviedones...@gmail.com wrote:
> Hello all.
> I have a field configured with the LowerCaseFilterFactory as the only analyzer (for both indexing and searching). The problem is that facet.prefix doesn't work on that field as expected. For example:
>
>   Indexed term: house -- LowerCaseFilterFactory applied
>   facet.prefix=hou -- returns a "house" entry as expected
>   facet.prefix=Hou -- no match
>
> I suppose the LowerCaseFilterFactory isn't being applied to this prefix term. So... is this the expected behavior? How can I perform a facet with a case-insensitive prefix?
>
> Thanks in advance
> Nestor
> SeDiCI - http://sedici.unlp.edu.ar
> PrEBi - http://prebi.unlp.edu.ar
> Universidad Nacional de La Plata
> La Plata, Buenos Aires, Argentina
Re: 4.0-ALPHA for general development use?
It really comes down to you. Many people run a trunk version of Solr in production; some never would. Generally, bugs are fixed quickly, and trunk is pretty stable. The main issue is index format changes and upgrades: if you use trunk you generally have to be willing to reindex in order to upgrade. That's one nice thing about this Alpha - we are saying that unless there is a really bad bug, you will be able to upgrade to future versions without reindexing.

Most of the code itself has been in development and use for years, so it's not so risky in my opinion. It's almost more about Java APIs and whatnot than code stability when we say Alpha. In fact, just read http://www.lucidimagination.com/blog/2012/07/03/4-0-alpha-whats-in-a-name/ - that should help clarify what this release is.

On Fri, Jul 13, 2012 at 6:51 AM, John Field jfi...@astreetpress.com wrote:
> Hi, we are considering a long-term project (likely lifecycle of several years) with an initial production release in approximately three months. [...] Is 4.0-ALPHA suitable for general product development, or is it recommended to stick with 3.6.0 and accept an upgrade cost when 4.0 is stable?

--
- Mark
http://www.lucidimagination.com
edismax not working in a core
I'm having trouble with edismax not working in one of my cores. I have three cores up and running, including the demo, in Solr 3.6 on Tomcat 7.0.27 on Java 1.6. I can't get edismax to work on one of those cores, even though it's configured very similarly to the demo, which does work. I have different fields, but overall I'm not doing much differently. I'm testing using a query with OR in it to try to get a union. On two of the cores I get the union; on the third I get a much smaller set than either term alone should return. If I give the misbehaving core a defType of lucene, the OR is honored. What could I possibly be missing?

Thanks,
Richard
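For comparison, here is a minimal sketch of an edismax handler in solrconfig.xml (the handler name and qf fields are made up); a qf referencing fields that don't exist on the problem core, or an aggressive mm setting, are common reasons an OR query returns fewer documents than either term alone:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">title^2.0 body</str>
      <!-- mm (minimum should match): a high value effectively turns OR into AND -->
      <str name="mm">1</str>
    </lst>
  </requestHandler>

In particular, check whether the misbehaving core sets mm: with edismax, a high mm can silently require most clauses to match even when the query says OR.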
Re: Email keeps bouncing
: Whenever I reply to an email to this list I get a failure notice (please see below)

The message you just sent is plain text, but according to this failure your other message used HTML, so I would suggest checking your mail settings and always sending plain-text mail to the list. It's possible that the messages you are replying to are already in HTML, but that wasn't enough by itself for the spam filter to stop them, and then your client is assuming it should use HTML when responding to an HTML message -- but combined with some of the other spam flags your message raised, it is enough to stop your email. In particular, you may want to disable whatever Reply-To header settings you have, as the spam filter is detecting that it is forged.

http://email.about.com/od/yahoomailtips/qt/et_plain_text.htm
http://email.about.com/od/yahoomailtips/qt/et_reply_to.htm

: Remote host said: 552 spam score (6.0) exceeded threshold (FREEMAIL_FORGED_REPLYTO,FSL_FREEMAIL_1,FSL_FREEMAIL_2,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS,URI_HEX) [BODY]

-Hoss
Re: Updating documents
On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote:
>> Is there a flag for: if document does not exist, create it for me?
>
> Not currently, but it certainly makes sense. The implementation should be easy. The most difficult part is figuring out the best syntax to specify this. Another idea: we could possibly switch to create-if-not-exists by default, and use the existing optimistic concurrency mechanism to specify that the document should exist. So specify _version_=1 if the document should exist and _version_=0 (the default) if you don't care.

Yes, that would be neat!

One more question related to partial document update: so far I'm able to append to multivalued fields and to set new values on regular/multivalued fields. One thing I didn't find is the remove command - what is its JSON syntax?

Thanks,

--
jonatan
Re: Updating documents
On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote:
> On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley yo...@lucidimagination.com wrote:
>> Another idea: we could possibly switch to create-if-not-exists by default, and use the existing optimistic concurrency mechanism to specify that the document should exist. So specify _version_=1 if the document should exist and _version_=0 (the default) if you don't care.
>
> Yes, that would be neat!

I've just committed this change.

> One more question related to partial document update: so far I'm able to append to multivalued fields and to set new values on regular/multivalued fields. One thing I didn't find is the remove command - what is its JSON syntax?

Set it to the JSON value of null.

-Yonik
http://lucidimagination.com
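A quick sketch of the null trick against a Solr 4.x JSON update endpoint (the document ID doc1 and field mv_f are hypothetical; the field must be stored for atomic updates to work):

  curl 'http://localhost:8983/solr/update?commit=true' \
       -H 'Content-Type: application/json' \
       -d '[{"id":"doc1", "mv_f":{"set":null}}]'

After this update, doc1 no longer has an mv_f value; its other stored fields are carried over unchanged.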
Re: Updating documents
On Fri, Jul 13, 2012 at 1:43 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote:
>> Yes, that would be neat!
>
> I've just committed this change.

Super, thanks! I assume it will end up in the 4.0 release?

>> One thing I didn't find is the remove command - what is its JSON syntax?
>
> Set it to the JSON value of null.
Re: Updating documents
On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote:
> Yonik,
>
> On Thu, Jul 12, 2012 at 12:52 PM, Yonik Seeley yo...@lucidimagination.com wrote:
>> On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote:
>>> On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson wrote:
>>>> The partial documents update that Jonatan references also requires that all the fields be stored.
>>>
>>> If my only fields with stored=false are copyField targets (e.g. I don't need their content to rebuild the document), are they going to be re-copied with the partial document update?
>>
>> Correct - your setup should be fine. Only original source fields (non-copyField targets) need stored=true.
>
> Another question I had related to partial update:
>
>   $ ./post.sh foo.json
>   {"responseHeader":{"status":409,"QTime":0},"error":{"msg":"Document not found for update. id=foo","code":409}}
>
> Is there a flag for: if document does not exist, create it for me? The thing is that I don't know in advance whether the document already exists (of course I could query first, but I have millions of entries to process; it might exist, it might be an update, I don't know...).
>
> My naive approach was to have two documents in the same request: one with only "set" commands plus the unique ID, and then a second one with all the "add"s (for the multivalued fields). So it would do the following:
>
> 1. Document (with id) exists or not, don't care: use the following "set" commands to update/create it.
> 2. Second pass: I know you exist (with the above id), please "add" all of these to the multivalued fields (none of those fields are in the initial updates).
>
> My rationale is: if the document exists, reset some fields, and then append to the multivalued fields (those multivalued fields express historical updates).
>
> Probably a silly mistake on my side, but I don't seem to get the append/add JSON syntax right for multivalued fields. On the document's initial creation it works great with:
>
>   ... "mv_f":"cat1", "mv_f":"cat2", ...
>
> But later on, when I want to append cat3 to the field by doing this:
>
>   ... "mv_f":{"add":"cat3"}, ...
>
> I end up with something like this in the index:
>
>   "mv_f":["{add=cat3}"]
>
> Obviously something is wrong with my syntax ;)
>
> --
> jonatan

The reason I created 2 documents is that Solr doesn't seem happy if I mix "set" and "add" in the same document :)

--
jonatan
Re: Updating documents
On Fri, Jul 13, 2012 at 3:50 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote:
> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote:
>> But later on, when I want to append cat3 to the field by doing this:
>>   ... "mv_f":{"add":"cat3"}, ...
>> I end up with something like this in the index:
>>   "mv_f":["{add=cat3}"]
>> Obviously something is wrong with my syntax ;)

Are you using a custom update processor chain? The DistributedUpdateProcessor currently contains the logic for optimistic concurrency and updates. If you're not already, try some test commands with the stock server. If you are already using the stock server, then perhaps you're not sending what you think you are to Solr?

-Yonik
http://lucidimagination.com
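For reference, a complete atomic-update exchange that should work against a stock Solr 4.x example server (the document ID doc1 and field mv_f are hypothetical; mv_f must be multivalued and stored):

  # create the document
  curl 'http://localhost:8983/solr/update?commit=true' \
       -H 'Content-Type: application/json' \
       -d '[{"id":"doc1", "mv_f":["cat1","cat2"]}]'

  # append a value with an atomic update
  curl 'http://localhost:8983/solr/update?commit=true' \
       -H 'Content-Type: application/json' \
       -d '[{"id":"doc1", "mv_f":{"add":"cat3"}}]'

If the second command leaves mv_f stored as the literal string {add=cat3}, the processor handling atomic updates isn't being applied, which is consistent with Yonik's question about a custom update chain.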
queryResultCache not checked with fieldCollapsing
I have an index with field collapsing defined like this:

  <lst name="defaults">
    <str name="group.field">SomeField</str>
    <str name="group.main">true</str>
    <str name="group">true</str>
  </lst>

When I run dismax queries I see there are no lookups in the queryResultCache. If I remove the field collapsing, lookups happen. I can't find any mention of this anywhere, or think of a reason why it should disable caching. I've tried playing with the group.cache.percent parameter, but that doesn't seem to play a role here. Anybody know what's going on?

Mike