Scoring by document size
Hi all,
I have some doubts about the Solr scoring function. I'm using the default configuration throughout, but I'm facing a weird issue with the returned scores.
In the schema, I'll focus on the only field I'm interested in. Its definition is: ... (omitNorms="false"; otherwise the document size is not taken into account in the final score).
Then I index some documents with the following text in the 'myField' field:
doc1 = "A B C"
doc2 = "A B C D"
doc3 = "A B C D E"
doc4 = "A B C D E F"
doc5 = "A B C D E F G H"
doc6 = "A B C D E F G H I"
Finally, I run the query 'myField:("A" "B" "C")' to retrieve all the documents, but with different scores (doc1 is more similar to the query than doc2, which is more similar than doc3, and so on).
All the documents are retrieved (OK), but the scores look like this:
doc1 = 2.590214
doc2 = 2.590214
doc3 = 2.266437
doc4 = 1.94266
doc5 = 1.94266
doc6 = 1.618884
So, in conclusion, the score does go down, but not the way I'd like. Doc1 gets the same score as Doc2, even though Doc1 matches 3/3 tokens and Doc2 matches 3/4 tokens (Doc4 and Doc5 tie as well).
Is this the normal Solr behaviour? Is there any way to get the behaviour I expect?
Thanks a lot, Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html Sent from the Solr - User mailing list archive at Nabble.com.
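A possible explanation for the ties, offered as a sketch rather than a confirmed diagnosis: assuming the field is scored with the default similarity of Solr 4.x, the length norm 1/sqrt(numTerms) is stored in a single byte, so nearby document lengths can collapse onto the same stored value. The Python below re-implements that one-byte encoding, modeled on Lucene's SmallFloat.floatToByte315 and byte315ToFloat; the Python function names are illustrative and are not Solr or Lucene API.

```python
import math
import struct


def float_to_byte315(f: float) -> int:
    """Lossy one-byte encoding of a positive float (port of the Lucene routine)."""
    # Reinterpret the 32-bit float as a signed integer.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21           # keep sign, exponent and the top mantissa bits
    fzero = (63 - 15) << 3       # offset of the smallest representable exponent
    if small <= fzero:
        return 0 if bits <= 0 else 1   # underflow: zero or smallest positive
    if small >= fzero + 0x100:
        return 255                     # overflow: largest representable value
    return small - fzero


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315 back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(num_terms: int) -> float:
    """Length norm as it would be read back after the one-byte round trip."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


if __name__ == "__main__":
    for n in range(3, 10):
        print(n, "terms: exact", round(1.0 / math.sqrt(n), 4), "stored", stored_norm(n))
```

Under this assumption, fields with 3 and 4 terms are both read back with a norm of 0.5, which is consistent with doc1 and doc2 receiving the same score.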
Re: n values in one fieldType
I'll always query on the set of 6 values, but in some cases the match doesn't need to be exact. I mean, a typical query (you know, 6 integer values) could require an exact match on the first 4 values, but then a range for the other 2 values.
What do you think would be the best way to approach it?
-- View this message in context: http://lucene.472066.n3.nabble.com/n-values-in-one-fieldType-tp4034552p4034737.html
n values in one fieldType
Hi guys,
I have some specific needs for an application. Each document (identified by docId) has several items of the same type (each of these items contains 6 integer values). So each Solr doc has a docId and another multiValued attribute.
My problem is that I don't know which fieldType I should use for the 'item' attribute, because every input query will carry the 6 integer values I mentioned, in order to retrieve the docs that contain EXACTLY those 6 values.
What do you think?
Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/n-values-in-one-fieldType-tp4034552.html
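One way to picture the exact-match requirement, as a sketch under assumptions and not a recommendation taken from the thread: if each item were serialized into a single token and stored in a multiValued string field, an exact query could only ever match a whole item, never a mix of values from different items. The field name `item`, the `_` separator and both helper names are hypothetical.

```python
from typing import Sequence


def item_token(values: Sequence[int]) -> str:
    """Serialize one 6-integer item into a single token (hypothetical format)."""
    if len(values) != 6:
        raise ValueError("an item must contain exactly 6 integer values")
    return "_".join(str(int(v)) for v in values)


def exact_item_query(values: Sequence[int], field: str = "item") -> str:
    """Build a query string that matches only docs containing this exact item."""
    return '%s:"%s"' % (field, item_token(values))


if __name__ == "__main__":
    print(exact_item_query([2, 2, 2, 2, 2, 2]))
```

This covers only the exact-match case; it does not address range constraints on individual values.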
Re: PointType multivalued query
Sorry, I tried to explain it too quickly. Imagine the use case I described in the first post: a document can have more than one 6-dimensional point. So my first approach was:
1 2,2,2,2,2,2
2 3,3,3,3,3,3
3 4,4,4,4,4,4
It works fine and I don't think it performs badly, but there is a lot of redundant data (high disk space cost). That's why I thought about multivalued fields:
10 2,2,2,2,2,2 3,3,3,3,3,3 4,4,4,4,4,4
My first attempt at implementing this was PointType. But I have the problem I described in my first message: the search queries will be a 6-dimensional point that has to fully match one of the indexed points, and as far as I know I cannot do that with PointType. SpatialRecursivePrefixTreeFieldType would be perfect if I could use more than two dimensions.
Regards, Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020616.html
Re: PointType multivalued query
Hi,
I don't think it's a good idea to run join operations between Solr cores, because of the performance cost (we manage a lot of data).
The point is that we want to store documents, each with several information sets (let's call them Points), each identified by 6 values (that's why I was trying to use a 6-dimensional PointType). I'm doing this to try to reduce the index size and indexing time (and, if possible, the retrieval time), because at the moment we have it implemented in another index structure, with these point values represented in individual Solr attributes. That layout (shown below) is, I think, less efficient than what I was trying to do with PointType:
...
So for "docToReference"=1 we may have thousands of "point sets", which means a lot of noise in the Solr index.
What do you think about that?
Thank you very much, Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020606.html
Re: PointType multivalued query
Hi David,
Thanks for your reply. I've tested this data type and the values are indexed fine (I'm using 6-dimensional points). Retrieval works only with the first 2 dimensions (X and Y); the other 4 dimensions are not taken into account.
I've been reading the documentation you sent me, but I cannot find an attribute to define the number of dimensions to use. Do you know what's happening?
Regards, Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020551.html
PointType multivalued query
Hi all,
I'm using a multivalued PointType (6 dimensions) in my Solr schema. Imagine that I have one doc indexed in Solr:
-1 1,1,1,1,1,1 5,5,5,5,5,5
Now imagine that I run some queries:
point:[0,0,0,0,0,0 TO 2,2,2,2,2,2]: Works OK (matches the first point of the doc and returns doc -1).
point:[4,4,4,4,4,4 TO 6,6,6,6,6,6]: Works OK (matches the second point of the doc and returns doc -1).
point:[4,0,0,0,0,0 TO 6,2,2,2,2,2]: Does not work. The first coordinate of the query matches the second doc point, and the remaining coordinates match the first doc point (it returns doc -1, but it must NOT return any doc!).
I only want to retrieve docs that have a point that fully matches the query point. I don't know whether my problem is the PointType data type or bad behaviour of multivalued items.
What do you think about that?
Regards, Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445.html
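The false positive described above can be reproduced outside Solr. The sketch below assumes that a multivalued PointType checks each dimension's range independently against all values stored for that dimension (which is consistent with the behaviour reported here), and compares that with the per-point check the query was meant to express. The function names are illustrative, not Solr API.

```python
from typing import Sequence, Tuple

Point = Tuple[int, ...]


def matches_per_dimension(points: Sequence[Point], lo: Point, hi: Point) -> bool:
    """Each dimension may be satisfied by a different point of the document."""
    return all(
        any(lo[d] <= p[d] <= hi[d] for p in points)
        for d in range(len(lo))
    )


def matches_per_point(points: Sequence[Point], lo: Point, hi: Point) -> bool:
    """A single point must satisfy every dimension of the range."""
    return any(
        all(lo[d] <= p[d] <= hi[d] for d in range(len(lo)))
        for p in points
    )


if __name__ == "__main__":
    doc_points = [(1,) * 6, (5,) * 6]
    lo, hi = (4, 0, 0, 0, 0, 0), (6, 2, 2, 2, 2, 2)
    print("per-dimension:", matches_per_dimension(doc_points, lo, hi))
    print("per-point:", matches_per_point(doc_points, lo, hi))
```

With the document and the third query from the message, the per-dimension check reports a match while the per-point check does not, which is the discrepancy being asked about.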
About solr fields (dynamic query)
Hi all,
I'm facing some problems with Solr fields at query time. Let's look at a simplified example: I have the fields A, B and C. In a relational DB, it's possible to issue a (let's say dynamic) query such as:
SELECT * FROM wherever WHERE wherever.A + wherever.B = wherever.C
I'm trying to do this in Solr, but I don't know if it's possible. It would be something like C:A+B but, obviously, Solr treats 'A+B' as a string and it does not work.
Is there any approach to do this?
Regards, Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/About-solr-fields-dynamic-query-tp4018684.html
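One possible direction, given as a hedged sketch rather than an answer from the thread: Solr's function queries can compute over numeric fields, and the `frange` query parser can filter on the result. Assuming A, B and C are single-valued numeric fields, the condition A + B = C can be rewritten as (A + B) - C = 0. The base URL and core name used below are placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def sum_equals_filter(a: str, b: str, c: str) -> str:
    """Filter query keeping docs where sum(a, b) - c falls in the range [0, 0]."""
    return "{!frange l=0 u=0}sub(sum(%s,%s),%s)" % (a, b, c)


def build_select_url(base_url: str, a: str = "A", b: str = "B", c: str = "C") -> str:
    """Assemble a /select URL that matches all docs, then applies the filter."""
    params = {"q": "*:*", "fq": sum_equals_filter(a, b, c)}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder host and core name, not taken from the thread.
    print(build_select_url("http://localhost:8983/solr/mycore"))
```

For floating-point fields, exact equality may be too strict; widening `l` and `u` slightly around 0 would be one way to allow a tolerance.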
Regional indexing/retrieval
Hi all,
I'm facing some problems with my Solr index because English and Spanish terms are mixed in it. At the moment I'm using the Porter stemmer (which works only for English terms). I've seen that I can use the Snowball stemmer with language="English" or language="Spanish".
I've also read about using different fieldType elements for the different languages, for example ..., but I'd like to avoid this solution, at least in the short run.
A quick solution I found is to use the Snowball stemmer twice in the same fieldType, I mean:
...
But I don't think it's a good solution: the Spanish filter (applied first) may distort an English word that should only be processed by the English filter, and I don't know how much it would hurt performance.
What do you think?
Regards, Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/Regional-indexing-retrieval-tp4014455.html
Re: Several indexes
Thank you both. In the end I decided to implement the multi-core approach. I think it's the fastest and easiest solution, and it's now working fine with two cores.
By the way, to check that it's set up properly: each core folder (in my case core0, core1, ...) needs its own 'bin', 'conf' and 'data' folders, right?
Regards, Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/Several-indexes-tp4014181p4014244.html
Several indexes
Hi all,
I'm facing a problem that is probably easier to solve than I think.
Overview: I have an application built on Solr that handles indexing and retrieval. Everything is working fine: I can index some docs (for example, a schema with attributes A, B and C) in a Solr index and then run queries against it.
The problem is that I want to add another process to the same application that retrieves information with a different schema, for example docs with attributes X and Y. I tried to define two different schemas in the schema.xml file, but that crashes the Solr instance. I've also been thinking about a workaround, but it's not clear to me. Another option would be to start a new Solr instance, so that two Solr instances are running, but I don't think that's a real solution.
Regards, Borja.
-- View this message in context: http://lucene.472066.n3.nabble.com/Several-indexes-tp4014181.html