Scoring by document size

2013-09-17 Thread blopez
Hi all,

I have some questions about the Solr scoring function. I'm using the default
configuration, but I'm facing a weird issue with the retrieved scores.

In the schema, I'm going to focus on the only field I'm interested in. Its
definition is:

[field definition not preserved in the archive]

(omitNorms="false"; otherwise, the document length is not taken into account
in the final score.)

Then, I index some documents, with the following text in the 'myField'
field:

doc1 = "A B C"
doc2 = "A B C D"
doc3 = "A B C D E"
doc4 = "A B C D E F"
doc5 = "A B C D E F G H"
doc6 = "A B C D E F G H I"

Finally, I perform the query 'myField:("A" "B" "C")' in order to retrieve all
the documents, but with different scores (doc1 is more similar to the query
than doc2, which in turn is more similar than doc3, ...).

All the documents are retrieved (OK), but the scores are like this:

*doc1 = 2.590214
doc2 = 2.590214*
doc3 = 2.266437
*doc4 = 1.94266
doc5 = 1.94266*
doc6 = 1.618884

In conclusion, as you can see, the score goes down, but not the way I'd
like. Doc1 gets the same score as doc2, even though doc1 matches 3/3
tokens and doc2 matches only 3/4 tokens.
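A plausible explanation, assuming Lucene 4.x's DefaultSimilarity (the default in Solr at the time): the length norm is boost / sqrt(numTerms), and it is stored in a single byte via SmallFloat.floatToByte315, which keeps only about 3 significant bits. The Python below is a port of that encode/decode pair written for illustration (the function names are mine, not Lucene's). It shows that 3 and 4 terms decode to the same stored norm.

```python
import math
import struct


def float_to_int_bits(f: float) -> int:
    """Bit pattern of f as a signed 32-bit float, like Java's Float.floatToRawIntBits."""
    return struct.unpack(">i", struct.pack(">f", f))[0]


def int_bits_to_float(bits: int) -> float:
    """Inverse of float_to_int_bits, like Java's Float.intBitsToFloat."""
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def float_to_byte315(f: float) -> int:
    """Port of SmallFloat.floatToByte315 (3 mantissa bits, zero exponent 15).

    Returns the byte as an unsigned value in 0..255.
    """
    fzero = (63 - 15) << 3
    bits = float_to_int_bits(f)
    small = bits >> (24 - 3)  # arithmetic shift, as in Java
    if small <= fzero:
        # Zero/negative maps to 0; positive underflow maps to the smallest non-zero byte.
        return 0 if bits <= 0 else 1
    if small >= fzero + 0x100:
        return 0xFF  # overflow: largest value (Java returns (byte) -1)
    return small - fzero


def byte315_to_float(b: int) -> float:
    """Port of SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return int_bits_to_float(bits)


def stored_length_norm(num_terms: int, boost: float = 1.0) -> float:
    """DefaultSimilarity-style length norm after the one-byte round trip."""
    return byte315_to_float(float_to_byte315(boost / math.sqrt(num_terms)))


for n in range(3, 10):
    print(n, round(1 / math.sqrt(n), 4), stored_length_norm(n))
```

Under this sketch, lengths 3 and 4 both give 0.5, 5 gives 0.4375, 6 and 7 both give 0.375, and 8 and 9 both give 0.3125. All six documents match the same three query terms, so the other factors (tf, idf, coord, queryNorm) should be identical and the scores proportional to the stored norm. The reported scores are indeed in the ratio 8:7:6:5, that is 0.5 : 0.4375 : 0.375 : 0.3125.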

Is this the normal Solr behaviour? Is there any way to get my expected
behaviour?

Thanks a lot,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: n values in one fieldType

2013-01-19 Thread blopez
I'll always query on the set of 6 values, but in some cases, the matching
doesn't need to be exact. 

I mean, a usual query (you know, 6 integer values) could be an exact match
for the first 4 values, but a range for the other 2 values.
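The requirement can be written down as a per-item predicate. This is plain Python, not Solr syntax, and the function names are hypothetical. The point it illustrates is that the exact part and the range part must be satisfied by the same item, not by different items of the same document.

```python
from typing import Iterable, Sequence, Tuple

Item = Tuple[int, ...]   # one item: 6 integer values
Range = Tuple[int, int]  # inclusive (low, high)


def item_matches(item: Item, exact: Sequence[int], ranges: Sequence[Range]) -> bool:
    """True if the item starts with `exact` and each remaining value lies in its range."""
    head, tail = item[: len(exact)], item[len(exact):]
    if tuple(head) != tuple(exact) or len(tail) != len(ranges):
        return False
    return all(low <= value <= high for value, (low, high) in zip(tail, ranges))


def doc_matches(items: Iterable[Item], exact: Sequence[int], ranges: Sequence[Range]) -> bool:
    """A document qualifies only if ONE of its items satisfies every condition."""
    return any(item_matches(item, exact, ranges) for item in items)


doc_items = [(1, 2, 3, 4, 10, 20), (1, 2, 3, 5, 50, 60)]
print(doc_matches(doc_items, (1, 2, 3, 4), [(5, 15), (15, 25)]))   # True: the first item qualifies
print(doc_matches(doc_items, (1, 2, 3, 4), [(45, 55), (55, 65)]))  # False: the ranges only fit the second item
```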

What do you think would be the best way to approach it?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/n-values-in-one-fieldType-tp4034552p4034737.html


n values in one fieldType

2013-01-18 Thread blopez
Hi guys,

I have some specific needs for an application. Each document (identified by
docId) has several items of the same type (each of these items
contains 6 integer values). So each Solr doc has a docId and another
multiValued attribute.






My problem is that I don't know which fieldType I should use for the 'item'
attribute, because every input query will have the 6 integer values I
mentioned before, and I need to retrieve the docs that contain EXACTLY those 6
values.
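One way to turn "EXACTLY the 6 values" into a single-term lookup (an assumption, not something settled in this thread) is to encode each item as one string token and store those tokens in a multivalued string field, so an exact item match becomes one term query (e.g. item:"1_2_3_4_5_6"). The hypothetical encode_item helper below shows the idea in Python, with a set standing in for the indexed terms of one document.

```python
from typing import Iterable, Sequence, Set


def encode_item(values: Sequence[int]) -> str:
    """Hypothetical composite key: the 6 integers joined with '_', e.g. '1_2_3_4_5_6'."""
    if len(values) != 6:
        raise ValueError("expected exactly 6 integer values")
    return "_".join(str(int(v)) for v in values)


def index_terms(items: Iterable[Sequence[int]]) -> Set[str]:
    """Terms a multivalued string field would hold for one document: one token per item."""
    return {encode_item(item) for item in items}


def has_exact_item(doc_terms: Set[str], query: Sequence[int]) -> bool:
    """Exact match of all 6 values is a single term lookup."""
    return encode_item(query) in doc_terms


terms = index_terms([(1, 2, 3, 4, 5, 6), (7, 8, 9, 10, 11, 12)])
print(has_exact_item(terms, (1, 2, 3, 4, 5, 6)))   # True
print(has_exact_item(terms, (1, 2, 3, 4, 5, 12)))  # False: values from two items do not combine
```

The trade-off is that a single string token cannot answer range conditions on individual values, so this layout fits only the exact-match case.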

What do you think?

Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/n-values-in-one-fieldType-tp4034552.html


Re: PointType multivalued query

2012-11-15 Thread blopez
Sorry, I tried to explain it too quickly.

Imagine the use case that I described in the first post.

A document can have more than one 6-dimensional point. So my first approach
was:

   1
   2,2,2,2,2,2


   2
   3,3,3,3,3,3


   3
   4,4,4,4,4,4


It works fine and I don't think it gives us bad performance, but there is a
lot of redundant data (high disk space cost). That's why I thought about
multivalued fields:


   10
   2,2,2,2,2,2
   3,3,3,3,3,3
   4,4,4,4,4,4

   
The first approach to implement this was PointType. But I have the problem
that I described in my first message: the search queries will be a
6-dimensional point that I have to fully match against the indexed points,
and as far as I know I cannot do that with PointType.

SpatialRecursivePrefixTreeFieldType would be perfect if I could use more
than two dimensions.

Regards,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020616.html


Re: PointType multivalued query

2012-11-15 Thread blopez
Hi,

I don't think it's a good idea to use join operations between Solr cores
because of performance (we manage a lot of data).

The point is that we want to store documents, each one with several
information sets (let's call them Points), each one identified by 6 values
(that's why I was trying to use a 6-dimensional PointType).

I'm doing this to try to reduce the index size and indexing time (and, if
possible, the retrieval time), because we currently have it implemented in
another index structure, with these point values represented in an individual
Solr attribute. I think this approach (shown below) is less efficient than
what I was trying to do with PointType:





...



So for docToReference=1 we may have thousands of "point sets", which
implies a lot of noise in the Solr index.

What do you think about that?

Thank you very much,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020606.html


Re: PointType multivalued query

2012-11-15 Thread blopez
Hi David,

thanks for your reply.

I've tested this datatype and the values are indexed fine (I'm using
6-dimensional points).

I'm trying to retrieve results and it works only with the first 2 dimensions
(X and Y), but it's not taking into account the other 4 dimensions.

I've been reading the documentation you sent me, but I cannot see an
attribute to define the number of dimensions I should use.

Do you know what's happening?

Regards,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020551.html


PointType multivalued query

2012-11-15 Thread blopez
Hi all,

I'm using a multivalued PointType (6 dimensions) in my Solr schema. Imagine
that I have one doc indexed in Solr:


  -1
  1,1,1,1,1,1
  5,5,5,5,5,5


Now imagine that I launch some queries:

point:[0,0,0,0,0,0 TO 2,2,2,2,2,2]: Works OK (matches the first point of the
doc and returns doc -1)
point:[4,4,4,4,4,4 TO 6,6,6,6,6,6]: Works OK (matches the second point of the
doc and returns doc -1)

point:[4,0,0,0,0,0 TO 6,2,2,2,2,2]: Does not work. The first dimension of the
query range matches the second point of the doc, and the remaining dimensions
match the first point of the doc, so doc -1 is returned (but it must NOT
return any doc!). I only want to retrieve docs that have a single point that
completely matches the query range.
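A simplified model of why this happens, assuming PointType works as described in the Solr 4.x documentation: each dimension is indexed into its own sub-field, and a range query on the point becomes a conjunction of one range clause per dimension. With several points in a multivalued field, each sub-field simply holds a bag of values, so nothing ties dimension 0 of one point to dimension 1 of the same point. The Python below contrasts that pooled view with a per-point check; it models the semantics only, not Solr's internals.

```python
from typing import Sequence, Tuple

Point = Tuple[int, ...]


def pooled_range_match(points: Sequence[Point], lower: Point, upper: Point) -> bool:
    """Per-dimension check against ALL values indexed for that dimension.

    Models separate sub-fields per dimension: any point may satisfy any dimension.
    """
    return all(
        any(lower[d] <= p[d] <= upper[d] for p in points)
        for d in range(len(lower))
    )


def per_point_range_match(points: Sequence[Point], lower: Point, upper: Point) -> bool:
    """The behaviour the post asks for: one point must satisfy every dimension."""
    return any(
        all(lo <= v <= hi for v, lo, hi in zip(p, lower, upper))
        for p in points
    )


doc_points = [(1,) * 6, (5,) * 6]  # the two points of doc -1
query = ((4, 0, 0, 0, 0, 0), (6, 2, 2, 2, 2, 2))
print(pooled_range_match(doc_points, *query))     # True: dim 0 from one point, dims 1-5 from the other
print(per_point_range_match(doc_points, *query))  # False
```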

I don't know if my problem is the PointType data type itself or the behavior
of multivalued fields. What do you think about that?

Regards,
Borja.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445.html


About solr fields (dynamic query)

2012-11-07 Thread blopez
Hi all,

I'm facing some problems with Solr fields at query time. Here is a
simplified example.

I have the fields A, B and C. In a relational DB, it's possible to run a
query that compares fields dynamically: SELECT * FROM wherever WHERE
wherever.A + wherever.B = wherever.C

I'm trying to do this in Solr, but I don't know if it's possible. It would be
something like C:A+B, but Solr obviously treats 'A+B' as a literal string, so
it does not work.
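One option, assuming A, B and C are single-valued numeric fields (function queries read one value per document): Solr's function range query parser, {!frange}, can filter on a computed value, so "A + B = C" becomes "sub(sum(A,B),C) between 0 and 0". The Python below only builds the request parameters; the endpoint in the comment is hypothetical, and for float fields an exact l=0 u=0 equality may need a small tolerance instead.

```python
from urllib.parse import urlencode


def sum_equals_filter(a: str, b: str, c: str) -> str:
    """Function range filter keeping docs where a + b - c lies in [0, 0], i.e. a + b == c."""
    return "{!frange l=0 u=0}sub(sum(%s,%s),%s)" % (a, b, c)


params = {"q": "*:*", "fq": sum_equals_filter("A", "B", "C")}
query_string = urlencode(params)
# Hypothetical endpoint: http://localhost:8983/solr/collection1/select?<query_string>
print(params["fq"])  # {!frange l=0 u=0}sub(sum(A,B),C)
print(query_string)
```

Using fq rather than q keeps the equality as a pure filter, so it does not affect scoring and can be cached separately.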

Is there any approach to do this?

Regards,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/About-solr-fields-dynamic-query-tp4018684.html


Regional indexing/retrieval

2012-10-18 Thread blopez
Hi all,

I'm facing some problems with my Solr index because I have English and
Spanish terms mixed.

Currently I'm using the Porter stemmer (which works only for English terms).
Btw, I've seen that I can use the Snowball stemmer with the flag
language="English" or language="Spanish".

Moreover, I've read something about using a different fieldType element for
each language, BUT I'd like to avoid this solution, at least in the short
run.

A quick solution I found is to use the Snowball stemmer twice in the same
fieldType, I mean:

...
*Snowball stemmer (language="Spanish")
Snowball stemmer (language="English")*
...

But I do not think it is a good solution: the Spanish filter (applied first)
might add noise to an English word that should only be handled by the
English filter... and moreover I don't know how much it would hurt
performance.
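The concern about ordering can be illustrated with a toy model of an analysis chain: in Solr every token passes through every filter of the analyzer, in the order they are declared. The suffix rules below are deliberately simplistic stand-ins, not the real Snowball algorithms, so the specific outputs say nothing about actual Snowball behaviour. They only show how an English word can end up stemmed twice.

```python
from typing import Callable, List

TokenFilter = Callable[[str], str]


def toy_spanish_stem(token: str) -> str:
    """Toy rule, NOT the Snowball Spanish stemmer: strip a trailing 'es' or 's'."""
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def toy_english_stem(token: str) -> str:
    """Toy rule, NOT the Snowball English stemmer: strip a trailing 's'."""
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    """Lowercase + whitespace tokenization, then every token goes through every filter in order."""
    tokens = text.lower().split()
    for token_filter in filters:
        tokens = [token_filter(t) for t in tokens]
    return tokens


print(analyze("houses", [toy_english_stem]))                    # ['house']
print(analyze("houses", [toy_spanish_stem, toy_english_stem]))  # ['hou']: both filters stripped a suffix
```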

What do you think?

Regards,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regional-indexing-retrieval-tp4014455.html


Re: Several indexes

2012-10-17 Thread blopez
Thank you both.

In the end I decided to implement the multi-core approach. I think it's the
fastest and easiest solution, and now it's working fine with two cores.

By the way, to check whether it's implemented properly: each 'core folder'
(in my case core0, core1, ...) needs its own 'bin', 'conf' and 'data'
folders, right?
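A small sketch for checking the layout. It assumes the Solr 4.x multi-core setup in which each core's instance directory (e.g. core0/ and core1/ under the Solr home) supplies conf/solrconfig.xml and conf/schema.xml; the data directory is normally created by Solr when it is missing. The placeholder file contents are not real configuration.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List

# Assumption: each core's instance directory must provide these two files.
REQUIRED_FILES = ("conf/solrconfig.xml", "conf/schema.xml")


def missing_core_files(solr_home: Path, cores: Iterable[str]) -> Dict[str, List[str]]:
    """Map each core name to the required files missing from solr_home/<core>."""
    report: Dict[str, List[str]] = {}
    for core in cores:
        instance_dir = solr_home / core
        missing = [rel for rel in REQUIRED_FILES if not (instance_dir / rel).is_file()]
        if missing:
            report[core] = missing
    return report


def demo() -> Dict[str, List[str]]:
    """Build a throwaway layout where core1 lacks schema.xml, then check it."""
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        for core in ("core0", "core1"):
            conf = home / core / "conf"
            conf.mkdir(parents=True)
            (conf / "solrconfig.xml").write_text("<config/>")  # placeholder content
        (home / "core0" / "conf" / "schema.xml").write_text("<schema/>")  # placeholder content
        return missing_core_files(home, ["core0", "core1"])


print(demo())  # {'core1': ['conf/schema.xml']}
```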

Regards,
Borja.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Several-indexes-tp4014181p4014244.html


Several indexes

2012-10-17 Thread blopez
Hi all,

I'm facing a problem that is probably easier to solve than I think.

Overview: I have an application working on Solr which manages indexing and
retrieval operations. Everything's working fine: I can index some docs (for
example, with a schema with attributes A, B and C) in a Solr index and then
run queries on it.

The problem is that I want to implement another process in the same
application to retrieve information, but with a different schema. For
example, docs with attributes X and Y.

I tried to define two different schemas in the schema.xml file, but it
crashes the Solr instance. I've also been thinking about a workaround, but
it's not clear to me. Another option could be creating a new instance of
Solr, so that there are two Solr instances running... but I don't think
that's a real solution.
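Since every Solr core carries its own schema.xml and index, the A/B/C and X/Y schemas can live side by side as two cores of a single Solr instance, which is the route taken in the follow-up in this thread. The sketch below only shows how requests would be addressed per core; the host, port, core names and field values are hypothetical.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical host and port


def select_url(core: str, query: str, rows: int = 10) -> str:
    """Select URL for one core; each core has its own schema.xml and index."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/{core}/select?{params}"


abc_url = select_url("core0", "A:foo")  # core with the A/B/C schema
xy_url = select_url("core1", "X:bar")   # core with the X/Y schema
print(abc_url)
print(xy_url)
```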

Regards,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Several-indexes-tp4014181.html