Re: how to index GEO JSON

2018-07-26 Thread SolrUser1543
Of course I saw this reference.

But it is still not clear what the GeoJSON should actually look like.

Where do I put the item ID?


A regular JSON index request looks like:
{
"id":"111",
"geo_srpt": 
}

I tried to put GeoJSON as the geo_srpt value, but it does not work.

So far I have managed to index some WKT shapes (not all types), but no
GeoJSON at all.

Any help with a detailed example would be appreciated.
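For example, I would expect something like this to work (my assumption, untested: the Solr JSON loader treats a raw JSON object value as an atomic-update map, so the GeoJSON has to be passed as an escaped string, and GeoJSON parsing also needs a Solr/Spatial4j version recent enough to detect it):

    {
      "id": "111",
      "geo_srpt": "{\"type\":\"Polygon\",\"coordinates\":[[[30,10],[40,40],[20,40],[10,20],[30,10]]]}"
    }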





how to index GEO JSON

2018-07-25 Thread SolrUser1543
I have looked in the reference guide and various wiki articles, but have not
found an example anywhere of how to index GeoJSON.

I have the following field definition:
[field definition stripped by the mail archive; a hedged example follows below]

How should the POST request look in order to put GeoJSON in this field?

I have managed to index WKT, but not GeoJSON.
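For illustration, a typical RPT field definition would be along these lines (an assumption, not the stripped config above; the JTS context factory is only needed for polygon support):

    <fieldType name="location_rpt"
               class="solr.SpatialRecursivePrefixTreeFieldType"
               spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
               autoIndex="true" distErrPct="0.025"
               maxDistErr="0.001" distanceUnits="kilometers"/>
    <field name="geo_srpt" type="location_rpt" indexed="true" stored="true"/>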







Re: Performance degradation with two collections on the same Solr instance

2015-10-30 Thread SolrUser1543
We have 100 GB of RAM on each machine, 20 GB of it for the heap. The index
size of the big collection is 130 GB; the new second collection has only a
few documents, only a few MB.

When we disabled the new cores, performance improved.

Both collections use the same solrconfig.xml, so they have the same filter
cache configurations.

But the second collection is very small, only a few documents, so its caches
cannot consume much memory.







Solr collection alias - how rank is affected

2015-10-27 Thread SolrUser1543
How is document ranking affected when using a collection alias to search two
collections with the same schema? Is it affected at all?





Performance degradation with two collections on the same Solr instance

2015-10-27 Thread SolrUser1543
We have a large SolrCloud with one collection and no replicas.
Each machine has one Solr core.
Recently we decided to add a new collection based on the same schema, so now
each Solr instance has two cores.
The first collection has a very big index, but the new one has only several
hundred documents.

The day after we did it, we experienced severe performance degradation:
long query times and server unavailability.

The JVM was configured with a 20 GB heap, and we did not change that when
adding the new collection.

The question is: how does Solr manage its resources when it has more than
one core? Does it need twice the memory? Or might this degradation be a
coincidence?





Re: Number of requests to each shard is different with and without grouping

2015-08-21 Thread SolrUser1543
Ramkumar R. Aiyengar wrote:
> Grouping does need 3 phases. The phases are:
>
> (2) For the N groups, each shard is asked for the top M ids (M is
> configurable per request).
>

What exactly do you mean by "M is configurable per request"? How exactly is
it configurable, and what is the relation between N (which is the initial
rows number) and M?
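From my reading of the ref guide (not confirmed), the per-request knob for M is group.limit: rows controls how many groups come back (the N), and group.limit controls how many documents are returned per group (the M), e.g.:

    /select?q=*:*&group=true&group.field=category&rows=10&group.limit=5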






Number of requests to each shard is different with and without grouping

2015-08-20 Thread SolrUser1543
I want to understand why the number of requests in SolrCloud is different
with and without the grouping feature.


1. Suppose we have several shards in SolrCloud (let's say 3 shards).
2. One of them gets a query with rows = n.
3. This shard distributes the request among the others; suppose every shard
has many results, much more than n.
4. It then receives item IDs from each shard, so the total number of results
is 3n.
5. It then sorts the results and chooses the best n; in my case each shard
has representatives in the final results.
6. It then sends a second request to each shard, with the appropriate item
IDs, to get the stored fields.

So in this case each shard is queried twice: first to get the item IDs, and
second to get the stored fields.

That is what I see in my logs. (I see 6 log entries, 2 for each shard.)

*The question is: why, when I use the grouping feature, is each shard
queried 3 times instead of 2?* (I see 8 or 9 log entries.)






how to ignore unavailable shards

2015-07-15 Thread SolrUser1543
I have a handler configured in solrconfig.xml with shards.tolerant=true,
which means unavailable shards are ignored when returning results.

Sometimes shards are not really down, but are doing GC or a heavy commit.

Is it possible to ignore those as well, and how? I would prefer a partial
result over a timeout error.

I am using Solr 4.10 with many shards and intensive indexing.
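For reference, the handler defaults I mean look something like this (a sketch; timeAllowed is my assumption for capping slow shards, since it limits how long a search may run and returns partial results once exceeded):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="shards.tolerant">true</str>
        <int name="timeAllowed">2000</int>
      </lst>
    </requestHandler>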





Disable transaction log with SolrCloud without replicas

2015-07-15 Thread SolrUser1543
From here:
https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
we can learn that the transaction log is needed when replicas are used in
SolrCloud.

Do I need it if I am not using replicas?
Could it be disabled to improve performance?

What negative effects might that have?
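For concreteness, this is the solrconfig.xml element in question (removing or commenting it out disables the transaction log; my assumption is that peer sync, leader recovery, and real-time get all rely on it, so this may be unsafe even without replicas):

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
    </updateHandler>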





 





Re: increase connections on tomcat

2015-03-12 Thread SolrUser1543
I investigated my Tomcat 7 configuration.
I found that we are running in BIO mode.
I am considering switching to NIO mode.

What are the recommendations in this case?
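For example, something like this connector in server.xml (a sketch; the protocol class selects Tomcat's NIO connector, and the thread and queue sizes are guesses to be tuned):

    <Connector port="8080"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               maxThreads="500" acceptCount="100"
               connectionTimeout="20000" redirectPort="8443"/>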





Re: increase connections on tomcat

2015-03-11 Thread SolrUser1543
Does it apply to Solr 4.10, or only to Solr 5?









increase connections on tomcat

2015-03-11 Thread SolrUser1543
The client application that queries Solr needs to increase its number of
simultaneous connections in order to improve performance (in addition to
getting Solr results, it needs to fetch internal resources like images).
This increase improved client performance, but caused degradation in Solr.

I think I need to increase the number of connections in order to allow more
requests to run between the Solr shards.

How can I verify that this is what I need?
How can I increase it in Tomcat (on each shard)?





Conditional invocation of HTMLStripCharFilterFactory

2015-03-01 Thread SolrUser1543
Is it possible to make a conditional invocation of
HTMLStripCharFilterFactory? I want to decide when to enable or disable it
according to the value of a specific field in my document. E.g., when the
value of field A is true, enable the filter on field B; otherwise disable
it.
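As far as I know, an analyzer chain cannot look at other fields, so one workaround might be an update request processor that strips the HTML before the document reaches analysis. A sketch, untested; the field names "A" and "B", the class name, and the reuse of Lucene's HTMLStripCharFilter outside an analyzer are all my assumptions:

    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringReader;

    import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    // Strips HTML from field B only when field A equals "true".
    public class ConditionalHtmlStripProcessor extends UpdateRequestProcessor {

      public ConditionalHtmlStripProcessor(UpdateRequestProcessor next) {
        super(next);
      }

      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        if ("true".equals(String.valueOf(doc.getFieldValue("A")))) {
          Object raw = doc.getFieldValue("B");
          if (raw != null) {
            // Reuse Lucene's HTML stripper outside the analysis chain.
            StringBuilder out = new StringBuilder();
            try (Reader r = new HTMLStripCharFilter(new StringReader(raw.toString()))) {
              char[] buf = new char[1024];
              for (int n; (n = r.read(buf)) != -1; ) {
                out.append(buf, 0, n);
              }
            }
            doc.setField("B", out.toString()); // replace with stripped text
          }
        }
        super.processAdd(cmd); // continue the update chain
      }
    }

The processor would then be registered through a matching factory in an updateRequestProcessorChain in solrconfig.xml.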





incorrect Java version reported in solr dashboard

2015-02-23 Thread SolrUser1543
I have upgraded Java from 1.7 to 1.8 on a Linux server.
After the upgrade, if I run "java -version" I can see that it has really
changed to the new one.

But when I run Solr, it still reports the old version in the dashboard's JVM
section.

What could be the reason?





Re: ignoring bad documents during index

2015-02-22 Thread SolrUser1543
What I tried is to make an update processor with try/catch inside
processAdd. This update processor was the last one in the update chain.
In the catch statement I tried to add the ID of the failed item to the
response. This information (about failed items) is lost somewhere when the
request is redirected from the shard that got the initial request to
another one.

What I am looking for is a place that looks like a foreach statement, which
iterates over all shards and can aggregate the response from each one,
including the ability to handle the case when some shard is down.

 

  





Re: ignoring bad documents during index

2015-02-22 Thread SolrUser1543
I'm not using replicas. Is this class relevant anyway?

Is there any way to avoid changing this class, and instead inherit from it
and do the try/catch in processAdd?





Re: ignoring bad documents during index

2015-02-22 Thread SolrUser1543
We are working with the following configuration:

There is an indexer service that prepares bulks of XML documents.
Those XMLs are received by one shard, which is used only for distributing
the request among the shards (let's call it the GW).
Some shards may return OK, some 400 (wrong field value), and some 500
(because they were down).

I want the GW to return a detailed status to the indexer, so I know exactly
which items failed.






Re: ignoring bad documents during index

2015-02-22 Thread SolrUser1543
I think you did not understand the question.

The problem is indexing via SolrCloud.
When one shard gets a request, it distributes it among the others, and in
case of an error on one of them, this information is not passed back to the
request initiator.

Does anyone know where in the code this happens?






RE: Performing DIH on a predefined list of IDs

2015-02-21 Thread SolrUser1543
That's right, but I am not sure that if it works with GET, I will be able
to use POST without changing it.





Re: Performing DIH on a predefined list of IDs

2015-02-21 Thread SolrUser1543
Yes, you are right, I am not using a DB.
SolrEntityProcessor uses the GET method, so I will need to send a relatively
big URL (something like hundreds of IDs); I hope that will be possible.

Anyway, I think it is the only way to perform the reindex if I want to
control it and be able to continue from any point in case of failure.





Re: Performing DIH on a predefined list of IDs

2015-02-20 Thread SolrUser1543
My index has about 110 million documents, split over several shards.
Maybe the number is not so big, but each document is relatively large.

The reason to perform the reindex is something like adding new fields, or
adding an update processor that can extract something from one field and
put it in another, etc.

Each time I need to reindex the data, I create a new collection and start
to import the data from the old one.
That gives the update processors an opportunity to act.

The DIH runs with a *:* query and fetches some number of items each time.
In case of an exception, the process stops in the middle and I can't
restart it from that point.

That's the reason I want to run it on a predefined list of IDs.
That way I would be able to restart from any point and to know which IDs
failed.





Re: ignoring bad documents during index

2015-02-20 Thread SolrUser1543
I am sending a bulk of XML via an HTTP request, the same way as indexing
via the "Documents" screen in the Solr admin interface.





Performing DIH on a predefined list of IDs

2015-02-20 Thread SolrUser1543
Relatively frequently (about once a month) we need to reindex the data by
using DIH and copying the data from one index to another.
Because we have a large index, this can take 12 to 24 hours to complete,
while the old index is still being queried by users.
Sometimes DIH is interrupted in the middle by some unexpected exception,
caused by OutOfMemory or something else (many times it failed when more
than 90% was completed).
On top of this, almost every time some items are missing in the new index,
and it is very complicated to find them.
At this stage I can't be sure exactly which documents were missed, so I
have to do it all again and wait for many hours, while the old index
constantly receives new items.

I want to suggest the following way to solve the problem:
•   Get a list of all item IDs (calling the Lucene API, like CLUE does, for
example).
•   Start a DIH that iterates over those IDs and each time queries for n
items. (Of course the original DIH class would have to be changed to
support this.)
•   This gives the following advantages:
1.  I will know exactly which items failed.
2.  I can restart the process from any point, and in case of a DIH failure
restart it from the point of failure.


So the main difference is that today DIH runs on a *:* query, and I suggest
running it on a list of IDs.

For example, if I have 1000 docs and want this new DIH to take 100 docs
each time, it will do 10 queries, each one with 100 IDs (like
id:(1 2 3 ... 100), then id:(101 102 ... 200), etc.); see the sketch below.

The question is: what do you think about it? Or could all of this be done
another way, and am I trying to reinvent the wheel?
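For illustration, one ID batch could be pulled with a data-config.xml entity like this (a sketch; the URL is a placeholder, and whatever drives the import would rewrite the query per batch):

    <dataConfig>
      <document>
        <entity name="batch"
                processor="SolrEntityProcessor"
                url="http://oldhost:8983/solr/oldcollection"
                query="id:(1 2 3 4 5)"
                rows="100"
                fl="*"/>
      </document>
    </dataConfig>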






Re: ignoring bad documents during index

2015-02-20 Thread SolrUser1543
I want to experiment with this issue; where exactly should I take a look?
I want to try to fix this missing aggregation.

Which class is responsible for that?





Define Id when using db dih

2015-01-28 Thread SolrUser1543
Hi,

I am using the data import handler to import data from an Oracle DB.
The problem is that the table I am importing from has no single column that
is defined as a key.
How should I define the key in the data config file?

Thanks
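One possible workaround (a sketch; it assumes two columns, COL1 and COL2, that are unique in combination, and uses DIH's TemplateTransformer to build a synthetic uniqueKey):

    <entity name="item" transformer="TemplateTransformer"
            query="select COL1, COL2, TEXT from MYTABLE">
      <field column="id" template="${item.COL1}-${item.COL2}"/>
    </entity>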






Re: Reindex data without creating new index.

2015-01-28 Thread SolrUser1543
By rebalancing I mean that such a big volume of updates will create a
situation that requires running an optimize on the index, because each
document will be added again in place of the original one.

But according to what you say it should not be a problem, am I correct?







Reindex data without creating new index.

2015-01-27 Thread SolrUser1543
I want to reindex my data in order to change the value of one field
according to the value of another (both fields already exist).

For this purpose I run the 'clue' utility to get a list of IDs.
Then I created an update processor that can set the value of field A
according to the value of field B.
I added a new request handler, like the classic update handler, but with a
new update chain containing the new update processor.

I want to run an HTTP POST request for each ID, to the new handler, with
the item ID only (see the sketch below).
This will trigger my update processor, which will fetch the existing doc
from the index and apply the logic.

So in this way I can do some enrichment, without a full data import and
without creating a new index.

What do you think about it?
Could it cause performance degradation? Can Solr handle it, or will it
rebalance the index?
Does Solr have some built-in feature that can do this?







Re: ignoring bad documents during index

2015-01-10 Thread SolrUser1543
Would it be a good solution to index single documents instead of bulks?
In that case I would know the status of each message.

What is the recommendation here: bulk vs. single?





Re: ignoring bad documents during index

2015-01-10 Thread SolrUser1543
From reading SOLR-445 (https://issues.apache.org/jira/browse/SOLR-445) I
see that no solution is provided for the issue of aggregating responses
from several Solr instances.

Is Solr not able to do that?





Re: ignoring bad documents during index

2015-01-07 Thread SolrUser1543
I have implemented an update processor as described above.

On a single Solr instance it works fine.

When I test it on SolrCloud with several nodes and try to index a few
documents, some of which are incorrect, each instance creates its own
response, but the responses are not aggregated by the instance that got the
request.

I also tried to use a QueryResponseWriter, but its output was not
aggregated either.

The questions are:
1. How can the responses be made to aggregate?
2. What kind of update processor should it be: UpdateRequestProcessor or
DistributedUpdateRequestProcessor?






ignoring bad documents during index

2015-01-01 Thread SolrUser1543
Suppose I need to index a bulk of several documents (D1 D2 D3 D4), i.e. 4
documents in one request.

If, e.g., D3 is incorrect, an exception will be thrown and an HTTP response
with 400 Bad Request will be returned.

Documents D1 and D2 will be indexed, but D4 will not, and no indication of
this will be returned.

1. Is it possible to ignore such an error and continue indexing D4?
2. What would be the best way to add information about the failed
documents? I thought about an update processor with try/catch in
processAdd, which in case of an exception adds the doc ID to the response
(see the sketch below).
Or would it be better to implement a search component or response writer to
add the info?
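A minimal sketch of that idea (untested; the class name is a placeholder, and as the rest of this thread shows, the extra response entries are not aggregated across shards in SolrCloud):

    import java.io.IOException;

    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    // Catches per-document failures instead of aborting the whole bulk,
    // and reports each failed ID in the response.
    public class TolerantAddProcessorFactory extends UpdateRequestProcessorFactory {

      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                                final SolrQueryResponse rsp,
                                                UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            try {
              super.processAdd(cmd); // pass the doc down the chain
            } catch (Exception e) {
              // Record the failed ID and keep indexing the rest of the bulk.
              rsp.add("failedId", cmd.getPrintableId());
            }
          }
        };
      }
    }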






Solr collection alias - how rank is affected

2014-12-02 Thread SolrUser1543
Solr allows creating an alias for several collections via its API.

Suppose I have two collections, C1 and C2, and an alias C3 = C1, C2.

C1 and C2 are deployed on different machines, but share a ZooKeeper
ensemble.

How is ranking affected when searching the C3 collection?
When they have the same schema?
A different schema? (It is possible to search across different schemas if
we use field aliases on both collections.)
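For reference, the alias in question is created with the Collections API like this (host is a placeholder):

    http://host:8983/solr/admin/collections?action=CREATEALIAS&name=C3&collections=C1,C2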






AbstractSubTypeFieldType as a template

2014-11-18 Thread SolrUser1543
I need to implement indexing of hierarchical data, like a post and its
comments.
Each comment has a few fields, like username / text / date.

There are a few more comment-like types that I need too (the only
difference is the field names and their count).


There is the LatLonType field type, which derives from
AbstractSubTypeFieldType. This type allows indexing a struct of two fields
embedded into a flat document.

Is it possible to create a generic type, e.g. GenericSubType, which gets
the names and types of its fields from the schema configuration and lets me
index structures inside a flat document, like LatLonType does?

The most important point is the correlation between values, because LatLon
creates two multivalued fields and uses the values at the same index in
both fields.

Any ideas?
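For context, this is roughly how LatLonType wires its sub-fields in the stock schema (a sketch from memory; a generic variant would presumably declare its own sub-field suffix the same way):

    <fieldType name="location" class="solr.LatLonType"
               subFieldSuffix="_coordinate"/>
    <dynamicField name="*_coordinate" type="tdouble"
                  indexed="true" stored="false"/>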





Document versioning support

2014-11-09 Thread SolrUser1543
Suppose I have a text field called "myTextField".
Sometimes the field content may change. I would like to have all versions
of this field indexed in Solr.

What I want to do is make 'myTextField' contain the latest version of the
content and create an additional multivalued field called
'myTextField_history' which will contain all previous versions.

In this way I can boost on 'myTextField' during search, have all versions
indexed, and also know which version is the latest.

Does Solr have some built-in mechanism that does the same? Or is my
solution good enough?
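In schema terms, the proposal is simply (a sketch; the field type name is a placeholder):

    <field name="myTextField" type="text_general"
           indexed="true" stored="true"/>
    <field name="myTextField_history" type="text_general"
           indexed="true" stored="true" multiValued="true"/>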





How to choose only the best hit from several?

2014-11-09 Thread SolrUser1543
Let's say that for some query there are several results, with several hits
for each one, shown in the highlighting section of the response.

Is it possible to select only the single best hit for every result? There
is the hl.snippets parameter, which controls the number of snippets;
hl.snippets=1 will show the first one, but not necessarily the best one.





Get cache statistics via REST

2014-10-12 Thread SolrUser1543
I want to monitor my Solr cache efficiency:
filterCache, queryResultCache, fieldValueCache.

This information is available on the Plugins / Stats page for a specific
core.

How can I get this information via REST?
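For reference, the same data the Plugins / Stats page shows can be fetched from the mbeans handler (a sketch; host and core name are placeholders):

    curl 'http://localhost:8983/solr/mycore/admin/mbeans?cat=CACHE&stats=true&wt=json'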






Re: Combining several fields for facets.

2014-09-28 Thread SolrUser1543
I have many values in each field, so I can't use facet queries (I don't
know all the values).





Re: Combining several fields for facets.

2014-09-24 Thread SolrUser1543
Using a copyField would require reindexing my data; I am looking for a
solution without reindexing.





Combining several fields for facets.

2014-09-23 Thread SolrUser1543
Hi!
How can I create a facet combining 2 (or more) different fields, without
using a copyField to union them?
For example, if I have these documents:
Doc1: field X contains the value a, field Y contains the value b
Doc2: field X contains the value c, field Y contains the value a
Doc3: field X contains the value d, field Y contains the value d
I want to get this facet (document counts over X and Y combined):
a 2
b 1
c 1
d 1





Re: Alternative preview for specific fields

2014-09-14 Thread SolrUser1543
Hi, thanks for the answer.

I tried to use this technique, but the desired result was not achieved.

Can you please provide an example document to index and a sample query?





Alternative preview for specific fields

2014-09-14 Thread SolrUser1543
Suppose I have the following fields:

text, author, title

Users perform a query on all those fields:

...?q=(text:XX OR author:XX OR title:XX)

If this query has a match in the 'text' field, the highlighter will
generate a hit preview based on that field, which is fine.

But suppose a query matches the 'author' field; then the preview will not
be very interesting.
In this case I would like to show something else, e.g. the first 3 lines of
the 'text' field.

What would be the best way to achieve this?
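One lever that may fit (an assumption worth verifying against the ref guide): the highlighter's alternate-field parameters, which return a leading chunk of another field when no snippet can be generated, e.g.:

    ...&hl=true&hl.fl=text&hl.alternateField=text&hl.maxAlternateFieldLength=300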









Performance of Boolean query with hundreds of OR clauses.

2014-08-19 Thread SolrUser1543
I am using Solr to search for similar pictures.

For this purpose, every image is indexed as a set of descriptors (a
descriptor is a string of 6 chars).
The number of descriptors per image may vary (from a few to many
thousands).

When I want to search for a similar image, I extract the descriptors from
it and create a query like:
MyImage:( desc1 desc2 ...  desc n )

The number of descriptors in a query may also vary; usually it is about
1000.

Of course the performance of this query is very bad, and it may take a few
minutes to return.

Any ideas for performance improvement?

P.S. I also tried to use LIRE, but it does not fit my use case.





Re: Collection fix predefined hash range for data import.

2014-06-12 Thread SolrUser1543
The reason is the following:

I have a collection named col1, which has n shards deployed on n machines
(on each machine, one shard with one replica).

Now I want to create col2, with a new config, and import the data from col1
to col2.

What I need is for the shards of col2 to be on the same machines as in col1
(which means with the same hash ranges).

The reason is simple: I want the data to be copied locally during the data
import, not over the network between machines.

For example, if shard3 of col1 is on machine #7, I want the corresponding
shard of col2 to be on the same machine. Otherwise the data will be copied
over the network.

But during collection creation, the order of the shards cannot be
controlled.
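For what it's worth, my understanding (an assumption based on the compositeId router splitting the hash ring evenly by shard index) is that two collections created with the same numShards get identical ranges, and only the node placement varies; the CREATE action's createNodeSet parameter pins which nodes are used, though whether the shard-to-node ordering is preserved may depend on the Solr version:

    /admin/collections?action=CREATE&name=col2&numShards=5&replicationFactor=1
        &createNodeSet=host1:8983_solr,host2:8983_solr,host3:8983_solr,host4:8983_solr,host5:8983_solr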







Collection fix predefined hash range for data import.

2014-06-12 Thread SolrUser1543
I have a collection containing n shards.

Now I want to create a new collection and perform a data import from the
old one to the new one.

How can I make the hash ranges of the new collection be the same as the old
one, in order to make the data import local (on the same machine)?

I mean, if shard #3 of the old collection has the range 100-200, for
example, how can I force the new shard #3 to have the same range?







HASH range calculation

2014-06-01 Thread SolrUser1543
I have a SolrCloud with 5 Solr instances.

Indexing of new documents is always performed against instance #1.

Then, according to the hash calculation, the document is indexed on one of
the instances.

How can I know on which one it will end up?
How can I know how Solr calculates the hash, to predict which instance the
document will be sent to?
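From digging around, my understanding (an assumption; I have not verified it against the code) is that the default compositeId router hashes a plain id (no '!' separator) with MurmurHash3 x86 32-bit, seed 0, and routes to the shard whose hash range in clusterstate.json contains the result. A sketch using the hash utility that ships with Solr:

    import org.apache.solr.common.util.Hash;

    public class ShardHashPredictor {
      public static void main(String[] args) {
        String id = "mydoc-42"; // placeholder document id
        // Same call the compositeId router is believed to use for plain ids.
        int hash = Hash.murmurhash3_x86_32(id, 0, id.length(), 0);
        // Shard ranges in clusterstate.json are signed 32-bit hex values,
        // e.g. shard1: 80000000-b332ffff; compare this hash against them.
        System.out.printf("id=%s -> hash=%08x%n", id, hash);
      }
    }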









Re: Index / Query IP Address as number.

2014-05-19 Thread SolrUser1543
I have a text field containing a large piece of mixed text, like:

test test 12/12/2001 12345 192.168.1.1 1234324


I need to create a copy field that captures only the IPs from the text
(there may be more than one IP).

What would be the best way to do this?

I don't see any option to make WordDelimiter not break up the IP, so as an
alternative I will use a copy field.
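A sketch of a copyField target that extracts only IP-like tokens (my assumption: PatternTokenizer with group=0 emits each regex match as a token; the regex is a loose placeholder that does not validate octet ranges):

    <fieldType name="ip_only" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory"
                   pattern="\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" group="0"/>
      </analyzer>
    </fieldType>
    <field name="text_ips" type="ip_only" indexed="true" stored="false"/>
    <copyField source="text" dest="text_ips"/>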






Re: Index / Query IP Address as number.

2014-05-19 Thread SolrUser1543
I don't have autoGeneratePhraseQueries set to true. I tried both false and
true for it, but nothing changed.

[screenshot attachment and analyzer XML stripped by the mail archive]

The same chain is defined for both query and index; per my original post,
it is ClassicTokenizer followed by WordDelimiterFilter.




Index / Query IP Address as number.

2014-05-18 Thread SolrUser1543
This question has been raised here a few times, but no final solution was
provided.

I am using a combination of ClassicTokenizer and WordDelimiterFilterFactory
in my query/index chain.

As a result, an IP like 192.168.1.3 is indexed as:

192 - pos 1
168 - pos 2
1 - pos 3
3 - pos 4
19216813 - pos 5


So searching for a similar but different address like 192.168.1.4 will
return the wrong item, because the first 3 positions all match.

So the question is: what is the best way to index/query an IP as a number
while still using ClassicTokenizer and WordDelimiter?

Actually I would like to have the IP as one number, without breaking it
into parts (have only 19216813); see the sketch below.

Thanks.
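A WordDelimiterFilter setting that might give only the catenated form (my assumption from the documented flags: generateNumberParts=0 suppresses the per-octet tokens, catenateNumbers=1 emits the joined run of digits; note this also affects other delimited numbers, such as dates):

    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="0"
            catenateWords="0" catenateNumbers="1" catenateAll="0"/>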





