[CVE-2020-13957] The checks added to unauthenticated configset uploads in Apache Solr can be circumvented

2020-10-12 Thread Tomas Fernandez Lobbe
Severity: High

Vendor: The Apache Software Foundation

Versions Affected:
6.6.0 to 6.6.5
7.0.0 to 7.7.3
8.0.0 to 8.6.2

Description:
Solr prevents some features considered dangerous (which could be used for
remote code execution) from being configured in a ConfigSet that's uploaded
via API without authentication/authorization. The checks in place to prevent
such features can be circumvented by using a combination of UPLOAD/CREATE
actions.

Mitigation:
Any of the following are enough to prevent this vulnerability:
* Disable the UPLOAD command in the ConfigSets API, if not used, by setting
the system property "configset.upload.enabled" to "false" [1]
* Use Authentication/Authorization and make sure unknown requests aren't
allowed [2]
* Upgrade to Solr 8.6.3 or greater.
* If upgrading is not an option, consider applying the patch in SOLR-14663
([3])
* No Solr API, including the Admin UI, is designed to be exposed to
non-trusted parties. Tune your firewall so that only trusted computers and
people are allowed access
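For example, the first mitigation is a one-line change to the Solr startup
configuration. The file location and variable below are the typical defaults
for a service install, not taken from the advisory; adjust for your setup:

```shell
# In solr.in.sh (or wherever SOLR_OPTS is assembled):
# disable the ConfigSets API UPLOAD command entirely
SOLR_OPTS="$SOLR_OPTS -Dconfigset.upload.enabled=false"
```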

Credit:
Tomás Fernández Löbbe, András Salamon

References:
[1] https://lucene.apache.org/solr/guide/8_6/configsets-api.html
[2]
https://lucene.apache.org/solr/guide/8_6/authentication-and-authorization-plugins.html
[3] https://issues.apache.org/jira/browse/SOLR-14663
[4] https://issues.apache.org/jira/browse/SOLR-14925
[5] https://wiki.apache.org/solr/SolrSecurity


[SECURITY] CVE-2019-12401: XML Bomb in Apache Solr versions prior to 5.0

2019-09-09 Thread Tomas Fernandez Lobbe
Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected:
1.3.0 to 1.4.1
3.1.0 to 3.6.2
4.0.0 to 4.10.4

Description: Solr versions prior to 5.0.0 are vulnerable to an XML resource
consumption attack (a.k.a. a "billion laughs" bomb) via its update handler.
By leveraging XML DOCTYPE and ENTITY declarations, an attacker can craft a
payload that expands enormously when the server parses the XML, causing
OutOfMemoryErrors.
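To see why a tiny request can exhaust the heap, consider the amplification
arithmetic (the numbers below describe the shape of the classic "billion
laughs" payload, not a literal exploit): N nested entity levels, each
referencing the previous entity K times, expand to K^N copies of the
innermost string.

```shell
# Expanded size of an entity bomb: refs_per_level ^ levels * payload bytes.
# The classic payload uses 10 levels of 10 references over the 3-byte "lol",
# so a request of roughly a kilobyte expands to ~30 GB in memory.
echo $(( 10**10 * 3 ))
```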

Mitigation:
* Upgrade to Apache Solr 5.0 or later.
* Ensure your network settings are configured so that only trusted traffic
is allowed to post documents to the running Solr instances.

Credit: Matei "Mal" Badanoiu

References:
[1] https://issues.apache.org/jira/browse/SOLR-13750
[2] https://wiki.apache.org/solr/SolrSecurity


CVE-2019-0192 Deserialization of untrusted data via jmx.serviceUrl in Apache Solr

2019-03-06 Thread Tomas Fernandez Lobbe
Severity: High

Vendor: The Apache Software Foundation

Versions Affected:
5.0.0 to 5.5.5
6.0.0 to 6.6.5

Description:
The Config API allows configuring Solr's JMX server via an HTTP POST request.
By pointing it to a malicious RMI server, an attacker could take advantage
of Solr's unsafe deserialization to trigger remote code execution on the
Solr node.

Mitigation:
Any of the following are enough to prevent this vulnerability:
* Upgrade to Apache Solr 7.0 or later.
* Disable the Config API, if not in use, by running Solr with the system
property “disable.configEdit=true”
* If upgrading or disabling the Config API are not viable options, apply
patch in [1] and re-compile Solr.
* Ensure your network settings are configured so that only trusted traffic
is allowed to ingress/egress your hosts running Solr.
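As a sketch, the second mitigation looks like this at startup (exact flag
placement depends on how you launch Solr):

```shell
# Start Solr with the Config API's edit operations disabled
bin/solr start -Ddisable.configEdit=true
```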

Credit:
Michael Stepankin

References:
[1] https://issues.apache.org/jira/browse/SOLR-13301
[2] https://wiki.apache.org/solr/SolrSecurity


[SECURITY] CVE-2017-3164 SSRF issue in Apache Solr

2019-02-12 Thread Tomas Fernandez Lobbe
CVE-2017-3164 SSRF issue in Apache Solr

Severity: High

Vendor: The Apache Software Foundation

Versions Affected:
Apache Solr versions from 1.3 to 7.6.0

Description:
The "shards" parameter has no corresponding whitelist mechanism, so an
attacker can use it to make Solr issue requests to any URL (server-side
request forgery).

Mitigation:
Upgrade to Apache Solr 7.7.0 or later.
Ensure your network settings are configured so that only trusted traffic is
allowed to ingress/egress your hosts running Solr.
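For context, the 7.7.0 fix works by introducing a whitelist of allowed shard
endpoints. A sketch of the corresponding solr.xml setting follows; the
hostnames are placeholders, and the exact element should be verified against
your version's documentation:

```xml
<solr>
  <solrcloud>
    <!-- Only these endpoints may be targeted via the shards parameter -->
    <str name="shardsWhitelist">host1:8983/solr,host2:8983/solr</str>
  </solrcloud>
</solr>
```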

Credit:
dk from Chaitin Tech

References:
https://issues.apache.org/jira/browse/SOLR-12770
https://wiki.apache.org/solr/SolrSecurity


Re: Exception writing document xxxxxx to the index; possible analysis error.

2018-07-11 Thread Tomas Fernandez Lobbe
Hi Daphne, 
the “possible analysis error” is a misleading error message (to be addressed in 
SOLR-12477). The important piece is the 
“java.lang.ArrayIndexOutOfBoundsException”; it looks like your index may be 
corrupted in some way.

Tomás

> On Jul 11, 2018, at 3:01 PM, Liu, Daphne  wrote:
> 
> Hello Solr Expert,
>   We are using Solr 6.3.0 and lately we are unable to write documents into 
> our index. Please see below error messages. Can anyone help us?
>   Thank you.
> 
> 
> ===
> org.apache.solr.common.SolrException: Exception writing document id 
> 3b8514819e204cc7a110aa5752e29b8e to the index; possible analysis error.
>at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:178)
>at 
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:957)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1112)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:738)
>at 
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
>at 
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
>at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
>at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
>at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:275)
>at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
>at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:240)
>at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:158)
>at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
>at 
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
>at 
> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
>at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
>at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)
>at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
>at 
> org.apache.solr.servlet.SolrDispatchFi

Re: User queries end up in filterCache if facetting is enabled

2018-05-09 Thread Tomas Fernandez Lobbe
I'd never noticed this before, but I believe it happens because, once you say 
`facet=true`, Solr will need the full docset (the set of all matching docs, not 
just the top matches), and it computes that docset using the filterCache.
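If those docset entries ever become a memory concern, the filterCache is
tunable in solrconfig.xml; this is the stock definition from the 7.x sample
configs (sizes are the defaults, adjust to taste):

```xml
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>
```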

> On May 3, 2018, at 7:10 AM, Markus Jelsma  wrote:
> 
> By the way, the queries end up in the filterCache regardless of the value set 
> in useFilterForSortedQuery.
> 
> Thanks,
> Markus
> 
> -Original message-
>> From:Markus Jelsma 
>> Sent: Thursday 3rd May 2018 12:05
>> To: solr-user@lucene.apache.org; solr-user 
>> Subject: RE: User queries end up in filterCache if facetting is enabled
>> 
>> Thanks Mikhail,
>> 
>> But i thought about that setting too, but i do sort by score, as does Solr 
>> /select handler by default. The enum method accounts for all the values for 
>> a facet field, but not the user queries i see ending up in the cache.
>> 
>> Any other suggestions to shed light on this oddity?
>> 
>> Thanks!
>> Markus
>> 
>> 
>> 
>> -Original message-
>>> From:Mikhail Khludnev 
>>> Sent: Thursday 3rd May 2018 9:43
>>> To: solr-user 
>>> Subject: Re: User queries end up in filterCache if facetting is enabled
>>> 
>>> I mean
>>> https://lucene.apache.org/solr/guide/6_6/query-settings-in-solrconfig.html#QuerySettingsinSolrConfig-useFilterForSortedQuery
>>> 
>>> 
>>> On Thu, May 3, 2018 at 10:42 AM, Mikhail Khludnev  wrote:
>>> 
 Enum facets, facet refinements and https://lucene.apache.org/solr/guide/6_6/query-settings-in-solrconfig.html come to my mind.
 
 On Wed, May 2, 2018 at 11:58 PM, Markus Jelsma  wrote:
 
> Hello,
> 
> Anyone here to reproduce this oddity? It shows up in all our collections
> once we enable the stats page to show filterCache entries.
> 
> Is this normal? Am i completely missing something?
> 
> Thanks,
> Markus
> 
> 
> 
> -Original message-
>> From:Markus Jelsma 
>> Sent: Tuesday 1st May 2018 17:32
>> To: Solr-user 
>> Subject: User queries end up in filterCache if facetting is enabled
>> 
>> Hello,
>> 
>> We noticed the number of entries in the filterCache to be higher than
> we expected. Using showItems="1024", something unexpected was listed as
> entries of the filterCache: the complete Query.toString() of our user
> queries. Massive entries, and a lot of them.
>> 
>> We also spotted all entries of fields we facet on, even though we don't
> use them as filters, but that is caused by facet.method=enum, and should be
> expected, right?
>> 
>> Now, the user query entries are not expected. In the simplest set up,
> searching for something and only enabling the facet engine with facet=true
> causes it to appear in the cache as an entry. The following queries:
>> 
>> http://localhost:8983/solr/search/select?q=content_nl:nog&facet=true
>> http://localhost:8983/solr/search/select?q=*:*&facet=true
>> 
>> become listed as:
>> 
>> CACHE.searcher.filterCache.item_*:*:
>    org.apache.solr.search.BitDocSet@70051ee0
>> 
>> CACHE.searcher.filterCache.item_content_nl:nog:
>    org.apache.solr.search.BitDocSet@13150cf6
>> 
>> This is on 7.3, but 7.2.1 does this as well.
>> 
>> So, should i expect this? Can i disable this? Bug?
>> 
>> 
>> Thanks,
>> Markus
>> 
>> 
>> 
>> 
> 
 
 
 
 --
 Sincerely yours
 Mikhail Khludnev
 
>>> 
>>> 
>>> 
>>> -- 
>>> Sincerely yours
>>> Mikhail Khludnev
>>> 
>> 



Re: Solr 7.2.1 DELETEREPLICA automatically NRT replica appears

2018-03-07 Thread Tomas Fernandez Lobbe
This shouldn’t be happening. Did you see anything related in the logs? Does the 
new NRT replica ever become active? Is there a new core created, or do you just 
see the replica in the clusterstate?

Tomas 

Sent from my iPhone

> On Mar 7, 2018, at 8:18 PM, Greg Roodt  wrote:
> 
> Hi
> 
> I am running a cluster of TLOG and PULL replicas. When I call the
> DELETEREPLICA api to remove a replica, the replica is removed, however, a
> new NRT replica pops up in a down state in the cluster.
> 
> Any ideas why?
> 
> Greg


Re: solr cloud unique key query request is sent to all shards!

2018-02-18 Thread Tomas Fernandez Lobbe
In real-time get, the parameter name is “id”, regardless of the name of the 
unique key. 

The request should be in your case: 
http://:8080/api/collections/col1/get?id=69749398

See: https://lucene.apache.org/solr/guide/7_2/realtime-get.html
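Illustrative requests follow; host, port and collection names are from the
thread, and these are a sketch of the URL shape rather than something
runnable as-is. The value of your uniqueKey field, whatever that field is
named, always goes in the "id" (or "ids") parameter:

```shell
# Single document
curl "http://localhost:8080/solr/col1/get?id=69749398"

# Multiple documents in one call
curl "http://localhost:8080/solr/col1/get?ids=69749398,69749399"
```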

Sent from my iPhone

> On Feb 18, 2018, at 9:28 PM, Ganesh Sethuraman  
> wrote:
> 
> I tried this real time get on my collection using the both V1 and V2 URL
> for real time get, but did not work!!!
> 
> http://:8080/api/collections/col1/get?myid:69749398
> 
> it returned...
> 
> {
>  "doc":null}
> 
> same issue with V1 URL as well, http://
> :8080/solr/col1/get?myid:69749398
> 
> however if I do q=myid:69749398 with the "select" request handler it seems to
> work fine. I checked my schema again and it is configured correctly.  Like below:
> 
> <uniqueKey>myid</uniqueKey>
> 
> Also I see that the implicit request handler is configured correctly. Any
> thoughts on what I might be missing?
> 
> 
> 
> On Sun, Feb 18, 2018 at 11:18 PM, Tomas Fernandez Lobbe 
> wrote:
> 
>> I think real-time get should be directed to the correct shard. Try:
>> [COLLECTION]/get?id=[YOUR_ID]
>> 
>> Sent from my iPhone
>> 
>>> On Feb 18, 2018, at 3:17 PM, Ganesh Sethuraman 
>> wrote:
>>> 
>>> Hi
>>> 
>>> I am using Solr 7.2.1. I have 8 shards in two nodes (two different m/c)
>>> using Solr Cloud. The data was indexed with a unique key (default
>> composite
>>> id) using the CSV update handler (batch indexing). Note that I do NOT
>> have
>>>  while indexing.   Then when I try to  query the
>>> collection col1 based on my primary key (as below), I see that in the
>>> 'debug' response that the query was sent to all the shards and when it
>>> finds the document in one the shards it sends a GET FIELD to that shard
>> to
>>> get the data.  The problem is potentially high response time, and more
>>> importantly scalability issue as unnecessarily all shards are being
>> queried
>>> to get one document (by unique key).
>>> 
>>> http://:8080/solr/col1/select?debug=true&q=id:69749278
>>> 
>>> Is there a way for the query to reach the right shard based on the hash of the
>>> unique key?
>>> 
>>> Regards
>>> Ganesh
>> 


Re: solr cloud unique key query request is sent to all shards!

2018-02-18 Thread Tomas Fernandez Lobbe
I think real-time get should be directed to the correct shard. Try:  
[COLLECTION]/get?id=[YOUR_ID]

Sent from my iPhone

> On Feb 18, 2018, at 3:17 PM, Ganesh Sethuraman  
> wrote:
> 
> Hi
> 
> I am using Solr 7.2.1. I have 8 shards in two nodes (two different m/c)
> using Solr Cloud. The data was indexed with a unique key (default composite
> id) using the CSV update handler (batch indexing). Note that I do NOT have
>  while indexing.   Then when I try to  query the
> collection col1 based on my primary key (as below), I see that in the
> 'debug' response that the query was sent to all the shards and when it
> finds the document in one the shards it sends a GET FIELD to that shard to
> get the data.  The problem is potentially high response time, and more
> importantly scalability issue as unnecessarily all shards are being queried
> to get one document (by unique key).
> 
> http://:8080/solr/col1/select?debug=true&q=id:69749278
> 
> Is there a way for the query to reach the right shard based on the hash of the
> unique key?
> 
> Regards
> Ganesh


Re: Request routing / load-balancing TLOG & PULL replica types

2018-02-12 Thread Tomas Fernandez Lobbe


> On Feb 12, 2018, at 12:06 PM, Greg Roodt  wrote:
> 
> Thanks Ere. I've taken a look at the discussion here:
> http://lucene.472066.n3.nabble.com/Limit-search-queries-only-to-pull-replicas-td4367323.html
> This is how I was imagining TLOG & PULL replicas would work, so if this
> functionality does get developed, it would be useful to me.
> 
> I still have 2 questions at the moment:
> 1. I am running the single shard scenario. I'm thinking of using a
> dedicated HTTP load-balancer in front of the PULL replicas only with
> read-only queries directed directly at the load-balancer. In this
> situation, the healthy PULL replicas *should* handle the queries on the
> node itself without a proxy hop (assuming state=active). New PULL replicas
> added to the load-balancer will internally proxy queries to the other PULL
> or TLOG replicas while in state=recovering until the switch to
> state=active. Is my understanding correct?

Yes

> 
> 2. Is it all worth it? Is there any advantage to running a cluster of 3
> TLOGs + 10 PULL replicas vs running 13 TLOG replicas?
> 

I don’t have a definitive answer; this will depend on your specific use case. 
As Erick said, there is very little work that non-leader TLOG replicas do for 
each update, and having all TLOG replicas means that with a single active 
replica you could in theory still handle updates. It’s sometimes nice to separate 
query traffic from update traffic, but this can still be done if you have all 
TLOG replicas and you just make sure you don’t query the leader…
One nice characteristic that PULL replicas have is that they can’t go into 
Leader Initiated Recovery (LIR) state, even if there is some sort of network 
partition, they’ll remain in active state even if they can’t talk with the 
leader as long as they can reach ZooKeeper (note that this means they may be 
responding with outdated data for an undetermined amount of time, until 
replicas can replicate from the leader again). Also, since updates are not sent 
to all the replicas (only the TLOG replicas), updates should be faster with 3 
TLOG vs 13 TLOG replicas.
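For reference, a topology like the one discussed (1 shard, 3 TLOG + 10 PULL)
can be requested up front through the Collections API; the collection name
and counts below are placeholders:

```shell
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=logs&numShards=1&tlogReplicas=3&pullReplicas=10"
```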


Tomás

> 
> 
> 
> On 12 February 2018 at 19:25, Ere Maijala  wrote:
> 
>> Your question about directing queries to PULL replicas only has been
>> discussed on the list. Look for topic "Limit search queries only to pull
>> replicas". What I'd like to see is something similar to the
>> preferLocalShards parameter. It could be something like
>> "preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that
>> SOLR-10880 could be used as a base for such funtionality, and I'm
>> considering taking a stab at implementing it.
>> 
>> --Ere
>> 
>> 
>> Greg Roodt wrote on 12.2.2018 at 6.55:
>> 
>>> Thank you both for your very detailed answers.
>>> 
>>> This is great to know. I knew that SolrJ had the cluster aware knowledge
>>> (via zookeeper), but I was wondering what something like curl would do.
>>> Great to know that internally the cluster will proxy queries to the
>>> appropriate place regardless.
>>> 
>>> I am running the single shard scenario. I'm thinking of using a dedicated
>>> HTTP load-balancer in front of the PULL replicas only with read-only
>>> queries directed directly at the load-balancer. In this situation, the
>>> healthy PULL replicas *should* handle the queries on the node itself
>>> without a proxy hop (assuming state=active). New PULL replicas added to
>>> the
>>> load-balancer will internally proxy queries to the other PULL or TLOG
>>> replicas while in state=recovering until the switch to state=active.
>>> 
>>> Is my understanding correct?
>>> 
>>> Is this sensible to do, or is it not worth it due to the smart proxying
>>> that SolrCloud can do anyway?
>>> 
>>> If the TLOG and PULL replicas are so similar, is there any real advantage
>>> to having a mixed cluster? I assume a bit less work is required across the
>>> cluster to propagate writes if you only have 3 TLOG nodes vs 10+ PULL
>>> nodes? Or would it be better to just have 13 TLOG nodes?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 12 February 2018 at 15:24, Tomas Fernandez Lobbe 
>>> wrote:
>>> 
>>> On the last question:
>>>> For Writes: Yes. Writes are going to be sent to the shard leader, and
>>>> since PULL replicas can’t  be leaders, it’s going to be a TLOG replica.
>>>> If
>>>> you are using CloudSolrClient, then this routing will be done directly
>>>> from
>>>> the client (since it will send the update to the

Re: Request routing / load-balancing TLOG & PULL replica types

2018-02-11 Thread Tomas Fernandez Lobbe
On the last question:
For Writes: Yes. Writes are going to be sent to the shard leader, and since 
PULL replicas can’t  be leaders, it’s going to be a TLOG replica. If you are 
using CloudSolrClient, then this routing will be done directly from the client 
(since it will send the update to the leader), and if you are using some other 
HTTP client, then yes, the PULL replica will forward the update, the same way 
any non-leader node would.

For reads: this won’t happen today; any replica can respond to queries. I do 
believe there is value in this kind of routing logic: sometimes you simply don’t 
want the leader to handle any queries, especially when queries can be expensive. 
You could do this today if you want, by putting a load balancer in front and 
directing your queries only to the nodes you know are PULL replicas, but keep in 
mind that this only works in the single-shard scenario, and only if you hit an 
active replica (otherwise, as you said, the query will be routed to any other 
node of the shard, regardless of the type). If you have multiple shards, you 
need to use the “shards” parameter to tell Solr exactly which nodes you want to 
hit for each shard (the “shards” approach can also be used in the single-shard 
case, although I believe you would be adding an extra hop)
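A sketch of that “shards” approach, pinning a query to specific PULL replica
cores; the hosts and core names are invented, and the pipe character lists
load-balanced alternatives for a single shard:

```shell
curl "http://anyhost:8983/solr/coll/select?q=*:*&shards=host5:8983/solr/coll_shard1_replica_p10|host6:8983/solr/coll_shard1_replica_p12"
```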

Tomás 
Sent from my iPhone

> On Feb 11, 2018, at 6:35 PM, Greg Roodt  wrote:
> 
> Hi
> 
> I have a question around how queries are routed and load-balanced in a
> cluster of mixed TLOG and PULL replicas.
> 
> I thought that I might have to put a load-balancer in front of the PULL
> replicas and direct queries at them manually as nodes are added and removed
> as PULL replicas. However, it seems that SolrCloud handles this
> automatically?
> 
> If I add a new PULL replica node, it goes into state="recovering" while it
> pulls the core. As expected. What happens if queries are directed at this
> node while in this state? From what I am observing, the query gets directed
> to another node?
> 
> If SolrCloud is handling the routing of requests to active nodes, will it
> automatically favour PULL replicas for read queries and TLOG replicas for
> writes?
> 
> Thanks
> Greg


Re: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread Tomas Fernandez Lobbe
Hi Markus, 
If the same code that runs OK in 7.1 breaks in 7.2.1, it is clear to me that there 
is some bug in Solr introduced between those releases (maybe an increase in 
memory utilization? Or maybe some decrease in query throughput making threads 
pile up?). I’d hate to have this issue get lost in the users list; could you 
create a Jira? Maybe next time you hit this issue you can post thread/heap 
dumps, which would be useful.

Tomás

> On Feb 2, 2018, at 9:38 AM, Walter Underwood  wrote:
> 
> Zookeeper 3.4.6 is not good? That was the version recommended by Solr docs 
> when I installed 6.2.0.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Feb 2, 2018, at 9:30 AM, Markus Jelsma  wrote:
>> 
>> Hello S.G.
>> 
>> We have relied on Trie* fields ever since they became available. I don't 
>> think reverting to the old fieldTypes will do us any good; we have a very 
>> recent problem.
>> 
>> Regarding our heap, the cluster ran fine for years with just 1.5 GB, we only 
>> recently increased it because our data keeps on growing. Heap rarely goes 
>> higher than 50 %, except when this specific problem occurs. The nodes have 
>> no problem processing a few hundred QPS continuously and can go on for days, 
>> sometimes even a few weeks.
>> 
>> I will keep my eye open for other clues when the problem strikes again!
>> 
>> Thanks,
>> Markus
>> 
>> -Original message-
>>> From:S G 
>>> Sent: Friday 2nd February 2018 18:20
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: 7.2.1 cluster dies within minutes after restart
>>> 
>>> Yeah, definitely check the zookeeper version.
>>> 3.4.6 is not a good one I know and you can say the same for all the
>>> versions below it too.
>>> We have used 3.4.9 with no issues.
>>> While Solr 7.x uses 3.4.10
>>> 
>>> Another dimension could be the use or (dis-use) of p-fields like pint,
>>> plong etc.
>>> If you are using them, try to revert back to tint, tlong etc
>>> And if you are not using them, try to use them (Although doing this means a
>>> change from your older config and less likely to help).
>>> 
>>> Lastly, did I read 2 GB for JVM heap?
>>> That seems far too little to me for any version of Solr
>>> We run with 10-16 gb of heap with G1GC collector and new-gen capped at 3-4gb
>>> 
>>> 
>>> On Fri, Feb 2, 2018 at 4:27 AM, Markus Jelsma 
>>> wrote:
>>> 
 Hello Ere,
 
 It appears that my initial e-mail [1] got lost in the thread. We don't
 have GC issues, the cluster that dies occasionally runs, in general, smooth
 and quick with just 2 GB allocated.
 
 Thanks,
 Markus
 
 [1]: http://lucene.472066.n3.nabble.com/7-2-1-cluster-dies-
 within-minutes-after-restart-td4372615.html
 
 -Original message-
> From:Ere Maijala 
> Sent: Friday 2nd February 2018 8:49
> To: solr-user@lucene.apache.org
> Subject: Re: 7.2.1 cluster dies within minutes after restart
> 
> Markus,
> 
> I may be stating the obvious here, but I didn't notice garbage
> collection mentioned in any of the previous messages, so here goes. In
> our experience almost all of the Zookeeper timeouts etc. have been
> caused by too long garbage collection pauses. I've summed up my
> observations here:
>  
> 
> So, in my experience it's relatively easy to cause heavy memory usage
> with SolrCloud with seemingly innocent queries, and GC can become a
> problem really quickly even if everything seems to be running smoothly
> otherwise.
> 
> Regards,
> Ere
> 
> Markus Jelsma wrote on 31.1.2018 at 23.56:
>> Hello S.G.
>> 
>> We do not complain about speed improvements at all, it is clear 7.x is
 faster than its predecessor. The problem is stability and not recovering
 from weird circumstances. In general, it is our high load cluster
 containing user interaction logs that suffers the most. Our main text
 search cluster - receiving much fewer queries - seems mostly unaffected,
 except last Sunday. After very short but high burst of queries it entered
 the same catatonic state the logs cluster usually dies from.
>> 
>> The query burst immediately caused ZK timeouts and high heap
 consumption (not sure which came first of the latter two). The query burst
 lasted for 30 minutes, the excessive heap consumption continued for more
 than 8 hours, before Solr finally realized it could relax. Most remarkable
 was that Solr recovered on its own, ZK timeouts stopped, heap went back to
 normal.
>> 
>> There seems to be a causal relationship between high load and this state.
>> 
>> We really want to get this fixed for ourselves and everyone else that
 may encounter this problem, but i don't know how, so i need much more
 feedback and hints from those who have deep understandi

Re: Master Slave Replication Issue

2018-02-01 Thread Tomas Fernandez Lobbe
This seems pretty serious. Please create a Jira issue

Sent from my iPhone

> On Feb 1, 2018, at 12:15 AM, dennis nalog  
> wrote:
> 
> Hi,
> We are using Solr 7.1 and our Solr setup is master-slave replication.
> We encounter an issue where, when we disable replication on the master via the UI 
> or URL (server/solr/solr_core/replication?command=disablereplication), the 
> data in our slave servers suddenly becomes 0.
> Just wanna know if this is a known issue or this is the expected behavior. 
> Thanks in advance.
> Best regards,Dennis


Re: Mixing simple and nested docs in same update?

2018-01-30 Thread Tomas Fernandez Lobbe
I believe the problem is that:
* BlockJoin queries do not know about your “types”; in the BlockJoin query 
world, everything that’s not a parent (i.e. doesn’t match the parentFilter) is a 
child.
* All docs indexed before a parent are considered children of that parent.
That’s why in your first case it considers “friend” (not a parent, therefore a 
child) to be a child of the first parent it can find in the segment (“mother”). 
In the second case, the “friend” doc has no parent: no parent document matches 
the filter after it, so it’s not considered a match. 
Maybe if you try your query with parentFilter=-type:child, this particular 
example works (I haven’t tried it)?

Note that when you send docs with children to Solr, Solr will make sure the 
children are indexed before the parent. Also note that there are some other open 
bugs related to child documents, in particular with mixing child docs and 
non-child docs; depending on which features you need, this may be a problem.

Tomás

> On Jan 30, 2018, at 5:48 AM, Jan Høydahl  wrote:
> 
> Pasting the GIST link :-) 
> https://gist.github.com/45640fe3bad696d53ef8a0930a35d163 
> 
> Anyone knows if this is expected behavior?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> 15. jan. 2018 kl. 14:08 skrev Jan Høydahl :
>> 
>> Radio silence…
>> 
>> Here is a GIST for easy reproduction. Is this by design?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> 11. jan. 2018 kl. 00:42 skrev Jan Høydahl :
>>> 
>>> Hi,
>>> 
>>> We index several large nested documents. We found that querying the data 
>>> behaves differently depending on how the documents are indexed.
>>> 
>>> To reproduce:
>>> 
>>> solr start
>>> solr create -c nested
>>> # Index one plain document, "friend", and a nested one, "mother" and
>>> # "daughter", in the same request:
>>> curl localhost:8983/solr/nested/update -d '
>>> <add>
>>>   <doc>
>>>     <field name="id">friend</field>
>>>     <field name="type">other</field>
>>>   </doc>
>>>   <doc>
>>>     <field name="id">mother</field>
>>>     <field name="type">parent</field>
>>>     <doc>
>>>       <field name="id">daughter</field>
>>>       <field name="type">child</field>
>>>     </doc>
>>>   </doc>
>>> </add>'
>>> 
>>> # Query for mother’s children using either child transformer or child query 
>>> parser
>>> curl 
>>> "localhost:8983/solr/a/query?q=id:mother&fl=%2A%2C%5Bchild%20parentFilter%3Dtype%3Aparent%5D”
>>> {
>>> "responseHeader":{
>>>  "zkConnected":true,
>>>  "status":0,
>>>  "QTime":4,
>>>  "params":{
>>>"q":"id:mother",
>>>"fl":"*,[child parentFilter=type:parent]"}},
>>> "response":{"numFound":1,"start":0,"docs":[
>>>{
>>>  "id":"mother",
>>>  "type":["parent"],
>>>  "_version_":1589249812802306048,
>>>  "type_str":["parent"],
>>>  "_childDocuments_":[
>>>  {
>>>"id":"friend",
>>>"type":["other"],
>>>"_version_":1589249812729954304,
>>>"type_str":["other"]},
>>>  {
>>>"id":"daughter",
>>>"type":["child"],
>>>"_version_":1589249812802306048,
>>>"type_str":["child"]}]}]
>>> }}
>>> 
>>> As you can see, the “friend” got included as a child of “mother”.
>>> If you index the exact same request, putting “friend” after “mother” in the 
>>> xml,
>>> the query works as expected.
>>> 
>>> Inspecting the index, everything looks correct, and only “daughter” and 
>>> “mother” have _root_=mother.
>>> Is there a rule that you should start a new update request for each type of 
>>> parent/child relationship
>>> that you need to index, and not mix them in the same request?
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
>> 
> 



Re: Limit search queries only to pull replicas

2018-01-08 Thread Tomas Fernandez Lobbe
This feature is not currently supported. I was thinking of implementing it by 
extending the work done in SOLR-10880, but I haven't had time to work on it 
yet. There is a patch for SOLR-10880 that doesn't implement support for 
replica types, but it could be used as a base. 
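
Until something like that exists server-side, a client can do the routing itself. Below is a minimal sketch, under assumptions: the nested dict only imitates, in simplified form, the cluster layout a client could fetch (e.g. via the Collections API CLUSTERSTATUS action); the exact response structure and names here are illustrative, not Solr's actual JSON. It builds a shards parameter that targets only active PULL replicas:

```python
# Hypothetical, simplified view of a collection's cluster state.
cluster = {
    "shards": {
        "shard1": {"replicas": {
            "core_node1": {"type": "NRT",  "state": "active",
                           "base_url": "http://host1:8983/solr",
                           "core": "col_shard1_replica_n1"},
            "core_node2": {"type": "PULL", "state": "active",
                           "base_url": "http://host2:8983/solr",
                           "core": "col_shard1_replica_p2"},
        }},
        "shard2": {"replicas": {
            "core_node3": {"type": "NRT",  "state": "active",
                           "base_url": "http://host1:8983/solr",
                           "core": "col_shard2_replica_n3"},
            "core_node4": {"type": "PULL", "state": "active",
                           "base_url": "http://host2:8983/solr",
                           "core": "col_shard2_replica_p4"},
        }},
    }
}

def pull_only_shards(collection_state):
    """Build a shards parameter value: one group of PULL replica URLs per
    shard, '|' separating alternatives within a shard, ',' between shards."""
    groups = []
    for name in sorted(collection_state["shards"]):
        shard = collection_state["shards"][name]
        urls = [r["base_url"] + "/" + r["core"]
                for r in shard["replicas"].values()
                if r["type"] == "PULL" and r["state"] == "active"]
        if urls:
            groups.append("|".join(urls))
    return ",".join(groups)

print(pull_only_shards(cluster))  # one PULL replica URL per shard, comma-separated
```

Passing the resulting string as the shards request parameter restricts the query to those replicas, at the cost of refreshing it whenever the cluster layout changes.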

Tomás

> On Jan 8, 2018, at 12:04 AM, Ere Maijala  wrote:
> 
> Server load alone doesn't always indicate the server's ability to serve 
> queries. Memory and cache state are important too, and they're not as easy to 
> monitor. Additionally, server load at any single point in time or a short 
> term average is not indicative of the server's ability to handle search 
> requests if indexing happens in short but intense bursts.
> 
> It can also complicate things if there is more than one Solr instance 
> running on a single server.
> 
> I'm definitely not against intelligent routing. In many cases it makes 
> perfect sense, and I'd still like to use it, just limited to the pull 
> replicas.
> 
> --Ere
> 
> Erick Erickson kirjoitti 5.1.2018 klo 19.03:
>> Actually, I think a much better option is to route queries based on server load.
>> The theory of preferring pull replicas to leaders would be that the leader
>> will be doing the indexing work and the pull replicas would be doing less
>> work therefore serving queries faster. But that's a fragile assumption.
>> Let's say indexing stops totally. Now your leader is sitting there idle
>> when it could be serving queries.
>> The autoscaling work will allow for more intelligent routing: you can
>> monitor the CPU load on your servers and, if the leader has some spare
>> cycles, use them, vs. crudely routing all queries to pull replicas (or tlog
>> replicas for that matter). NOTE: I don't know whether this is being
>> actively worked on or not, but it seems a logical extension of the increased
>> monitoring capabilities being put in place for autoscaling. I'd rather
>> see effort put in there than support routing based solely on a node's type.
>> Best,
>> Erick
>> On Fri, Jan 5, 2018 at 7:51 AM, Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>>> It is interesting that ES had a similar feature to prefer primary/replica,
>>> but it is deprecating that and will remove it - could not find an explanation why.
>>> 
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
 On 5 Jan 2018, at 15:22, Ere Maijala  wrote:
 
 Hi,
 
 It would be really nice to have a server-side option, though. Not
>>> everyone uses Solrj, and a typical fairly dumb client just queries the
>>> server without any understanding about shards etc. Solr could be clever
>>> enough to not forward the query to NRT shards when configured to prefer
>>> PULL shards and they're available. Maybe it could be something similar to
>>> the preferLocalShards parameter, like "preferShardTypes=TLOG,PULL".
 
 --Ere
 
 Emir Arnautović kirjoitti 14.12.2017 klo 11.41:
> Hi Stanislav,
> I don’t think that there is a built-in feature to do this, but that
>>> sounds like a nice feature for Solrj - maybe you should check if it's available.
>>> You can implement it outside of Solrj - check the cluster state to see which
>>> shards are available and send queries only to pull replicas.
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> On 14 Dec 2017, at 09:58, Stanislav Sandalnikov <
>>> s.sandalni...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> We have a Solr 7.1 setup with SolrCloud where we have multiple shards
>>> on one server (for indexing) each shard has a pull replica on other servers.
>> 
>> What are the possible ways to limit search requests only to pull-type
>>> replicas?
>> At the moment the only solution I found is to append the shards parameter
>>> to each query, but if new shards are added later it requires changing
>>> solrconfig. Is this the only way to do it?
>> 
>> Thank you
>> 
>> Regards
>> Stanislav
>> 
 
 --
 Ere Maijala
 Kansalliskirjasto / The National Library of Finland
>>> 
>>> 
> 
> -- 
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland



Re: Solr cloud optimizer

2017-09-07 Thread Tomas Fernandez Lobbe
By default Solr uses the “TieredMergePolicy”[1], but it can be configured in 
solrconfig, see [2].  Merges can be triggered for different reasons, but most 
commonly by segment flushes (commits) or other merges finishing.
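
For illustration, a solrconfig.xml fragment along these lines selects and tunes the merge policy in 6.x (the values shown are illustrative, not recommendations; the defaults are usually fine):

```xml
<indexConfig>
  <!-- TieredMergePolicy is the default; shown here only to illustrate tuning -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>
```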

Here is a nice visual demo of segment merging (a bit old but still mostly 
applies AFAIK): [3]

[1] 
https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/index/TieredMergePolicy.html
[2] https://lucene.apache.org/solr/guide/6_6/indexconfig-in-solrconfig.html
[3] 
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Tomas

> On Sep 7, 2017, at 10:00 AM, calamita.agost...@libero.it wrote:
> 
> 
> Hi  all,
> I use SolrCloud with  some collections with 3  shards each. 
> Every day I insert and remove documents from the collections. I know that Solr 
> starts the optimizer in the background to optimize indexes. 
> Which policy does Solr apply to start the optimizer 
> automatically? Number of deleted documents? Number of segments? 
> Thanks.



Re: Request to be added to the ContributorsGroup

2017-08-23 Thread Tomas Fernandez Lobbe
I just added you to the wiki. 
Note that the official documentation is now in the "solr-ref-guide" directory 
of the code base, and you can create patches/PRs to it.

Tomás

> On Aug 23, 2017, at 10:58 AM, Kevin Grimes  wrote:
> 
> Hi there,
> 
> I would like to contribute to the Solr wiki. My username is KevinGrimes, and 
> my e-mail is kevingrim...@me.com .
> 
> Thanks,
> Kevin
> 



Re: Query not working with DatePointField

2017-06-15 Thread Tomas Fernandez Lobbe
The query field:* doesn't work with point fields (numeric or date); only 
exact and range queries are supported, so an equivalent query would be field:[* 
TO *]


Sent from my iPhone

> On Jun 15, 2017, at 5:24 PM, Saurabh Sethi  wrote:
> 
> Hi,
> 
> We have a fieldType specified for date. Earlier it was using TrieDateField
> and we changed it to DatePointField.
> 
>  sortMissingLast="true" precisionStep="6"/>
> 
> 
> 
> Here are the fields used in the query and one of them uses the dateType:
> 
>  stored="false" required="true" multiValued="false"/>
>  stored="false" docValues="false" />
>  stored="false" multiValued="true" />
> 
> The following query was returning correct results when the field type was
> Trie but not with Point:
> 
> field1:value1 AND ((*:* NOT field2:*) AND field3:value3)
> 
> Any idea why field2:* does not return results anymore?
> 
> Thanks,
> Saurabh


Re: Solr 6: how to get SortedSetDocValues from index by field name

2017-06-14 Thread Tomas Fernandez Lobbe
Hi,
To answer your first question, “How do I get SortedSetDocValues from index by 
field name?”: DocValues.getSortedSet(LeafReader reader, String field) (which is 
what you want to use to assert the existence and type of the DV) will give you 
the dv instance for a single leaf reader. In general, a leaf reader is for a 
specific segment, so depending on what you want to do you may need to iterate 
through all the leaves (segments) if you want all values in the index (kind of 
what you’ll see in the NumericFacets or IntervalFacets classes). 

SolrIndexSearcher.getSlowAtomicReader() will give you a view of all the 
segments as a single reader, that’s why in that case the code assumes there is 
only one reader that contains all the values. 

Whatever you do, make sure you test your code in cases with multiple segments 
(and with deletes), which is where bugs using this code are most likely to 
occur.

You won’t need the UninvertingReader if you plan to index docValues; that class 
is used to create a docValues-like view of a field that’s indexed=true & 
docValues=false.

Related note, the DocValues API changed from 6.x to 7 (master). See LUCENE-7407.

I hope that helps, 

Tomás

> On Jun 13, 2017, at 10:49 AM, SOLR4189  wrote:
> 
> How do I get SortedSetDocValues from index by field name?
> 
> I tried it and it works for me, but I didn't understand why to use
> leaves.get(0). What does it mean? (I saw such usage in
> TestUninvertedReader.java of Solr 6.5.1):
> 
> Map<String, UninvertingReader.Type> mapping = new HashMap<>();
> mapping.put(fieldName, UninvertingReader.Type.SORTED);
> 
> SolrIndexSearcher searcher = req.getSearcher();
> 
> DirectoryReader dReader = searcher.getIndexReader();
> LeafReader reader = null;
> 
> if (!dReader.leaves().isEmpty()) {
>  reader = dReader.leaves().get(0).reader;
>  return null;
> }
> 
> SortedSetDocValues sourceIndex = reader.getSortedSetDocValues(fieldName);
> 
> Maybe do I need to use SlowAtomicReader, like it:
> 
> UninvertingReader reader = new
> UninvertingReader(searcher.getSlowAtomicReader(), mapping);
> 
> What is right way to get SortedSetDocValues and why?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-6-how-to-get-SortedSetDocValues-from-index-by-field-name-tp4340388.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Got a 404 trying to update a solr. 6.5.1 server. /solr/update not found.

2017-06-05 Thread Tomas Fernandez Lobbe
I think you are missing the collection name in the path: updates should go to /solr/<collection>/update, not /solr/update.

Tomás

Sent from my iPhone

> On Jun 5, 2017, at 9:08 PM, Phil Scadden  wrote:
> 
> Simple piece of code. Had been working earlier (though against a 6.4.2 
> instance).
> 
>  ConcurrentUpdateSolrClient solr = new 
> ConcurrentUpdateSolrClient("http://myhost:8983/solr",10,2);
>   try {
>solr.deleteByQuery("*:*");
>solr.commit();
>   } catch (SolrServerException | IOException ex) {
>// logger handler stuff omitted.
>   }
> 
> Comes back with:
> 15:53:36,693 DEBUG wire:72 -  << "[\n]"
> 15:53:36,694 DEBUG wire:72 -  << " content="text/html;charset=utf-8"/>[\n]"
> 15:53:36,694 DEBUG wire:72 -  << "Error 404 Not Found[\n]"
> 15:53:36,695 DEBUG wire:72 -  << "[\n]"
> 15:53:36,695 DEBUG wire:72 -  << "HTTP ERROR 404[\n]"
> 15:53:36,696 DEBUG wire:72 -  << "Problem accessing /solr/update. 
> Reason:[\n]"
> 15:53:36,696 DEBUG wire:72 -  << "Not Found[\n]"
> 15:53:36,696 DEBUG wire:72 -  << "[\n]"
> 15:53:36,697 DEBUG wire:72 -  << "[\n]"
> 
> If I access http://myhost:8983/solr/update then I get that html too, but 
> http://myhost:8983/solr comes up with admin page as normal so Solr appears to 
> be running okay.
> Notice: This email and any attachments are confidential and may not be used, 
> published or redistributed without the prior written consent of the Institute 
> of Geological and Nuclear Sciences Limited (GNS Science). If received in 
> error please destroy and immediately notify GNS Science. Do not copy or 
> disclose the contents.


Re: Searching with AND + OR and spaces

2010-11-12 Thread Tomas Fernandez Lobbe
Hi Jon, for the first query:

title:"Call of Duty" OR subhead:"Call of Duty"

If you are sure that you have documents with that exact phrase, make sure you 
don't have a problem with stop words or with token positions. I recommend 
checking the analysis page in the Solr admin; pay special attention to the 
"enablePositionIncrements" attribute of the StopFilterFactory, which defaults to 
false 
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory).


for your second query:

title:Call of Duty OR subhead:Call of Duty AND type:4

make sure that you add parentheses like:

title:(Call of Duty) OR subhead:(Call of Duty) AND type:4

otherwise it will be translated to the following query (supposing you have OR as your 
default operator):

title:Call OR your_default_field:of OR  your_default_field:Duty OR subhead:Call 
OR  your_default_field:of OR  your_default_field:Duty AND type:4

Tomás





From: Jon Drukman 
To: solr-user@lucene.apache.org
Sent: Friday, November 12, 2010, 15:22:21
Subject: Searching with AND + OR and spaces

I want to search two fields for the phrase Call Of Duty.  I tried this:

(title:"Call of Duty" OR subhead:"Call of Duty")

No matches, despite the fact that there are many documents that should match.

So I left out the quotes, and it seems to work.  But now when I try doing things
like

title:Call of Duty OR subhead:Call of Duty AND type:4

I get a lot of things like "called it!" and "i'm taking calls" but call of duty
doesn't surface.

How can I get what I want?

-jsd-


  

Re: analyzer type

2010-11-12 Thread Tomas Fernandez Lobbe
For a field type, the analysis applied at index time (when you are adding 
documents to Solr) can be slightly different from the analysis applied at 
query time (when a user executes a query). For example, if you know you are 
going to be indexing HTML pages, you might need the 
HTMLStripCharFilterFactory to strip the HTML tags, but the user won't be 
querying with HTML tags, right? So in that case you might need the 
HTMLStripCharFilterFactory only at index time (on the "index" type analyzer).

If you don't specify the analyzer type, the same analysis chain (the same 
tokenizer, token filters and char filters) will by default be applied to both 
indexing and querying.
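
As a sketch (the field type and analyzer contents here are illustrative, not from the original message), such a schema.xml fragment could look like this, with the char filter only on the index side:

```xml
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Strip HTML tags before tokenizing; only needed when indexing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain, minus the char filter -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```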

I hope I made myself clear





From: gauravshetti 
To: solr-user@lucene.apache.org
Sent: Friday, November 12, 2010, 13:46:49
Subject: analyzer type


Can you please help me distinguish between analyzer types? I am not able to
find documentation for them.

I want to add solr.HTMLStripCharFilterFactory in the schema.xml file.

And I can see two analyzer types defined in my schema.xml.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/analyzer-type-tp1890002p1890002.html
Sent from the Solr - User mailing list archive at Nabble.com.



  

Re: Search with accent

2010-11-10 Thread Tomas Fernandez Lobbe
You have to modify the field type you are using in your schema.xml file. This is 
the "text" field type from the Solr 1.4.1 example with this filter added:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>


From: Claudio Devecchi 
To: solr-user@lucene.apache.org
Sent: Wednesday, November 10, 2010, 17:44:01
Subject: Re: Search with accent

Ok tks,

I'm new to Solr; my doubt is how I can enable this feature, or whether it
already works by default.

Is this something to configure in my schema.xml?

Tks!!


On Wed, Nov 10, 2010 at 6:40 PM, Tomas Fernandez Lobbe <
tomasflo...@yahoo.com.ar> wrote:

> That's what the ASCIIFoldingFilter does, it removes the accents, that's why
> you
> have to add it to the query analysis chain and to the index analysis chain,
> to
> search the same way you index.
>
>
>
> You can see how it works from the Analysis page on Solr Admin.
>
>
>
>
>
> 
> From: Savvas-Andreas Moysidis 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, November 10, 2010, 17:27:24
> Subject: Re: Search with accent
>
> have you tried using a TokenFilter which removes accents both at
> indexing and searching time? If you index terms without accents and
> search the same
> way you should be able to find all documents as you require.
>
>
>
> On 10 November 2010 20:25, Tomas Fernandez Lobbe
> wrote:
>
> > It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1, If you
> are
> > on
> > that version, you should use the ASCIIFoldingFilter instead.
> >
> > Like with any other filter, to use it, you have to add the filter factory
> > to the
> > analysis chain of the field type you are using:
> >
> > <filter class="solr.ASCIIFoldingFilterFactory"/>
> >
> > Make sure you add it to the query and index analysis chain, otherwise
> > you'll
> > have strange results.
> >
> > You'll have to perform a full reindex.
> >
> > Tomás
> >
> >
> >
> >
> > 
> > From: Claudio Devecchi 
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, November 10, 2010, 17:08:06
> > Subject: Re: Search with accent
> >
> > Tomas,
> >
> > Let me try to explain better.
> >
> > For example.
> >
> > - I have 10 documents, where 7 have the word pereque (without accent) and
> 3
> > have the word perequê (with accent)
> >
> > When I do a search pereque, solr is returning just 7, and when I do a
> > search
> > perequê solr is returning 3.
> >
> > But for me, these words are the same, and when I do some search for
> perequê
> > or pereque, it should show me 10 results.
> >
> >
> > About the ISOLatin you told, do you know how can I enable it?
> >
> > tks,
> > Claudio
> >
> > On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe <
> > tomasflo...@yahoo.com.ar> wrote:
> >
> > > I don't understand, when the user search for perequê you want the
> results
> > > for
> > > perequê and pereque?
> > >
> > > If thats the case, any field type with ISOLatin1AccentFilterFactory
> > should
> > > work.
> > > The accent should be removed at index time and at query time (Make sure
> > the
> > > filter is being applied on both cases).
> > >
> > > Tomás
> > >
> > >
> > >
> > >
> > >
> > > 
> > > From: Claudio Devecchi 
> > > To: Lista Solr 
> > > Sent: Wednesday, November 10, 2010, 15:16:24
> > > Subject: Search with accent
> > >
> > > Hi all,
> > >
> > > Somebody knows how can I config my solr to make searches with and
> without
> > > accents?
> > >
> > > for example:
> > >
> > > pereque and perequê
> > >
> > >
> > > When I do it I need the same result, but its not working.
> > >
> > > tks
> > > --
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > Claudio Devecchi
> > flickr.com/cdevecchi
> >
> >
> >
> >
>
>
>
>
>



-- 
Claudio Devecchi
flickr.com/cdevecchi



  

Re: Search with accent

2010-11-10 Thread Tomas Fernandez Lobbe
That's what the ASCIIFoldingFilter does: it removes the accents. That's why you 
have to add it to both the query analysis chain and the index analysis chain, so 
that you search the same way you index. 



You can see how it works from the Analysis page on Solr Admin.
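
A rough illustration of what the folding does, in plain Python: this is just Unicode decomposition plus dropping combining marks, not Solr's actual ASCIIFoldingFilter, which handles many more mappings:

```python
import unicodedata

def fold(text):
    # Decompose accented characters (NFD turns "ê" into "e" + a combining
    # circumflex), then drop the combining marks.
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(ch)
    )

print(fold("perequê"))  # pereque
```

With a filter like this on both chains, "perequê" and "pereque" index and query as the same term.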






From: Savvas-Andreas Moysidis 
To: solr-user@lucene.apache.org
Sent: Wednesday, November 10, 2010, 17:27:24
Subject: Re: Search with accent

have you tried using a TokenFilter which removes accents both at
indexing and searching time? If you index terms without accents and
search the same
way you should be able to find all documents as you require.



On 10 November 2010 20:25, Tomas Fernandez Lobbe
wrote:

> It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1, If you are
> on
> that version, you should use the ASCIIFoldingFilter instead.
>
> Like with any other filter, to use it, you have to add the filter factory
> to the
> analysis chain of the field type you are using:
>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
>
> Make sure you add it to the query and index analysis chain, otherwise
> you'll
> have strange results.
>
> You'll have to perform a full reindex.
>
> Tomás
>
>
>
>
> 
> From: Claudio Devecchi 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, November 10, 2010, 17:08:06
> Subject: Re: Search with accent
>
> Tomas,
>
> Let me try to explain better.
>
> For example.
>
> - I have 10 documents, where 7 have the word pereque (without accent) and 3
> have the word perequê (with accent)
>
> When I do a search pereque, solr is returning just 7, and when I do a
> search
> perequê solr is returning 3.
>
> But for me, these words are the same, and when I do some search for perequê
> or pereque, it should show me 10 results.
>
>
> About the ISOLatin you told, do you know how can I enable it?
>
> tks,
> Claudio
>
> On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe <
> tomasflo...@yahoo.com.ar> wrote:
>
> > I don't understand, when the user search for perequê you want the results
> > for
> > perequê and pereque?
> >
> > If thats the case, any field type with ISOLatin1AccentFilterFactory
> should
> > work.
> > The accent should be removed at index time and at query time (Make sure
> the
> > filter is being applied on both cases).
> >
> > Tomás
> >
> >
> >
> >
> >
> > 
> > From: Claudio Devecchi 
> > To: Lista Solr 
> > Sent: Wednesday, November 10, 2010, 15:16:24
> > Subject: Search with accent
> >
> > Hi all,
> >
> > Somebody knows how can I config my solr to make searches with and without
> > accents?
> >
> > for example:
> >
> > pereque and perequê
> >
> >
> > When I do it I need the same result, but its not working.
> >
> > tks
> > --
> >
> >
> >
> >
> >
>
>
>
> --
> Claudio Devecchi
> flickr.com/cdevecchi
>
>
>
>



  

Re: Search with accent

2010-11-10 Thread Tomas Fernandez Lobbe
It looks like ISOLatin1AccentFilter is deprecated in Solr 1.4.1. If you are on 
that version, you should use the ASCIIFoldingFilter instead.

Like with any other filter, to use it, you have to add the filter factory to 
the 
analysis chain of the field type you are using:

<filter class="solr.ASCIIFoldingFilterFactory"/>

Make sure you add it to the query and index analysis chain, otherwise you'll 
have strange results.

You'll have to perform a full reindex.

Tomás





From: Claudio Devecchi 
To: solr-user@lucene.apache.org
Sent: Wednesday, November 10, 2010, 17:08:06
Subject: Re: Search with accent

Tomas,

Let me try to explain better.

For example.

- I have 10 documents, where 7 have the word pereque (without accent) and 3
have the word perequê (with accent)

When I do a search pereque, solr is returning just 7, and when I do a search
perequê solr is returning 3.

But for me, these words are the same, and when I do some search for perequê
or pereque, it should show me 10 results.


About the ISOLatin you told, do you know how can I enable it?

tks,
Claudio

On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe <
tomasflo...@yahoo.com.ar> wrote:

> I don't understand, when the user search for perequê you want the results
> for
> perequê and pereque?
>
> If thats the case, any field type with ISOLatin1AccentFilterFactory should
> work.
> The accent should be removed at index time and at query time (Make sure the
> filter is being applied on both cases).
>
> Tomás
>
>
>
>
>
> 
> From: Claudio Devecchi 
> To: Lista Solr 
> Sent: Wednesday, November 10, 2010, 15:16:24
> Subject: Search with accent
>
> Hi all,
>
> Somebody knows how can I config my solr to make searches with and without
> accents?
>
> for example:
>
> pereque and perequê
>
>
> When I do it I need the same result, but its not working.
>
> tks
> --
>
>
>
>
>



-- 
Claudio Devecchi
flickr.com/cdevecchi



  

Re: Search with accent

2010-11-10 Thread Tomas Fernandez Lobbe
I don't understand: when the user searches for perequê, do you want results for 
both perequê and pereque?

If that's the case, any field type with ISOLatin1AccentFilterFactory should 
work. 
The accent should be removed at index time and at query time (make sure the 
filter is applied in both cases).

Tomás






From: Claudio Devecchi 
To: Lista Solr 
Sent: Wednesday, November 10, 2010, 15:16:24
Subject: Search with accent

Hi all,

Does somebody know how I can configure my Solr to make searches with and
without accents?

for example:

pereque and perequê


When I do that I need the same result, but it's not working.

tks
--