Re: How to return more fields on Solr 4.5.1 Suggester?

2014-03-17 Thread Lajos

Hi Omer,

That's not how it's meant to work; the suggester gives you 
potentially matching terms by looking at the set of terms for the given 
field across the index.


You may want to look at the MoreLikeThis component or handler instead; it 
returns matching documents, from which you can access the fields 
you want.
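
For illustration, here is a minimal SolrJ 4.x sketch of that idea: any handler that 
returns whole documents (MoreLikeThis, or even a plain prefix search against the 
autocomplete field, as below) lets the client read every stored field it needs. The 
core URL and the prefix-query approach are assumptions for the example, not part of 
Omer's actual setup.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class MovieLookup {
      public static void main(String[] args) throws Exception {
          // Hypothetical core URL; substitute your own.
          HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/movies");

          // Match documents on the autocomplete field instead of asking the
          // suggester for bare terms, so whole documents come back.
          SolrQuery q = new SolrQuery("name_autocomplete:har*");
          q.setFields("movie_id", "movie_title");   // both are stored, so both are returned
          q.setRows(10);

          QueryResponse rsp = solr.query(q);
          for (SolrDocument doc : rsp.getResults()) {
              System.out.println(doc.getFieldValue("movie_id") + " -> "
                      + doc.getFieldValue("movie_title"));
          }
      }
  }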


Regards,

Lajos


On 17/03/2014 14:05, omer sonmez wrote:


I am using Solr 4.5.1 to suggest movies for my system. What I need is for Solr to return not 
only the movie_title but also the movie_id that belongs to the movie. As an example, this is 
the kind of thing I need:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="har">
        <int name="numFound">6</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <doc>
            <str name="name_autocomplete">hard eight (1996)</str>
            <str name="movie_id">144</str>
          </doc>
          <doc>
            <str name="name_autocomplete">hard rain (1998)</str>
            <str name="movie_id">14</str>
          </doc>
          <doc>
            <str name="name_autocomplete">harlem (1993)</str>
            <str name="movie_id">1044</str>
          </doc>
        </arr>
      </lst>
    </lst>
  </lst>
</response>
My search component config is like:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_autocomplete</str>
    <str name="spellcheck.onlyMorePopular">true</str>
  </lst>
</searchComponent>
My request handler config is like:

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
And my schema config is like below:

<field name="movie_id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="movie_title" type="text" indexed="true" stored="true" multiValued="false"/>

<!-- <field name="name_auto" type="text_auto" indexed="true" stored="true" multiValued="false"/> -->
<field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>

<copyField source="movie_title" dest="name_autocomplete"/>
How can I manage to get other fields using the suggester in Solr 4.5.1?
Thanks, 



Re: /suggest

2014-03-17 Thread Lajos

Hi Steve,

I've posted previously about a nice Stackoverflow exception I got when 
using this component ... can you post what you see?


I've used it successfully with a custom dictionary like this:

  <searchComponent name="newsuggester" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">newsuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="sourceLocation">suggestions.dict</str>
      <str name="storeDir">newsuggester</str>
      <str name="suggestAnalyzerFieldType">text</str>
      <str name="buildOnCommit">true</str>
      <float name="threshold">0.0</float>
    </lst>
  </searchComponent>

And that works fine, and is a nice improvement over the 
SpellCheckComponent because it supports fuzzy searching.


But this way, I get the overflow when using a text field:

  <searchComponent name="suggest2" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">default</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="weightField">price</str>
      <str name="suggestAnalyzerFieldType">string</str>
    </lst>
  </searchComponent>


Sure would like to see it work!

Regards,

Lajos Moczar




On 17/03/2014 22:11, Steve Huckle wrote:

Hi,

The Suggest Search Component that comes preconfigured in Solr 4.7.0
solrconfig.xml seems to thread dump when I call it:

http://localhost:8983/solr/suggest?spellcheck=on&q=ac&wt=json&indent=true

"msg": "No suggester named default was configured",

Can someone tell me what's going on there?

However, I can stop that happening if I replace the preconfigured
Suggest Search Component and Request Handler with the Search Component
and Request Handler configuration detailed here:

https://cwiki.apache.org/confluence/display/solr/Suggester

...but after indexing the data in exampledocs, it doesn't seem to return
any suggestions either. Can anyone help suggest how I might get suggest
suggesting suggestions?

Thanks,



Re: Problems using solr.SpatialRecursivePrefixTreeFieldType

2014-03-16 Thread Lajos

Hi Hamish,

Are you running Jetty?

In Tomcat, I've put jts-1.13.jar in the WEB-INF/lib directory of the 
unpacked distribution and restarted. It worked fine.


Maybe check file permissions as well ...

Regards,

Lajos



On 16/03/2014 10:18, Hamish Campbell wrote:

Hey all,

Trying to use SpatialRecursivePrefixTreeFieldType to store extent polygons
but I can't seem to get it configured correctly. Hitting this on start up:

4670 [main] ERROR org.apache.solr.core.SolrCore  - Error loading

core:java.util.concurrent.ExecutionException:
java.lang.NoClassDefFoundError:
com/vividsolutions/jts/geom/CoordinateSequenceFactory



Per the manual, and David Smiley's previous responses, I've added the jts
.jar files to WEB-INF/lib in the solr .war file.

I'm still getting the error above, any other clues?



Re: Best practice to support multi-tenant with Solr

2014-03-15 Thread Lajos

Hi Shushuai,

Just a few thoughts.

I would guess that most people would argue for implementing 
multi-tenancy within your core (via some unique filter ID) or collection 
(via document routing) because of the headache of managing individual 
cores at the scale you are talking about.


There are disadvantages the other way too: having a core/collection 
support multiple tenants does affect scoring, since TF-IDF is calculated 
across the index, and can open up security implications that you have to 
address (i.e. making sure a malicious query cannot get another tenant's 
documents).
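
As a hedged sketch of those two options (not Lajos's actual code), the SolrJ 4.x snippet 
below shows the shared-collection style: documents are routed by prefixing the unique key 
with the tenant id (composite-id routing), and every query carries a server-side tenant 
filter so one tenant can never see another tenant's documents. The collection name and 
field names are illustrative assumptions.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrInputDocument;

  public class TenantAwareClient {
      public static void main(String[] args) throws Exception {
          HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/tenants");

          // Indexing: the "tenant42!" prefix is the shard key, so all of this
          // tenant's documents land on the same shard (document routing).
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "tenant42!doc-1001");
          doc.addField("tenant_id", "tenant42");
          doc.addField("title", "quarterly report");
          solr.add(doc);
          solr.commit();

          // Querying: the tenant filter is added by the middle tier, never by
          // the end user, which closes the hole described above.
          SolrQuery q = new SolrQuery("title:report");
          q.addFilterQuery("tenant_id:tenant42");
          QueryResponse rsp = solr.query(q);
          System.out.println(rsp.getResults().getNumFound() + " docs for tenant42");
      }
  }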


The most important thing you have to lock down is whether there is a 
need to customize the schema/solrconfig for each tenant. If there is, 
then having individual cores per tenant is going to be a stronger 
argument. If I were to guess, and based on my own multi-tenant 
experience, you'll have some high-end tenants who need their own 
cores/collections, and a larger number that can all share a 
configuration. It's like any kind of hosted solution: the cheapest 
version is one-size-fits-all and involves the minimum of management 
overhead, while the higher end are more expensive and require more 
management.


My own preference is for a blended environment. While the management of 
individual cores/collections is not to be taken lightly, I've done it in 
a variety of hosting situations and it all comes down to smart 
management and the intelligent use of administrative scripts. I've 
developed my own set of tools over the years and they work quite well.


Finally, I would (in general) argue for cloud-based implementations to 
give you data redundancy, but that decision would require more information.


HTH,

Lajos Moczar


theconsultantcto.com
Enterprise Lucene/Solr



On 14/03/2014 23:10, shushuai zhu wrote:

Hi,

I am looking into Solr 4.7 for best practice of multi-tenancy support. Our use 
cases require support of thousands of tenants (say 10,000) and the incoming 
data rate could be more than 10k documents per second. I did some research and 
found people talked about scaling tenants at all four levels:

Solr Cloud
Collection
Shard
Core

I am listing them plus some quoted comments from the links.

1) Solr Cloud and Collection

http://find.searchhub.org/document/c7caa34d807a8a1b#c7caa34d807a8a1b

---
Are you trying to do multi-tenant? If so, you should be talking
 multi-cluster where you externally manage your tenants,
 assigning them to clusters, but keeping tenants per cluster down in
 the dozens/hundreds, and archiving inactive tenants and spinning
 up (and down) clusters as inactive tenants become active or fall
 into inactivity. But keeping 1,000 or more tenants active in a
 single cluster as separate collections is... a no-go.
---

2) Shard

http://searchhub.org/2013/06/13/solr-cloud-document-routing/

---
Document routing can be used to achieve a more efficient
 multi-tenant environment. This can be done by making the tenant id
 the shard key, which would group all documents from the same tenant
 on the same shard.
---

3) Core

http://find.searchhub.org/document/4312991db2dd90e9#4312991db2dd90e9

---
Every multitenant situation is going to be different, but at the
 extreme a single core per tenant is the cleanest and provides the
 best separation, optimal performance, and supports full tf-idf
 relevancy of document fields for each tenant.
---

http://find.searchhub.org/document/fc5b734fba135e83#fc5b734fba135e83

---
Well, we try to use Solr to run a multi-tenant index/search
 service.  We assigns each client a different core with their own
 config and schema. It would be good for us if we can just let the
 customer to be able to create cores with their own schema and
 config.
---

I also saw slides talking about scaling time along Collection: timed
 collections (slides 50 ~ 58)

http://www.slideshare.net/sematext/solr-for-indexing-and-searching-logs

According to these, I am thinking about the following approach:

In a single Solr Cloud, the multi-tenant support is at Core level
 (one or more cores per tenant), and for better performance, we will
 create a collection every day. When a tenant grows too big, we will
 migrate it from this Solr Cloud to a new Solr Cloud.

Any potential issue with this approach? Is there better approach
 based on your experience?

A few questions related to proposed approach:

1) When a core is replicated to multiple nodes via multiple shards,
 the query submitted against a particular core (tenant) should be
 executed distributed, right?
2) What is the best way to move a core from one Solr Cloud to
 another?
3) If we create one collection per day and want to keep data for
 three years for example, is it OK to have so many collections? If
 yes, is it cheap to maintain the collection alias for easy querying?

Thanks.

Shushuai



Re: Best practice to support multi-tenant with Solr

2014-03-15 Thread Lajos

Hi Shushuai,



---
Finally, I would (in general) argue for cloud-based implementations to give you 
data redundancy ...
---
Do you mean using multi-sharding to have multiple replicas of cores 
(corresponding to tenants) across nodes?

Shushuai





What I mean first and foremost is that using SolrCloud with replication 
ensures that your data isn't lost if you lose a node. So in a hosted 
solution, that's a good thing.


If you are using SolrCloud, then it's up to you to choose whether to have 
one collection per tenant, or one collection that supports multiple 
tenants via document routing.


Obviously the former has implications on the number of shards you'll 
have. For example, if you have a 3-node cluster with replication factor 
of 2, that's 6 shards per collection. If you have 1,000 tenant 
collections, that's 6,000 shards. Hence my argument for multiple low-end 
tenants per collection, and then only give your higher-end tenants their 
own collections. Just to make things simpler for you ;)


Regards,

Lajos





From: Lajos la...@protulae.com
To: solr-user@lucene.apache.org
Sent: Saturday, March 15, 2014 5:37 AM
Subject: Re: Best practice to support multi-tenant with Solr


Hi Shushuai,

Just a few thoughts.

I would guess that most people would argue for implementing
multi-tenancy within your core (via some unique filter ID) or collection
(via document routing) because of the headache of managing individual
cores at the scale you are talking about.

There are disadvantages the other way too: having a core/collection
support multiple tenants does affect scoring, since TF-IDF is calculated
across the index, and can open up security implications that you have to
address (i.e. making sure a malicious query cannot get another tenants
documents).

The most important thing you have to lock down is whether there is a
need to customize the schema/solrconfig for each tenant. If there is,
then having individual cores per tenant is going to be a stronger
argument. If I was to guess, and based on my own multi-tenant
experience, you'll have some high-end tenants who need their own
cores/collections, and a larger number that can all share a
configuration. Its like any kind of hosted solution: the cheapest
version is one-size-fits-all and involves the minimum of management
overhead, while the higher end are more expensive and require more
management.

My own preference is for a blended environment. While the management of
individual cores/collections is not to be taken lightly, I've done it in
a variety of hosting situations and it all comes down to smart
management and the intelligent use of administrative scripts. I've
developed my own set of tools over the years and they work quite well.

Finally, I would (in general) argue for cloud-based implementations to
give you data redundancy, but that decision would require more information.

HTH,

Lajos Moczar


theconsultantcto.com
Enterprise Lucene/Solr




On 14/03/2014 23:10, shushuai zhu wrote:

Hi,

I am looking into Solr 4.7 for best practice of multi-tenancy support. Our use 
cases require support of thousands of tenants (say 10,000) and the incoming 
data rate could be more than 10k documents per second. I did some research and 
found people talked about scaling tenants at all four levels:

Solr Cloud
Collection
Shard
Core

I am listing them plus some quoted comments from the links.

1) Solr Cloud and Collection

http://find.searchhub.org/document/c7caa34d807a8a1b#c7caa34d807a8a1b

---
Are you trying to do multi-tenant? If so, you should be talking
   multi-cluster where you externally manage your tenants,
   assigning them to clusters, but keeping tenants per cluster down in
   the dozens/hundreds, and archiving inactive tenants and spinning
   up (and down) clusters as inactive tenants become active or fall
   into inactivity. But keeping 1,000 or more tenants active in a
   single cluster as separate collections is... a no-go.
---

2) Shard

http://searchhub.org/2013/06/13/solr-cloud-document-routing/

---
Document routing can be used to achieve a more efficient
   multi-tenant environment. This can be done by making the tenant id
   the shard key, which would group all documents from the same tenant
   on the same shard.
---

3) Core

http://find.searchhub.org/document/4312991db2dd90e9#4312991db2dd90e9

---
Every multitenant situation is going to be different, but at the
   extreme a single core per tenant is the cleanest and provides the
   best separation, optimal performance, and supports full tf-idf
   relevancy of document fields for each tenant.
---

http://find.searchhub.org/document/fc5b734fba135e83#fc5b734fba135e83

---
Well, we try to use Solr to run a multi-tenant index/search
   service.  We assigns each client a different core with their own
   config

Re: Best practice to support multi-tenant with Solr

2014-03-15 Thread Lajos

Hi Shushuai,

Yes, as Robi noted, you have to be careful with terminology: core 
generally refers to the traditional Solr configuration of a single index 
+ configuration on a single node (optionally replicated to others). A 
collection is a distributed index that is associated with a 
configuration (but multiple collections can be associated with the same 
configuration).


A collection is still a single index, however, just like a core - it's 
just spread out across however many nodes you have and replicated 
according to your chosen replication factor. You can do multi-tenancy 
with cores and collections, but via different strategies.


More inline ...


On 15/03/2014 19:17, shushuai zhu wrote:

Hi Lajos, thanks again.

Your suggestion is to support multi-tenant via collection in a Solr Cloud: 
putting small tenants in one collection and big tenants in their own 
collections.

My original question was to find out which approach is better: supporting 
multi-tenancy at the collection level or the core level. Based on the links below and a 
few comments there, it seems people prefer the core level. Collection is 
logical and core is physical. I am trying to figure out the trade-offs between 
the approaches regarding scalability, security, performance, and 
flexibility. My understanding might be wrong; below is a rough 
comparison:

1) Scalability
Core is more scalable than collection by number: we can have much more cores 
than collections in one Solr Cloud? Or collection is more scalable than core by 
size: a collection could be much bigger than a core? Not sure which one is 
better: having ~1000 cores or ~1000 collections in a Solr Cloud.



SolrCloud is more scalable in terms of index size. Plus you get 
redundancy which can't be underestimated in a hosted solution.




2) Security
Core is more isolated than collection: core is physical and has its own index, 
but collection is logical so multiple collections may contain the same cores?



No: cores are not less or more isolated than collections. Both support 
multi-tenancy, albeit in different ways. If you do it in a core with 
some prefix or special field, you just have to be aware of security 
implications. As Robi said, this is easily enforced by the middle tier; I use 
Spring for this, in my case.



3) Performance
Core has better performance control since it has its own index? Collection 
index is bigger so performance is not as good as smaller core index?



Not really. You might want to test this, however, to verify with your 
specific hardware configuration.



4) Flexibilty
Core is more flexible since it has its own schema/config, but one collection 
may have multiple cores hence multiple schemas/configs? Or it does not matter 
since we can set same schema/config for the whole collection?



One could argue that the easiest configuration will be one big 
collection (or maybe divided up intelligently amongst several big 
collections). More complex is 1000s of cores or collections.


The issue is management. 1000s of cores/collections require a level of 
automation. On the other hand, having a single core/collection means if 
you make one change to the schema or solrconfig, it affects everyone. 
That might not work if you have frequent changes or differing tenant needs.


This is a decision you'll have to make yourself, based on your client 
needs, change management, index sizes, management system, etc, etc.



Regards,

Lajos



Basically, I just want to get opinions about which approach might be better for 
the given use case.

Regards.

Shushuai



From: Lajos la...@protulae.com
To: solr-user@lucene.apache.org
Sent: Saturday, March 15, 2014 1:19 PM
Subject: Re: Best practice to support multi-tenant with Solr


Hi Shushuai,



---
Finally, I would (in general) argue for cloud-based implementations to give you 
data redundancy ...
---
Do you mean using multi-sharding to have multiple replicas of cores 
(corresponding to tenants) across nodes?

Shushuai





What I mean first and foremost is that using SolrCloud with replication
ensures that your data isn't lost if you lose a node. So in a hosted
solution, that's a good thing.

If you are using SolrCloud, then its up to you to choose whether to have
one collection per tenant, or one collection that supports multiple
tenants via document routing.

Obviously the former has implications on the number of shards you'll
have. For example, if you have a 3-node cluster with replication factor
of 2, that's 6 shards per collection. If you have 1,000 tenant
collections, that's 6,000 shards. Hence my argument for multiple low-end
tenants per collection, and then only give your higher-end tenants their
own collections. Just to make things simpler for you ;)

Regards,


Lajos





From: Lajos la...@protulae.com
To: solr-user@lucene.apache.org
Sent: Saturday, March 15, 2014 5:37 AM
Subject: Re

Re: SOLR cloud disaster recovery

2014-02-28 Thread Lajos

Hi Jan,

There are a few ways to do that, but no, nothing is automatic.

1) If your node is alive, you can create new replicas on the new node, 
let them replicate, verify they are ok, then delete the replicas on the 
old node and shut it down.


2) If your node is dead, create new replicas on the new node, let them 
replicate. You'll have to hand-edit clusterstate.json however, to fix 
the entries for the shards.


3) If you have a fully up-to-date backup of your dead node, just use the 
same hostname for your new node and restore the backups there. It should 
be fine. Just verify that the replicas for that node, as listed in 
clusterstate.json, are present and accounted for.
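
For options 1 and 2, the usual Solr 4.x way to create the new replicas is a Core Admin 
CREATE call on the replacement node, naming the collection and shard; the new core then 
registers itself and recovers from the shard leader. Below is a hedged sketch issued from 
Java; host, core and collection names are placeholders, not from Jan's cluster.

  import java.io.InputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;
  import java.util.Scanner;

  public class AddReplicaOnNewNode {
      public static void main(String[] args) throws Exception {
          // Core Admin CREATE with collection+shard adds a replica of that shard
          // on whichever node receives the request.
          String url = "http://newnode:8983/solr/admin/cores"
                  + "?action=CREATE"
                  + "&name=mycollection_shard1_replica4"
                  + "&collection=mycollection"
                  + "&shard=shard1";
          HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
          try (InputStream in = conn.getInputStream();
               Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
              System.out.println(s.hasNext() ? s.next() : "");  // Solr's XML status response
          }
      }
  }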


HTH,

Lajos


On 28/02/2014 16:17, Jan Van Besien wrote:

Hi,

I am a bit confused about how SolrCloud disaster recovery is supposed
to work exactly in the case of losing a single node completely.

Say I have a SolrCloud cluster with 3 nodes. My collection is created
with numShards=3&replicationFactor=3&maxShardsPerNode=3, so there is
no data loss when I lose a node.

However, how do I configure a new node to take the place of the dead
node? I bring up a new node (same hostname, ip, as the dead node)
which is completely empty (empty data dir, empty solr.xml), install
Solr, and connect it to ZooKeeper.

Is it supposed to work automatically from there? In my tests, the
server has no cores and the solr-cloud graph overview simply shows all
the shards/replicas on this node as down. Do I need to recreate the
cores first? Note that these cores were initially created indirectly
by creating the collection.

Thanks,
Jan



StackOverflow ... the errors, not the site

2014-02-28 Thread Lajos

All,

Just playing around with the SuggestComponent, trying to compare results 
with the old-style spell-check-based suggester. Tried this config 
against a string field:


  <requestHandler name="/suggest2" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">default</str>
    </lst>
    <arr name="components">
      <str>suggest2</str>
    </arr>
  </requestHandler>

  <searchComponent name="suggest2" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">default</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="weightField">price</str>
      <str name="suggestAnalyzerFieldType">string</str>
    </lst>
  </searchComponent>

I hit this URL:

/suggest2?q=ab&suggest.build=true

and that works, but because "title" is a StrField, it wasn't quite 
what I wanted.


So I tried a TextField, description. And I get this, with the same URL:

ERROR - 2014-02-28 17:29:49.618; org.apache.solr.common.SolrException; 
null:java.lang.RuntimeException: java.lang.StackOverflowError
        at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at ...
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.StackOverflowError
        at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:244)
        at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
        at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
        at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
        at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
        at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)


etc etc


Any ideas??

Thanks,

Lajos




excludeIds in QueryElevationComponent (4.7)

2014-02-25 Thread Lajos

Guys,

I've been testing out https://issues.apache.org/jira/browse/SOLR-5541 on 
4.7RC4.


I previously had an elevate.xml that elevated 3 documents for a specific 
query. My understanding is that I could, at runtime, exclude one of 
those. So I tried that like this:


http://localhost:8080/solr/ecommerce/search?q=canon&excludeIds=208464207

and now NONE of my documents are elevated. What I would have expected is 
that I'd have 2 elevated documents, but the 208464207 would not be 
amongst them.


Sadly, what happens is that now nothing is elevated.

Am I misunderstanding something or should I open a JIRA? Looking at the 
source code I can't immediately see what would be wrong.


Thanks,

Lajos


Re: excludeIds in QueryElevationComponent (4.7)

2014-02-25 Thread Lajos

Hit the send button too fast ...

What seems to be happening is that excludeIds or elevateIds ignores 
what's in elevate.xml. I would have expected (hoped) that it would layer 
on top of that, which makes a bit more sense I think.


Thanks,

Lajos


On 25/02/2014 22:58, Lajos wrote:

Guys,

I've been testing out https://issues.apache.org/jira/browse/SOLR-5541 on
4.7RC4.

I previously had an elevate.xml that elevated 3 documents for a specific
query. My understanding is that I could, at runtime, exclude one of
those. So I tried that like this:

http://localhost:8080/solr/ecommerce/search?q=canon&excludeIds=208464207

and now NONE of my documents are elevated. What I would have expected is
that I'd have 2 elevated documents, but the 208464207 would not be
amongst them.

Sadly, what happens is that now nothing is elevated.

Am I misunderstanding something or should I open a JIRA? Looking at the
source code I can't immediately see what would be wrong.

Thanks,

Lajos


Re: excludeIds in QueryElevationComponent (4.7)

2014-02-25 Thread Lajos

Thanks Hoss, that makes sense.

Anyway, I like the new paradigm better ... it allows for more 
intelligent elevation control.
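
For concreteness, a hedged SolrJ sketch of the behaviour Hoss describes below: as soon as 
elevateIds (or excludeIds) appears on the request, elevate.xml is ignored for that query, and 
only the ids passed in are elevated. The handler name and the second id are placeholders; 
208464207 is the id from the earlier test.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;

  public class ElevationOverride {
      public static void main(String[] args) throws Exception {
          HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/ecommerce");

          SolrQuery q = new SolrQuery("canon");
          q.setRequestHandler("/search");
          q.set("enableElevation", "true");
          // Request-time elevation: this list replaces, rather than merges with,
          // whatever elevate.xml defines for the query.
          q.set("elevateIds", "208464207,123456789");
          System.out.println(solr.query(q).getResults().getNumFound());
      }
  }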


Cheers,

L


On 25/02/2014 23:26, Chris Hostetter wrote:


: What is seems that is happening is that excludeIds or elevateIds ignores
: what's in elevate.xml. I would have expected (hoped) that it would layer on
: top of that, which makes a bit more sense I think.

That's not how it's implemented -- I believe Joel implemented it this way
intentionally, because otherwise, if the elevate.xml said elevate A,B and
exclude X,Y, there would be no simple way to say "instead of what's in
elevate.xml, I want to elevate X,Y and I don't want to exclude *anything*".

I made sure this was explicitly documented in the ref guide...

https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component#TheQueryElevationComponent-TheelevateIdsandexcludeIdsParameters

If either one of these parameters is specified at request time, the
entire elevation configuration for the query is ignored.



-Hoss
http://www.lucidworks.com/



Unloading a SolrCloud core in 4.6.0

2014-02-13 Thread Lajos

Hi all,

I just want to verify that it is no longer possible to unload a Cloud 
core via the Core API UNLOAD command, correct?


I had two situations: one where I wanted to remove old replicas in a 
node that I was deactivating (and I had already created new replicas) 
and one where I needed to remove a shard I split.


In both cases I got this nice stack trace:

<response>
  <lst name="responseHeader"><int name="status">500</int><int name="QTime">1</int></lst>
  <lst name="error"><str name="trace">java.lang.NullPointerException
        at org.apache.solr.core.CorePropertiesLocator.delete(CorePropertiesLocator.java:95)
        at org.apache.solr.core.CoreContainer.remove(CoreContainer.java:754)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:589)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:162)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:603)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
</str><int name="code">500</int></lst>
</response>

I had to resort to DELETEREPLICA, which worked fine, but I just wanted 
to verify whether this is a bug or intended behavior. Lots of older docs 
say to use UNLOAD for these situations.
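
For reference, a hedged sketch of the Collections API call that did work (DELETEREPLICA, 
available since 4.6), issued here from Java; the collection, shard and core_node names are 
placeholders for whatever clusterstate.json actually shows:

  import java.io.InputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;
  import java.util.Scanner;

  public class DeleteReplicaCall {
      public static void main(String[] args) throws Exception {
          String url = "http://localhost:8080/solr/admin/collections"
                  + "?action=DELETEREPLICA"
                  + "&collection=mycollection"
                  + "&shard=shard1"
                  + "&replica=core_node3";
          HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
          try (InputStream in = conn.getInputStream();
               Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
              System.out.println(s.hasNext() ? s.next() : "");  // Solr's XML status response
          }
      }
  }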


Thanks,

Lajos



Re: Announce list

2014-02-03 Thread Lajos

There's always http://projects.apache.org/feeds/rss.xml.

L


On 03/02/2014 14:59, Arie Zilberstein wrote:

Hi,

Is there a mailing list for getting just announcements about new versions?

Thanks,
Arie



Re: Solr middle-ware?

2014-01-22 Thread Lajos

I always go for SolrJ as the intermediate layer, usually in a Spring app.

I have sometimes proxied directly to Solr itself, but since we use a lot 
of Ajax, I'm not comfortable with exposing the Solr URIs directly, even 
if controlled via a proxy.


Having it go through a webapp gives me a layer I can use to validate 
input; if ever the situation warranted, I could use a filter to check 
for anything malicious. I can also layer security on top as well.
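
A minimal sketch of that kind of pass-through layer (not the actual Spring app): the webapp 
accepts only a query string and a row count from the browser, validates both, and builds the 
SolrJ request itself, so raw Solr parameters are never exposed to the client. The core URL 
and the limits are illustrative assumptions.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class SearchGateway {
      private final HttpSolrServer solr =
              new HttpSolrServer("http://localhost:8983/solr/collection1");

      public QueryResponse search(String userInput, int rows) throws SolrServerException {
          // Validate the only two things the client is allowed to control.
          if (userInput == null || userInput.trim().isEmpty() || userInput.length() > 200) {
              throw new IllegalArgumentException("Invalid query");
          }
          SolrQuery q = new SolrQuery(userInput);
          q.setRows(Math.min(Math.max(rows, 1), 50));  // clamp the page size
          return solr.query(q);
      }
  }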


Cheers,

Lajos


On 22/01/2014 06:45, Alexandre Rafalovitch wrote:

So, everybody so far is exposing Solr directly to the web, but with
proxy/rewriting. Which means the html/JS libraries are Solr
query-format aware as well?

Is anybody using Solr clients (SolrNet, SolrJ) as a base?

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, Jan 21, 2014 at 9:05 PM, Artem Karpenko gooy...@gmail.com wrote:

Hello. Not really middle-ware but might be of interest concerning possible
ways implementing security.

We use custom built Solr with web.xml including Spring Security filter and
appropriate infrastructure classes for authentication added as a dependency
into project. We pass token from frontend in each request. If it's accepted
in security filter then later user role (identified from token) is used in
custom request handler that modifies query according to role permissions.

Regards,
Artem.

21.01.2014 15:08, Markus Jelsma wrote:


Hi - We use Nginx to expose the index to the internet. It comes down to
putting some limitations on input parameters and on-the-fly rewrite of
queries using embedded Perl scripting. Limitations and rewrites are usually
just a bunch of regular expressions, so it is not that hard.

Cheers
Markus
 -Original message-


From:Alexandre Rafalovitch arafa...@gmail.com
Sent: Tuesday 21st January 2014 14:01
To: solr-user@lucene.apache.org
Subject: Solr middle-ware?

Hello,

All the Solr documents talk about not exposing Solr directly to the
web. But I see people keep asking for a thin secure layer in front
of Solr they can talk from JavaScript to, perhaps with some basic
extension options.

Has anybody actually written one? Open source or in a community part
of larger project? I would love to be able to point people at
something.

Is there something particularly difficult about writing one? Does
anybody has a story of aborted attempt or mid-point reversal? I would
like to know.

Regards,
 Alex.
P.s. Personal context: I am thinking of doing a series of lightweight
examples of how to use Solr. Like I did for a book, but with a bit
more depth and something that can actually be exposed to the live web
with live data. I don't want to reinvent the wheel of the thin Solr
middleware.
P.p.s. Though I keep thinking that Dart could make an interesting
option for the middleware as it could have the same codebase on the
server and in the client. Like NodeJS, but with saner syntax.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)





Solr Cloud on HDFS

2014-01-22 Thread Lajos

Hi all,

I've been running Solr on HDFS, and that's fine.

But I have a Cloud installation I thought I'd try on HDFS. I uploaded 
the configs for the core that runs in standalone mode already on HDFS 
(on another cluster). I specify the HdfsDirectoryFactory, HDFS data dir, 
solr.hdfs.home, and HDFS update log path:


  <dataDir>hdfs://master:9000/solr/test/data</dataDir>

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://master:9000/solr</str>
  </directoryFactory>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">hdfs://master:9000/solr/test/ulog</str>
    </updateLog>
  </updateHandler>

Question is: should I create my collection differently than I would a 
normal collection?


If I just try that, Solr will initialise the directory in HDFS as if it 
were a single core. It will create shard directories on my nodes, but 
not actually put anything in there. And then it will complain mightily 
about not being able to forward updates to other nodes. (This same 
cluster hosts regular collections, and everything is working fine).


Am I missing a step? Do I have to manually create HDFS directories for 
each replica?


Thanks,

L


Re: Solr Cloud on HDFS

2014-01-22 Thread Lajos
Uugh. I just realised I should have taken out the data dir and update log 
definitions! Now it works fine.


Cheers,

L


On 22/01/2014 11:47, Lajos wrote:

Hi all,

I've been running Solr on HDFS, and that's fine.

But I have a Cloud installation I thought I'd try on HDFS. I uploaded
the configs for the core that runs in standalone mode already on HDFS
(on another cluster). I specify the HdfsDirectoryFactory, HDFS data dir,
solr.hdfs.home, and HDFS update log path:

   <dataDir>hdfs://master:9000/solr/test/data</dataDir>

   <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
     <str name="solr.hdfs.home">hdfs://master:9000/solr</str>
   </directoryFactory>

   <updateHandler class="solr.DirectUpdateHandler2">
     <updateLog>
       <str name="dir">hdfs://master:9000/solr/test/ulog</str>
     </updateLog>
   </updateHandler>

Question is: should I create my collection differently than I would a
normal collection?

If I just try that, Solr will initialise the directory in HDFS as if it
were a single core. It will create shard directories on my nodes, but
not actually put anything in there. And then it will complain mightily
about not being able to forward updates to other nodes. (This same
cluster hosts regular collections, and everything is working fine).

Am I missing a step? Do I have to manually create HDFS directories for
each replica?

Thanks,

L


Re: Solr Cloud on HDFS

2014-01-22 Thread Lajos

Thanks Mark ... indeed, some doc updates would help.

Regarding what seems to be a popular question on sharding: it seems it 
would be a Good Thing for the shards of a collection running on HDFS to 
essentially be pointers to the HDFS-replicated index. Is that what you're 
thinking?


I've been following your work recently, would be interested in helping 
out on this if there's the chance.


Is there a JIRA yet on this issue?

Thanks,

lajos


On 22/01/2014 16:57, Mark Miller wrote:

Right - solr.hdfs.home is the only setting you should use with SolrCloud.

The documentation should probably be improved.

If you set the data dir or ulog location in solrconfig.xml explicitly, it will 
be the same for every collection. SolrCloud shares the solrconfig.xml across 
SolrCores, and this will not work out.

By setting solr.hdfs.home and leaving the relative defaults, all of the 
locations are correctly set for each different collection under solr.hdfs.home 
without any effort on your part.

- Mark



On Jan 22, 2014, 7:22:22 AM, Lajos la...@protulae.com wrote: Uugh. I just 
realised I should have take out the data dir and update log
definitions! Now it works fine.

Cheers,

L


On 22/01/2014 11:47, Lajos wrote:

Hi all,

I've been running Solr on HDFS, and that's fine.

But I have a Cloud installation I thought I'd try on HDFS. I uploaded
the configs for the core that runs in standalone mode already on HDFS
(on another cluster). I specify the HdfsDirectoryFactory, HDFS data dir,
solr.hdfs.home, and HDFS update log path:

<dataDir>hdfs://master:9000/solr/test/data</dataDir>

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://master:9000/solr</str>
</directoryFactory>

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">hdfs://master:9000/solr/test/ulog</str>
  </updateLog>
</updateHandler>

Question is: should I create my collection differently than I would a
normal collection?

If I just try that, Solr will initialise the directory in HDFS as if it
were a single core. It will create shard directories on my nodes, but
not actually put anything in there. And then it will complain mightily
about not being able to forward updates to other nodes. (This same
cluster hosts regular collections, and everything is working fine).

Am I missing a step? Do I have to manually create HDFS directories for
each replica?

Thanks,

L




Re: Solr Cloud on HDFS

2014-01-22 Thread Lajos

Cool Mark, I'll keep an eye on this one.

L


On 22/01/2014 22:36, Mark Miller wrote:

Whoops, hit the send keyboard shortcut.

I just created a JIRA issue for the first bit I’ll be working on:

SOLR-5656: When using HDFS, the Overseer should have the ability to reassign 
the cores from failed nodes to running nodes.

- Mark



On Jan 22, 2014, 12:57:46 PM, Lajos la...@protulae.com wrote: Thanks Mark ... 
indeed, some doc updates would help.

Regarding what seems to be a popular question on sharding. It seems that
it would be a Good Thing that the shards for a collection running HDFS
essentially be pointers to the HDFS-replicated index. Is that what your
thinking is?

I've been following your work recently, would be interested in helping
out on this if there's the chance.

Is there a JIRA yet on this issue?

Thanks,

lajos


On 22/01/2014 16:57, Mark Miller wrote:

Right - solr.hdfs.home is the only setting you should use with SolrCloud.

The documentation should probably be improved.

If you set the data dir or ulog location in solrconfig.xml explicitly, it will 
be the same for every collection. SolrCloud shares the solrconfig.xml across 
SolrCore’s, and this will not work out.

By setting solr.hdfs.home and leaving the relative defaults, all of the 
locations are correctly set for each different collection under solr.hdfs.home 
without any effort on your part.

- Mark



On Jan 22, 2014, 7:22:22 AM, Lajos la...@protulae.com wrote: Uugh. I just 
realised I should have take out the data dir and update log
definitions! Now it works fine.

Cheers,

L


On 22/01/2014 11:47, Lajos wrote:

Hi all,

I've been running Solr on HDFS, and that's fine.

But I have a Cloud installation I thought I'd try on HDFS. I uploaded
the configs for the core that runs in standalone mode already on HDFS
(on another cluster). I specify the HdfsDirectoryFactory, HDFS data dir,
solr.hdfs.home, and HDFS update log path:

<dataDir>hdfs://master:9000/solr/test/data</dataDir>

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://master:9000/solr</str>
</directoryFactory>

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">hdfs://master:9000/solr/test/ulog</str>
  </updateLog>
</updateHandler>

Question is: should I create my collection differently than I would a
normal collection?

If I just try that, Solr will initialise the directory in HDFS as if it
were a single core. It will create shard directories on my nodes, but
not actually put anything in there. And then it will complain mightily
about not being able to forward updates to other nodes. (This same
cluster hosts regular collections, and everything is working fine).

Am I missing a step? Do I have to manually create HDFS directories for
each replica?

Thanks,

L






Re: Advantages of different Servlet Containers

2009-10-02 Thread Lajos
Just go for Tomcat. For all its problems, and I should know having used 
it since it was originally JavaWebServer, it is perfectly capable of 
handling high-end production environments provided you tune it 
correctly. We use it with our customized Solr 1.3 version without any 
problems.


Lajos


Simon Wistow wrote:
I know that the Solr FAQ says 

Users should decide for themselves which Servlet Container they 
consider the easiest/best for their use cases based on their 
needs/experience. For high traffic scenarios, investing time for tuning 
the servlet container can often make a big difference.


but is there anywhere that lists some of the various advantages and 
disadvantages of, say, Tomcat over Jetty, for someone who isn't current 
with the Java ecosystem?


Also, I'm currently using Jetty but I've had to do a horrific hack to 
make it work under init.d in that I start it up in the background and 
then tail the output waiting for the line that says the SocketConnector 
has been started


   while [ -z "$(tail -1 "$LOG" | grep 'Started SocketConnector')" ]; do
       sleep 1
   done

There's *got* to be a better way of doing this, right? 


Thanks,

Simon










Help! Issue with tokens in custom synonym filter

2009-08-31 Thread Lajos

Hi all,

I've been writing some custom synonym filters and have run into an issue 
with returning a list of tokens. I have a synonym filter that uses the 
WordNet database to extract synonyms. My problem is how to define the 
offsets and position increments in the new Tokens I'm returning.


For an input token, I get a list of synonyms from the WordNet database. 
I then create a List<Token> of those results. Each Token is created with 
the same startOffset, endOffset and positionIncrement of the input 
Token. Is this correct? My understanding from looking at the Lucene 
codebase is that the startOffset/endOffset should be the same, as we are 
referring to the same term in the original text. However, I don't quite 
get the positionIncrement. I understand that it is relative to the 
previous term ... does this mean all my synonyms should have a 
positionIncrement of 0? But whether I use 0 or the positionIncrement of 
the original input Token, Solr seems to ignore the returned tokens ...


This is a summary of what is in my filter:

*

private Iterator<Token> output;
private ArrayList<Token> synonyms = null;

public Token next(Token in) throws IOException {
  if (output != null) {
    // Here we are just outputting matched synonyms
    // that we previously created from the input token.
    // The input token has already been returned.
    if (output.hasNext()) {
      return output.next();
    } else {
      return null;
    }
  }

  synonyms = new ArrayList<Token>();

  Token t = input.next(in);
  if (t == null) return null;

  String value = new String(t.termBuffer(), 0,
      t.termLength()).toLowerCase();

  // Get list of WordNet synonyms (code removed)
  // Iterate thru WordNet synonyms
  for (String wordNetSyn : wordNetSyns) {
    Token synonym = new Token(t.startOffset(), t.endOffset(), t.type());
    synonym.setPositionIncrement(t.getPositionIncrement());
    synonym.setTermBuffer(wordNetSyn.toCharArray(), 0, wordNetSyn.length());
    synonyms.add(synonym);
  }

  output = synonyms.iterator();

  // Return the original word, we want it
  return t;
}


Re: Help! Issue with tokens in custom synonym filter

2009-08-31 Thread Lajos

Hi David & Ahmet,

I hadn't seen the SynonymTokenFilter from Lucene, so that helped. 
Ultimately, however, it seems I was pretty much doing the right thing, 
although my token type might have been wrong.


Unfortunately, while the tokens are being returned properly (AFAIK), 
when I do a query using one of the synonyms, I can't get any results. 
This is not the case if I just directly code in the synonym into the 
synonyms file with the standard solr synonym filter.


So I'll have to keep on hacking away ;)

Regarding generating the file from WordNet, we'd considered that but our 
requirements essentially mean we have to do the heavy lifting within the 
filter itself. Not that I'm opposed, it is just that I'm apparently 
missing something simple still.


Thanks for the replies.

Lajos


Smiley, David W. wrote:

Although this is not a direct answer to your question, you may want to consider 
generating a synonyms file from wordnet.  Then, you can use the standard 
synonym filter in Solr.  The only downside to this is that the synonym file 
might be pretty large... but you've probably got some large file for wordnet 
data any way.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/31/09 10:32 AM, Lajos la...@protulae.com wrote:

Hi all,

I've been writing some custom synonym filters and have run into an issue
with returning a list of tokens. I have a synonym filter that uses the
WordNet database to extract synonyms. My problem is how to define the
offsets and position increments in the new Tokens I'm returning.

For an input token, I get a list of synonyms from the WordNet database.
I then create a List<Token> of those results. Each Token is created with
the same startOffset, endOffset and positionIncrement of the input
Token. Is this correct? My understanding from looking at the Lucene
codebase is that the startOffset/endOffset should be the same, as we are
referring to the same term in the original text. However, I don't quite
get the positionIncrement. I understand that it is relative to the
previous term ... does this mean all my synonyms should have a
positionIncrement of 0? But whether I use 0 or the positionIncrement of
the original input Token, Solr seems to ignore the returned tokens ...

This is a summary of what is in my filter:

*

private Iterator<Token> output;
private ArrayList<Token> synonyms = null;

public Token next(Token in) throws IOException {
   if (output != null) {
     // Here we are just outputting matched synonyms
     // that we previously created from the input token.
     // The input token has already been returned.
     if (output.hasNext()) {
       return output.next();
     } else {
       return null;
     }
   }

   synonyms = new ArrayList<Token>();

   Token t = input.next(in);
   if (t == null) return null;

   String value = new String(t.termBuffer(), 0,
       t.termLength()).toLowerCase();

   // Get list of WordNet synonyms (code removed)
   // Iterate thru WordNet synonyms
   for (String wordNetSyn : wordNetSyns) {
     Token synonym = new Token(t.startOffset(), t.endOffset(), t.type());
     synonym.setPositionIncrement(t.getPositionIncrement());
     synonym.setTermBuffer(wordNetSyn.toCharArray(), 0, wordNetSyn.length());
     synonyms.add(synonym);
   }

   output = synonyms.iterator();

   // Return the original word, we want it
   return t;
}






