Re: A bad idea to store core data directory over NAS?

2014-11-05 Thread Toke Eskildsen
On Tue, 2014-11-04 at 22:57 +0100, Gili Nachum wrote:
 My data center is out of SAN or local disk storage - is it a big no-no to
 store Solr core data folder over NAS?

It depends on your NAS speed. Both Walter and David are right: It can
perform really badly or quite satisfactorily. We briefly experimented with
using 400GB of Isilon ( http://www.emc.com/isilon ) SSD cache as backend
for a searcher. As far as I remember, speed was surprisingly fine; about
3 times slower than with similar local storage. As we needed 20TB+ of
index, it would be too expensive for us to use the enterprise NAS system
though (long story).

 The NAS mount would be accessed by a single machine. I do care about
 performance.

I have a vision of an off-the-shelf 4-drive box Gorilla-taped to the side
of a server rack :-)

Or in other words: If the NAS is only to be used by a single machine,
this will be more of a kludge than a solid solution. Is it not possible
to upgrade local storage to hold the data? How large an index are we
talking about?

 If I do go with NAS, should I expect index corruption and other oddities?

Not that I know of. As the NAS is dedicated, you won't compete for
performance there. Do check if your network is fast enough though.
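
For a quick sanity check before committing to it, something like the
following gives a rough idea of sequential throughput (a minimal sketch,
assuming a Linux host with the NAS mounted at /mnt/nas - path illustrative):

  # raw sequential write speed to the NAS mount
  dd if=/dev/zero of=/mnt/nas/testfile bs=1M count=4096 oflag=direct

  # raw sequential read speed back from it
  dd if=/mnt/nas/testfile of=/dev/null bs=1M iflag=direct
  rm /mnt/nas/testfile

Random-access latency matters more for searching than sequential
throughput, so treat this only as a first filter.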


- Toke Eskildsen, State and University Library, Denmark
I highly recommend Gorilla Tape for semi-permanent kludges.



add and then delete same document before commit,

2014-11-05 Thread Matteo Grolla
Can anyone tell me the behavior of solr (and if it's consistent) when I do what 
follows:
1) add document x
2) delete document x
3) commit

I've tried with solr 4.5.0 and document x gets indexed

Matteo

Re: Analytics result for each Result Group

2014-11-05 Thread Talat Uyarer
I searched the wiki pages about that but did not find any documentation. I
would be glad if you could help me.

Thanks

2014-11-04 11:34 GMT+02:00 Talat Uyarer ta...@uyarer.com:
 Hi folks,

 We use the Analytics Component for median, max etc. I wonder, if I use the
 group.field parameter with the Analytics Component, how do I calculate
 analytics for each result group?

 Thanks

 --
 Talat UYARER
 Websitesi: http://talat.uyarer.com
 Twitter: http://twitter.com/talatuyarer
 Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304



-- 
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304


Best way to map holidays to corresponding date

2014-11-05 Thread Patrick Kirsch
Hey,
 maybe someone already faced the situation and could give me a hint.

Given a query that includes "Easter" or "Sylvester", I am searching for the
best place to translate the string to the corresponding date.

Is there any solr.Mapping*Factory for that?
Do I need to implement it in a custom Solr Query Parser etc.?

Regards,
 Patrick


Re: indexing errors when storeOffsetsWithPositions=true in solr 4.9.1

2014-11-05 Thread Alan Woodward
Hi Min,

Do you have the specific bit of text that caused this exception to be thrown?

Alan Woodward
www.flax.co.uk


On 4 Nov 2014, at 23:15, Min L wrote:

 Hi All:
 
 I am using Solr 4.9.1 and trying to use the PostingsSolrHighlighter, but I got
 errors during indexing. I thought LUCENE-5111 had fixed the issues with
 WordDelimiterFilter. The error is as below:
 
 Caused by: java.lang.IllegalArgumentException: startOffset must be
 non-negative, and endOffset must be >= startOffset, and offsets must
 not go backwards startOffset=31,endOffset=44,lastStartOffset=37 for
 field 'description_texts'
   at 
 org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:630)
   at 
 org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
   at 
 org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
   at 
 org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
   at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:451)
   at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1539)
   at 
 org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
   at 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
 
 
 My schema.xml looks like below:
 
 <dynamicField name="*_texts" stored="true" type="text" multiValued="true"
   indexed="true" storeOffsetsWithPositions="true"/>
 
 <fieldType name="text" class="solr.TextField" omitNorms="false">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_en.txt"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
     <filter class="solr.KStemFilterFactory"/>
     <filter class="solr.StopFilterFactory" words="stopwords_english.txt" ignoreCase="true" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnNumerics="0" catenateWords="1"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" words="stopwords_english.txt" ignoreCase="true" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnNumerics="0" catenateWords="1"/>
     <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_en.txt"/>
     <filter class="solr.KStemFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 Any help is appreciated.
 
 
 Thanks.
 
 Min



Re: add and then delete same document before commit,

2014-11-05 Thread Alexandre Rafalovitch
Do you have soft commits enabled by any chance in solrconfig.xml?

Regards,
Alex
On 05/11/2014 4:48 am, Matteo Grolla matteo.gro...@gmail.com wrote:

 Can anyone tell me the behavior of solr (and if it's consistent) when I do
 what follows:
 1) add document x
 2) delete document x
 3) commit

 I've tried with solr 4.5.0 and document x gets indexed

 Matteo


Re: Best way to map holidays to corresponding date

2014-11-05 Thread Jack Krupansky
Unfortunately, a date is a non-analyzed field, so you can't do something 
like a synonym.


Further, holidays repeat every year and their dates can vary, so
they won't match exactly.


Use an update request processor to examine the date field values at index
time, then look up and store the holiday name in a text field so that you can
do a query such as holiday:easter. It could be a string field, but then the
case would have to match exactly. You could code this in a JavaScript script
that has the logic or hard-coded dates for various holidays, using the
stateless script update processor.
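
For readers of the archive, a minimal sketch of such a chain, assuming a
script file named holidays.js and a date field named event_date (both names
illustrative, not from the original post):

  <!-- solrconfig.xml: register the script chain -->
  <updateRequestProcessorChain name="holidays">
    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">holidays.js</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  // holidays.js - illustrative only; real holiday logic goes here
  function processAdd(cmd) {
    var doc = cmd.solrDoc;                       // the SolrInputDocument
    var date = doc.getFieldValue("event_date");  // raw value as sent by the client
    // assumes the client sends ISO-8601 date strings, e.g. 2014-12-31T00:00:00Z
    if (date != null && String(date).indexOf("-12-31") > -1) {
      doc.addField("holiday", "sylvester");      // searchable as holiday:sylvester
    }
  }
  // no-op handlers for the other update events
  function processDelete(cmd) { }
  function processCommit(cmd) { }
  function processRollback(cmd) { }
  function processMergeIndexes(cmd) { }
  function finish() { }

The chain is then selected per request with update.chain=holidays.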


-- Jack Krupansky

-Original Message- 
From: Patrick Kirsch

Sent: Wednesday, November 5, 2014 6:12 AM
To: solr-user@lucene.apache.org
Subject: Best way to map holidays to corresponding date

Hey,
maybe someone already faced the situation and could give me a hint.

Given a query that includes "Easter" or "Sylvester", I am searching for the
best place to translate the string to the corresponding date.

Is there any solr.Mapping*Factory for that?
Do I need to implement it in a custom Solr Query Parser etc.?

Regards,
Patrick 



Re: add and then delete same document before commit,

2014-11-05 Thread Jack Krupansky
Document x doesn't exist - in terms of visibility - until the commit, so the
delete will be a no-op, since a query of Lucene will not see the uncommitted
new document.
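
For reference, the sequence in question as a minimal sketch using the XML
update format (core name and port are the defaults; adjust as needed):

  # 1) add document x
  curl http://localhost:8983/solr/collection1/update -H 'Content-Type: text/xml' \
    --data-binary '<add><doc><field name="id">x</field></doc></add>'

  # 2) delete document x, before any commit
  curl http://localhost:8983/solr/collection1/update -H 'Content-Type: text/xml' \
    --data-binary '<delete><id>x</id></delete>'

  # 3) commit
  curl http://localhost:8983/solr/collection1/update -H 'Content-Type: text/xml' \
    --data-binary '<commit/>'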


-- Jack Krupansky

-Original Message- 
From: Matteo Grolla

Sent: Wednesday, November 5, 2014 4:47 AM
To: solr-user@lucene.apache.org
Subject: add and then delete same document before commit,

Can anyone tell me the behavior of solr (and if it's consistent) when I do 
what follows:

1) add document x
2) delete document x
3) commit

 I've tried with solr 4.5.0 and document x gets indexed

Matteo



on regards to Solr and NoSQL storages integration

2014-11-05 Thread andrey prokopenko
Greetings Comrades.
There have been numerous requests and considerations on using Solr as both
a search engine and a NoSQL store at the same time.
While being an excellent tool as a search engine, Solr does not look so
good when it comes to storing documents and various stored fields,
especially with large amounts of data. The index quickly grows to
unmanageable sizes. Then there is the ever-present PITA problem
of partial document updates: due to the nature of the Lucene/Solr index,
documents can't be updated in place; they always need to be deleted & re-inserted.
All in all, Solr desperately needs a tight integration with some document
storage that offloads the stored fields of the document and is
transactionally coupled with the search index itself, so that stored fields
are at all times synced with the other parts
of the index (terms, doc values etc.).

Unfortunately, unlike Lucene, Solr does not offer a full set of distributed
transaction API commands, which seriously complicates this matter. Luckily,
with the advent of Solr 4.0 we now have the ability to create not only a
custom Directory, but also to completely tweak the index structure any way
we like. Based on this new feature I've created my own custom Directory +
custom codec, integrating Solr with the Oracle NoSQL key-value store.
My codec is based on the Solr 4.10.1 API and Oracle NoSQL 1.2.1.8 Community
Edition. Fields in NoSQL storage are persisted using a primary key derived
from the document fields. The codec relays stored fields to the NoSQL store
while keeping all other index components in the usual file-based storage
layout. The codec has been made with SolrCloud and NoSQL's own fault
tolerance in mind, hence it tries to ignore write commands to the NoSQL
storage if the index is being created on a replica node that is not
currently the Solr shard leader. The first stable version of the codec
transparently supports the full index life cycle, including segment
creation, merging and deletion.
Source code and a readme detailing usage instructions for the codec can be
found on GitHub: https://github.com/andrey42/onsqlcodec

I assume there might be other developers trying to solve similar
problems, so I'd be interested to hear about similar attempts & issues
encountered while trying to implement such an integration between Solr and
other NoSQL databases.


Re: Best practice to setup schemas for documents having different structures

2014-11-05 Thread Erick Erickson
It Depends (tm).

You have a lot of options, and it all depends on your data and
use-case. In general, there is very little cost involved when a doc
does _not_ use a field you've defined in a schema. That is, if you
have 100's of fields defined and only use 10, the other 90 don't take
up space in each doc. There is some overhead with many many fields,
but probably not so you'd notice.

1> You could have a single schema that contains all your fields and
use it amongst a bunch of indexes (cores). This is particularly easy
with the new configset pattern.

2> You could have a single schema that contains all your fields and
use it in a single index. That index could contain all your different
docs with, say, a type field to let you search subsets easily.

3> You could have a different schema for each index and put each kind of
doc in its own index.

1> I don't really like at all. If you're going to have different
indexes, I think it's far easier to maintain if there are individual
schemas.

Between 2> and 3> it's a tossup. 2> will skew the relevance
calculations because all the terms are in a single index. So your
relevance calculations for students will be influenced by the terms in
courses docs and vice versa. That said, you may not notice, as it's
subtle.

I generally prefer 3> but I've seen 2> serve as well.
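
If you go with 2>, a minimal sketch of the type-field idea (field and
value names are just illustrative):

  <field name="type" type="string" indexed="true" stored="true"/>

  # restrict a search to student docs only:
  curl "http://localhost:8983/solr/collection1/select?q=name:smith&fq=type:student"

The fq also gets cached in the filterCache, so the restriction is cheap
after the first use.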

Best,
Erick

On Tue, Nov 4, 2014 at 9:34 PM, Vishal Sharma vish...@grazitti.com wrote:
 This is something I have been thinking for a long time now.

 What is the best practice for setting up the Schemas for documents having
 different fields?

 Should we just create one schema with lot of fields or multiple schemas for
 different data structures?

 Here is an example: I have two objects students and courses:

 Student:

- Student Name
- Student Registration number
- Course Enrolled for

 Course:

- Course ID
- Course Name
- Course duration

 What should the ideal schema setup should look like?

 Any guidance is strongly appreciated.



 *Vishal Sharma* | Team Lead, Grazitti Interactive | T: +1 650 641 1754
 E: vish...@grazitti.com
 www.grazitti.com
 LinkedIn: http://www.linkedin.com/company/grazitti-interactive
 Twitter: https://twitter.com/grazitti
 Facebook: https://www.facebook.com/grazitti.interactive


Re: on regards to Solr and NoSQL storages integration

2014-11-05 Thread Alexandre Rafalovitch
On 5 November 2014 08:52, andrey prokopenko andrey4...@gmail.com wrote:
 I assume there might be other developers trying to solve similar
 problems, so I'd be interested to hear about similar attempts & issues
 encountered while trying to implement such an integration between Solr and
 other NoSQL databases.

I think DataStax does Solr+Cassandra and Cloudera does Solr+Hadoop,
with the underlying content stored in the databases. Also, Neo4j has
graph+search integration, but I think it uses the Lucene engine
directly, not Solr.

Disclaimer: this is a very high-level understanding; hopefully other
people can confirm.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: A bad idea to store core data directory over NAS?

2014-11-05 Thread Walter Underwood
My experience was with Solr 1.2 and regular old NFS, so that was probably the
worst case. I was very surprised that it was that bad, though.

So benchmark it before you assume it is fast enough. 

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Nov 5, 2014, at 12:27 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

 On Tue, 2014-11-04 at 22:57 +0100, Gili Nachum wrote:
 My data center is out of SAN or local disk storage - is it a big no-no to
 store Solr core data folder over NAS?
 
 It depends on your NAS speed. Both Walter and David are right: It can
 perform really badly or quite satisfactorily. We briefly experimented with
 using 400GB of Isilon ( http://www.emc.com/isilon ) SSD cache as backend
 for a searcher. As far as I remember, speed was surprisingly fine; about
 3 times slower than with similar local storage. As we needed 20TB+ of
 index, it would be too expensive for us to use the enterprise NAS system
 though (long story).
 
 The NAS mount would be accessed by a single machine. I do care about
 performance.
 
 I have a vision of an off-the-shelf 4-drive box Gorilla-taped to the side
 of a server rack :-)
 
 Or in other words: If the NAS is only to be used by a single machine,
 this will be more of a kludge than a solid solution. Is it not possible
 to upgrade local storage to hold the data? How large an index are we
 talking about?
 
 If I do go with NAS, should I expect index corruption and other oddities?
 
 Not that I know of. As the NAS is dedicated, you won't compete for
 performance there. Do check if your network is fast enough though.
 
 
 - Toke Eskildsen, State and University Library, Denmark
 I highly recommend Gorilla Tape for semi-permanent kludges.
 



Re: any difference between using collection vs. shard in URL?

2014-11-05 Thread Shalin Shekhar Mangar
There's no difference between the two. Even if you send updates to a shard
URL, they will still be forwarded to the right shard leader according to the
hash of the id (assuming you're using the default compositeId router). Of
course, if you happen to hit the right shard leader then it is just an
internal forward and not an extra network hop.

The advantage of using the collection name is that you can hit any
SolrCloud node (even ones not hosting this collection) and it will
still work. So for a non-Java client, a load balancer can be set up in front
of the entire cluster and things will just work.
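
For illustration, both forms of the update (host names are placeholders):

  # via the collection name - works on any node in the cluster:
  curl http://anynode:8983/solr/alpha/update -H 'Content-Type: application/json' \
    --data-binary '[{"id":"doc1"}]'

  # via a specific core name - only on the node hosting that core:
  curl http://node1:8983/solr/alpha_shard4_replica1/update -H 'Content-Type: application/json' \
    --data-binary '[{"id":"doc1"}]'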

On Wed, Nov 5, 2014 at 8:50 PM, Ian Rose ianr...@fullstory.com wrote:

 If I add some documents to a SolrCloud shard in a collection alpha, I can
 post them to /solr/alpha/update.  However I notice that you can also post
 them using the shard name, e.g. /solr/alpha_shard4_replica1/update - in
 fact this is what Solr seems to do internally (like if you send documents
 to the wrong node so Solr needs to forward them over to the leader of the
 correct shard).

 Assuming you *do* always post your documents to the correct shard, is there
 any difference between these two, performance or otherwise?

 Thanks!
 - Ian




-- 
Regards,
Shalin Shekhar Mangar.


EarlyTerminatingCollectorException

2014-11-05 Thread Dirk Högemann
Our production Solr slave cores (we have about 40 cores, each of a
moderate size, about 10K to 90K documents) produce many
exceptions of this type:

2014-11-05 15:06:06.247 [searcherExecutor-158-thread-1] ERROR
org.apache.solr.search.SolrCache: Error during auto-warming of
key:org.apache.solr.search.QueryResultKey@62340b01
:org.apache.solr.search.EarlyTerminatingCollectorException

The relevant parts of our solrconfig are:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>18</maxTime> <!-- in ms -->
    </autoCommit>
  </updateHandler>

  <query>
    <maxWarmingSearchers>2</maxWarmingSearchers>
    <filterCache
      class="solr.FastLRUCache"
      size="8192"
      initialSize="8192"
      autowarmCount="4096"/>

    <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range
         of documents requested. -->
    <queryResultCache
      class="solr.FastLRUCache"
      size="8192"
      initialSize="8192"
      autowarmCount="4096"/>

    <!-- documentCache caches Lucene Document objects (the stored fields for
         each document).
         Since Lucene internal document ids are transient, this cache will
         not be autowarmed. -->
    <documentCache
      class="solr.FastLRUCache"
      size="8192"
      initialSize="8192"
      autowarmCount="4096"/>
  </query>

What exactly does the exception mean?
Thank you!

-- Dirk --


Re: any difference between using collection vs. shard in URL?

2014-11-05 Thread Ian Rose
Awesome, thanks.  That's what I was hoping.

Cheers,
Ian


On Wed, Nov 5, 2014 at 10:33 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 There's no difference between the two. Even if you send updates to a shard
 URL, they will still be forwarded to the right shard leader according to the
 hash of the id (assuming you're using the default compositeId router). Of
 course, if you happen to hit the right shard leader then it is just an
 internal forward and not an extra network hop.

 The advantage of using the collection name is that you can hit any
 SolrCloud node (even ones not hosting this collection) and it will
 still work. So for a non-Java client, a load balancer can be set up in front
 of the entire cluster and things will just work.

 On Wed, Nov 5, 2014 at 8:50 PM, Ian Rose ianr...@fullstory.com wrote:

  If I add some documents to a SolrCloud shard in a collection alpha, I
 can
  post them to /solr/alpha/update.  However I notice that you can also
 post
  them using the shard name, e.g. /solr/alpha_shard4_replica1/update - in
  fact this is what Solr seems to do internally (like if you send documents
  to the wrong node so Solr needs to forward them over to the leader of the
  correct shard).
 
  Assuming you *do* always post your documents to the correct shard, is
 there
  any difference between these two, performance or otherwise?
 
  Thanks!
  - Ian
 



 --
 Regards,
 Shalin Shekhar Mangar.



Indexing nested document to support blockjoin queries in solr 4.10.1

2014-11-05 Thread henry cleland
Hello Guys,
I'm a noob on this mailing list, so bear with me.
Could I kindly get some help with this rather elaborate problem?
http://stackoverflow.com/questions/26759366/solr-blockjoin-indexing-for-solr-4-10-1

Thanks


Re: Best practice to setup schemas for documents having different structures

2014-11-05 Thread Ryan Cooke
We define all fields as wildcard fields with a suffix indicating the field
type. Then we can use something like Java annotations to map POJO variables
to field types and append the correct suffix. This allows us to use one very
generic schema for all of our collections, and we rarely need to update
it. Our inspiration for this method comes from the Ruby library Sunspot.
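
A minimal sketch of the idea (the suffix conventions are illustrative,
not our actual schema):

  <dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
  <dynamicField name="*_i"  type="int"    indexed="true" stored="true"/>
  <dynamicField name="*_dt" type="date"   indexed="true" stored="true"/>
  <dynamicField name="*_t"  type="text"   indexed="true" stored="true"/>

A POJO property annotated as a string then maps to, e.g., title_s, an
int to count_i, and so on; no per-collection field definitions needed.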

- Ryan



---
Ryan Cooke
VP of Engineering
Docurated
(646) 535-4595

On Wed, Nov 5, 2014 at 9:59 AM, Erick Erickson erickerick...@gmail.com
wrote:

 It Depends (tm).

 You have a lot of options, and it all depends on your data and
 use-case. In general, there is very little cost involved when a doc
 does _not_ use a field you've defined in a schema. That is, if you
 have 100's of fields defined and only use 10, the other 90 don't take
 up space in each doc. There is some overhead with many many fields,
 but probably not so you'd notice.

 1> You could have a single schema that contains all your fields and
 use it amongst a bunch of indexes (cores). This is particularly easy
 with the new configset pattern.

 2> You could have a single schema that contains all your fields and
 use it in a single index. That index could contain all your different
 docs with, say, a type field to let you search subsets easily.

 3> You could have a different schema for each index and put each kind of
 doc in its own index.

 1> I don't really like at all. If you're going to have different
 indexes, I think it's far easier to maintain if there are individual
 schemas.

 Between 2> and 3> it's a tossup. 2> will skew the relevance
 calculations because all the terms are in a single index. So your
 relevance calculations for students will be influenced by the terms in
 courses docs and vice versa. That said, you may not notice, as it's
 subtle.

 I generally prefer 3> but I've seen 2> serve as well.

 Best,
 Erick

 On Tue, Nov 4, 2014 at 9:34 PM, Vishal Sharma vish...@grazitti.com
 wrote:
  This is something I have been thinking for a long time now.
 
  What is the best practice for setting up the Schemas for documents having
  different fields?
 
  Should we just create one schema with lot of fields or multiple schemas
 for
  different data structures?
 
  Here is an example: I have two objects students and courses:
 
  Student:
 
 - Student Name
 - Student Registration number
 - Course Enrolled for
 
  Course:
 
 - Course ID
 - Course Name
 - Course duration
 
  What should the ideal schema setup should look like?
 
  Any guidance is strongly appreciated.
 
 
 
  *Vishal Sharma* | Team Lead, Grazitti Interactive | T: +1 650 641 1754
  E: vish...@grazitti.com
  www.grazitti.com
  LinkedIn: http://www.linkedin.com/company/grazitti-interactive
  Twitter: https://twitter.com/grazitti
  Facebook: https://www.facebook.com/grazitti.interactive



create new core based on named config set using the admin page

2014-11-05 Thread Andreas Hubold

Hi,

I'm trying to use named config sets with a standalone Solr server (4.10.1).

But it seems there's no way to create a new core based on a named config 
set using the Solr admin page. Or did I miss something?

Should I open a JIRA issue?

Regards,
Andreas


Re: A bad idea to store core data directory over NAS?

2014-11-05 Thread Charlie Hull
In our experience, yes, it's a bad idea.

Charlie

On 5 November 2014 10:27, Walter Underwood wun...@wunderwood.org wrote:

 My experience was with Solr 1.2 and regular old NFS, so that was probably
 the worst case. I was very surprised that it was that bad, though.

 So benchmark it before you assume it is fast enough.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/

 On Nov 5, 2014, at 12:27 AM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:

  On Tue, 2014-11-04 at 22:57 +0100, Gili Nachum wrote:
  My data center is out of SAN or local disk storage - is it a big no-no
 to
  store Solr core data folder over NAS?
 
  It depends on your NAS speed. Both Walter and David are right: It can
  perform really badly or quite satisfactorily. We briefly experimented with
  using 400GB of Isilon ( http://www.emc.com/isilon ) SSD cache as backend
  for a searcher. As far as I remember, speed was surprisingly fine; about
  3 times slower than with similar local storage. As we needed 20TB+ of
  index, it would be too expensive for us to use the enterprise NAS system
  though (long story).
 
  The NAS mount would be accessed by a single machine. I do care about
  performance.
 
  I have a vision of an off-the-shelf 4-drive box Gorilla-taped to the side
  of a server rack :-)
 
  Or in other words: If the NAS is only to be used by a single machine,
  this will be more of a kludge than a solid solution. Is it not possible
  to upgrade local storage to hold the data? How large an index are we
  talking about?
 
  If I do go with NAS, should I expect index corruption and other
 oddities?
 
  Not that I know of. As the NAS is dedicated, you won't compete for
  performance there. Do check if your network is fast enough though.
 
 
  - Toke Eskildsen, State and University Library, Denmark
  I highly recommend Gorilla Tape for semi-permanent kludges.
 




Re: create new core based on named config set using the admin page

2014-11-05 Thread Ramzi Alqrainy
Sorry, I did not get your point. Can you please elaborate more?





Re: Indexing nested document to support blockjoin queries in solr 4.10.1

2014-11-05 Thread Ramzi Alqrainy
You can model this in different ways, depending on your searching/faceting
needs. Usually you'll use multivalued or dynamic fields. For example:

<field name="name" type="text" indexed="true" stored="true"/>
<field name="c_name" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="c_age" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="c_sex" type="string" indexed="true" stored="true" multiValued="true"/>


Another one:


<dynamicField name="c_name_*" type="string" indexed="true" stored="true"/>
<dynamicField name="c_age_*" type="string" indexed="true" stored="true"/>
<dynamicField name="c_sex_*" type="string" indexed="true" stored="true"/>

Here you would store fields 'c_name_1', 'c_age_1', 'c_name_2', 'c_age_2',
etc. Again it's up to you to correlate values, but at least you have an
index. With some code you could make this transparent.

The Solr wiki says: "Solr provides one table. Storing a set of database
tables in an index generally requires denormalizing some of the tables.
Attempts to avoid denormalizing usually fail." It's up to you to
denormalize your data according to your search needs.

UPDATE: Since version 4.10.1 or so Solr supports nested documents directly:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
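
For the block-join route, a minimal sketch (field names illustrative):
parent and child documents are indexed together as one block, and the
parent parser maps child-level matches back to their parents:

  <add>
    <doc>
      <field name="id">parent1</field>
      <field name="type">parent</field>
      <field name="name">John</field>
      <doc>
        <field name="id">child1</field>
        <field name="c_sex">female</field>
        <field name="c_age">10</field>
      </doc>
    </doc>
  </add>

  q={!parent which="type:parent"}c_sex:female AND c_age:10

Note the children must be indexed in the same block as their parent;
updating the parent means reindexing the whole block.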





Schemaless configuration using 4.10.2/API returning 404

2014-11-05 Thread nbosecker
Hi all,

I'm working on updating legacy Solr to 4.10.2 to use schemaless
configuration. As such, I have added this snippet to solrconfig.xml per the
docs:

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

I see that schema.xml is renamed to schema-xml.bak and managed-schema file
is present on Solr restart.

My Solr Dashboard is accessible via:
https://myserver:9943/solr/#/

However, I still cannot access the schema via the API - I keep receiving a 404
[The requested resource (/solr/schema/fields) is not available] error:
https://myserver:9943/solr/collection1/schema/fields


What am I missing to access the schema API?

Much thanks!







Re: EarlyTerminatingCollectorException

2014-11-05 Thread Mikhail Khludnev
I wondered too, but it seems it warms up the queryResultCache:
https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L522
At least these ERRORs break nothing, see
https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L165

Anyway, here are two usability issues:
 - the key:org.apache.solr.search.QueryResultKey@62340b01 lacks a readable
toString()
 - I don't think regeneration exceptions are ERRORs; they seem like WARNs to
me, or even lower. Also, as a courtesy, EarlyTerminatingCollectorExceptions
in particular could be recognized, and even ignored, in
SolrIndexSearcher.java#L522.

Would you mind raising a ticket?

On Wed, Nov 5, 2014 at 6:51 PM, Dirk Högemann dhoeg...@gmail.com wrote:

 Our production Solr slave cores (we have about 40 cores, each of a
 moderate size, about 10K to 90K documents) produce many
 exceptions of this type:

 2014-11-05 15:06:06.247 [searcherExecutor-158-thread-1] ERROR
 org.apache.solr.search.SolrCache: Error during auto-warming of
 key:org.apache.solr.search.QueryResultKey@62340b01
 :org.apache.solr.search.EarlyTerminatingCollectorException

 The relevant parts of our solrconfig are:

   <updateHandler class="solr.DirectUpdateHandler2">
     <autoCommit>
       <maxTime>18</maxTime> <!-- in ms -->
     </autoCommit>
   </updateHandler>

   <query>
     <maxWarmingSearchers>2</maxWarmingSearchers>
     <filterCache
       class="solr.FastLRUCache"
       size="8192"
       initialSize="8192"
       autowarmCount="4096"/>

     <!-- queryResultCache caches results of searches - ordered lists of
          document ids (DocList) based on a query, a sort, and the range
          of documents requested. -->
     <queryResultCache
       class="solr.FastLRUCache"
       size="8192"
       initialSize="8192"
       autowarmCount="4096"/>

     <!-- documentCache caches Lucene Document objects (the stored fields for
          each document).
          Since Lucene internal document ids are transient, this cache will
          not be autowarmed. -->
     <documentCache
       class="solr.FastLRUCache"
       size="8192"
       initialSize="8192"
       autowarmCount="4096"/>
   </query>

 What exactly does the exception mean?
 Thank you!

 -- Dirk --




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: add and then delete same document before commit,

2014-11-05 Thread Matteo Grolla
Perfectly clear,
thanks a lot!

On Nov 5, 2014, at 1:48 PM, Jack Krupansky wrote:

 Document x doesn't exist - in terms of visibility - until the commit, so the
 delete will be a no-op, since a query of Lucene will not see the uncommitted
 new document.
 
 -- Jack Krupansky
 
 -Original Message- From: Matteo Grolla
 Sent: Wednesday, November 5, 2014 4:47 AM
 To: solr-user@lucene.apache.org
 Subject: add and then delete same document before commit,
 
 Can anyone tell me the behavior of solr (and if it's consistent) when I do 
 what follows:
 1) add document x
 2) delete document x
 3) commit
 
 I've tried with solr 4.5.0 and document x gets indexed
 
 Matteo



Re: A bad idea to store core data directory over NAS?

2014-11-05 Thread Gili Nachum
So NFS is doable, and performance will vary with the grade of storage I'm
getting and the volume of other activity on the NAS. Good to know that it's
not associated with index corruption in Lucene (failures to sync to disk and
such).

Update: Turns out that someone did find 50TB over SAN lying around the
data center for me to use, so I won't find out for myself how life is with
NFS/NAS in the near future.

Cheers!

On Wed, Nov 5, 2014 at 8:51 PM, Charlie Hull char...@flax.co.uk wrote:

 In our experience yes, it's a bad idea.

 Charlie

 On 5 November 2014 10:27, Walter Underwood wun...@wunderwood.org wrote:

  My experience was with Solr 1.2 and regular old NFS, so that was probably
  the worst case. I was very surprised that it was that bad, though.
 
  So benchmark it before you assume it is fast enough.
 
  wunder
  Walter Underwood
  wun...@wunderwood.org
  http://observer.wunderwood.org/
 
  On Nov 5, 2014, at 12:27 AM, Toke Eskildsen t...@statsbiblioteket.dk
  wrote:
 
   On Tue, 2014-11-04 at 22:57 +0100, Gili Nachum wrote:
   My data center is out of SAN or local disk storage - is it a big no-no
  to
   store Solr core data folder over NAS?
  
   It depends on your NAS speed. Both Walter and David are right: It can
   perform really badly or quite satisfactorily. We briefly experimented with
   using 400GB of Isilon ( http://www.emc.com/isilon ) SSD cache as
 backend
   for a searcher. As far as I remember, speed was surprisingly fine;
 about
   3 times slower than with similar local storage. As we needed 20TB+ of
   index, it would be too expensive for us to use the enterprise NAS
 system
   though (long story).
  
   The NAS mount would be accessed by a single machine. I do care about
   performance.
  
   I have a vision of an off-the-shelf 4-drive box Gorilla-taped to the
 side
   of a server rack :-)
  
   Or in other words: If the NAS is only to be used by a single machine,
   this will be more of a kludge than a solid solution. Is it not possible
   to upgrade local storage to hold the data? How large an index are we
   talking about?
  
   If I do go with NAS, should I expect index corruption and other
  oddities?
  
   Not that I know of. As the NAS is dedicated, you won't compete for
   performance there. Do check if your network is fast enough though.
  
  
   - Toke Eskildsen, State and University Library, Denmark
   I highly recommend Gorilla Tape for semi-permanent kludges.
  
 
 



Re: on regards to Solr and NoSQL storages integration

2014-11-05 Thread Jack Krupansky
Take a look at DataStax Enterprise, which is basically Cassandra with Solr 
tightly integrated as an embedded search engine. Write and update your data 
in Cassandra and it will automatically be indexed in Solr, all in one 
cluster, so no need to build and maintain a separate SolrCloud cluster just 
to search your NoSQL data in Cassandra. The data is stored in Cassandra but 
indexed in Solr. Solr runs in the same JVM as Cassandra for efficient 
indexing of new and updated data - none of the synchronization issues of an 
ETL or trigger-based approach with a separate search platform.


(Disclosure: I am a contractor for DataStax. I'm their Domain Expert for 
Search/Solr.)


-- Jack Krupansky

-Original Message- 
From: andrey prokopenko

Sent: Wednesday, November 5, 2014 8:52 AM
To: solr-user@lucene.apache.org
Subject: on regards to Solr and NoSQL storages integration

Greetings Comrades.
There have been numerous requests and considerations on using Solr as both
a search engine and a NoSQL store at the same time.
While being an excellent tool as a search engine, Solr does not look so
good when it comes to storing documents and various stored fields,
especially with large amounts of data. The index quickly grows to
unmanageable sizes. Then there is the ever-present PITA problem
of partial document updates: due to the nature of the Lucene/Solr index,
documents can't be updated in place; they always need to be deleted & re-inserted.
All in all, Solr desperately needs a tight integration with some document
storage that offloads the stored fields of the document and is
transactionally coupled with the search index itself, so that stored fields
are at all times synced with the other parts
of the index (terms, doc values etc.).

Unfortunately, unlike Lucene, Solr does not offer a full set of distributed
transaction API commands, which seriously complicates this matter. Luckily,
with the advent of Solr 4.0 we now have the ability to create not only a
custom Directory, but also to completely tweak the index structure any way
we like. Based on this new feature I've created my own custom Directory +
custom codec, integrating Solr with the Oracle NoSQL key-value store.
My codec is based on the Solr 4.10.1 API and Oracle NoSQL 1.2.1.8 Community
Edition. Fields in NoSQL storage are persisted using a primary key derived
from the document fields. The codec relays stored fields to the NoSQL store
while keeping all other index components in the usual file-based storage
layout. The codec has been made with SolrCloud and NoSQL's own fault
tolerance in mind, hence it tries to ignore write commands to the NoSQL
storage if the index is being created on a replica node that is not
currently the Solr shard leader. The first stable version of the codec
transparently supports the full index life cycle, including segment
creation, merging and deletion.
Source code and a readme detailing usage instructions for the codec can be
found on GitHub: https://github.com/andrey42/onsqlcodec

I assume there might be other developers trying to solve similar
problems, so I'd be interested to hear about similar attempts & issues
encountered while trying to implement such an integration between Solr and
other NoSQL databases.



SolrCloud shard distribution with Collections API

2014-11-05 Thread CTO직속IsabellePhan
Hello,

I am testing a small SolrCloud cluster on 2 servers. I started 2 nodes on
each server, so that each collection can have 2 shards with replication
factor of 2.

I am using below command from Collections API to create collection:

curl 'http://serveraddress/solr/admin/collections?action=CREATE&name=cp_collection&numShards=2&replicationFactor=2&collection.configName=cp_config'

Is there a way to ensure that for each shard, leader and replica are on a
different server?
This command sometimes puts them on 2 nodes from the same server.


Thanks a lot for your help,

Isabelle


Problem getting ngroups with format simple in Solr Cloud

2014-11-05 Thread Judith Silverman

I am seeing the same problem.  I suspect that the patch for SOLR-5634 does not 
address the sharded case.

Cheers,
Judith


Is there a way to stop some hyphenated terms from being tokenized

2014-11-05 Thread Tang, Rebecca
Hi there,

For some hyphenated terms, I want them to stay as-is instead of being
tokenized. For example: e-cigarette, e-cig, I-pad. I don't want them to be
split into e and cig or I and pad, because the single letters e and I produce
too many false-positive matches.

Is there a way to tell the standard tokenizer to skip tokenizing some terms?

Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Librarylegacy.library.ucsf.edu/
E: rebecca.t...@ucsf.edu


WordDelimiterFilterFactory and PatternReplaceCharFilterFactory

2014-11-05 Thread Jae Joo
Hi,

Once I apply PatternReplaceCharFilterFactory to the input string, the
positions of the tokens change.
Here is an example.
<charFilter class="solr.PatternReplaceCharFilterFactory"
    pattern="(&lt;/?ce:italic[^&gt;]*&gt;)" replacement=""/>
<filter class="solr.WordDelimiterFilterFactory"
    generateWordParts="1"
    generateNumberParts="1"
    splitOnCaseChange="0"
    splitOnNumerics="0"
    catenateWords="1"
    catenateNumbers="0"
    catenateAll="0"
    preserveOriginal="1"
/>

In the analysis page, <ce:italic>p</ce:italic>-xylene and p-xylene (without
xml tags) have different positions.

for <ce:italic>p</ce:italic>-xylene,
p-xylene -- 1
xylene -- 2
p -- 2
pxylene --

However, for the term (without tags) p-xylene,
p-xylene -- 1
p -- 1
xylene -- 2
pxylene -- 3

The only difference I can see is the start and end positions, because of the xml tags.

Does any one know why?

Thanks,

Jae Joo


Re: Is there a way to stop some hyphenated terms from being tokenized

2014-11-05 Thread Michael Della Bitta

Pretty sure what you need is called KeywordMarkerFilterFactory.

<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>


On 11/5/14 17:24, Tang, Rebecca wrote:

Hi there,

For some hyphenated terms, I want them to stay as-is instead of being
tokenized. For example: e-cigarette, e-cig, I-pad. I don't want them to be
split into e and cig or I and pad, because the single letters e and I produce
too many false-positive matches.

Is there a way to tell the standard tokenizer to skip tokenizing some terms?

Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Librarylegacy.library.ucsf.edu/
E: rebecca.t...@ucsf.edu





Re: SolrCloud shard distribution with Collections API

2014-11-05 Thread Erick Erickson
They should be pretty well distributed by default, but if you want to
take manual control, you can use the createNodeSet param on CREATE
(with replication factor of 1) and then ADDREPLICA with the node param
to put replicas for shards exactly where you want.
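
A minimal sketch of that (host names, ports and collection/config names
illustrative; check the cloud view to see which node got which shard
before adding replicas):

  # create the collection with one replica per shard on an explicit node set:
  curl 'http://host1:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=1&createNodeSet=host1:8983_solr,host2:8983_solr&collection.configName=myconf'

  # then place the second replica of each shard on the other machine:
  curl 'http://host1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=host2:8983_solr'
  curl 'http://host1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard2&node=host1:8983_solr'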

Best,
Erick

On Wed, Nov 5, 2014 at 2:12 PM, CTO직속IsabellePhan ip...@coupang.com wrote:
 Hello,

 I am testing a small SolrCloud cluster on 2 servers. I started 2 nodes on
 each server, so that each collection can have 2 shards with replication
 factor of 2.

 I am using below command from Collections API to create collection:

 curl 'http://serveraddress/solr/admin/collections?action=CREATE&name=cp_collection&numShards=2&replicationFactor=2&collection.configName=cp_config'

 Is there a way to ensure that for each shard, leader and replica are on a
 different server?
 This command sometimes put them on 2 nodes from the same server.


 Thanks a lot for your help,

 Isabelle


solr security patch

2014-11-05 Thread kuttan palliyalil
Hi,
I am trying to apply the security patch (SOLR-4470.patch) to the solr 4.10.1 tag:
SOLR-4470.patch  14/Mar/14 16:15  278 kB

I am getting an error with a hunk failure. Could anyone confirm whether this is
the right patch for 4.10.1?
Thank you so much.
Regards,
Raj



Re: Indexing nested document to support blockjoin queries in solr 4.10.1

2014-11-05 Thread henry cleland
Hi Ramzi,
Thanks for the response.
I should have pointed out that this is an overly simplified view of my
scenario at hand. Denormalisation is not an option for me as advised,
because of the sheer volume, nature and spread/skewness of the
relations/schema of my actual data. Also, multivalued fields will return a
lot of false positives to queries.

For instance, I won't be able to accurately find parents who have daughters
aged 10.



On Wed, Nov 5, 2014 at 7:33 PM, Ramzi Alqrainy ramzi.alqra...@gmail.com
wrote:

 You can model this in different ways, depending on your searching/faceting
 needs. Usually you'll use multivalued or dynamic fields. For example:

 <field name="name" type="text" indexed="true" stored="true"/>
 <field name="c_name" type="string" indexed="true" stored="true" multiValued="true"/>
 <field name="c_age" type="int" indexed="true" stored="true" multiValued="true"/>
 <field name="c_sex" type="string" indexed="true" stored="true" multiValued="true"/>

 Another one:

 <dynamicField name="c_name_*" type="string" indexed="true" stored="true"/>
 <dynamicField name="c_age_*" type="string" indexed="true" stored="true"/>
 <dynamicField name="c_sex_*" type="string" indexed="true" stored="true"/>

 Here you would store fields 'c_name_1', 'c_age_1', 'c_name_2', 'c_age_2',
 etc. Again it's up to you to correlate values, but at least you have an
 index. With some code you could make this transparent.

 The Solr wiki says: "Solr provides one table. Storing a set of database
 tables in an index generally requires denormalizing some of the tables.
 Attempts to avoid denormalizing usually fail." It's up to you to
 denormalize your data according to your search needs.

 UPDATE: Since version 4.10.1 or so Solr supports nested documents directly:

 https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers






Fwd: the solr shard-replica policy

2014-11-05 Thread sunjt
hi all, I have two machines and each of them has two solr instances. The
problem is: if I set numShards=2 and replicationFactor=2, how can I ensure
that the shard leader and replica end up on different machines? Can solr
help me do it, or must I do it myself?

Solr Cloud Cross-Core Joins

2014-11-05 Thread Steve Davids
I have a use-case where I would like to capture click events for individual
users so I can answer questions like show me everything with x text that I
have clicked before, plus the inverse, show me everything with x text that I
have *not* clicked. I am currently doing this by sticking the event into the
main index, where it resides with the rest of the document.

We have recently made some modifications to create a smaller sub-collection
of the main document index but still would like to ask the same questions,
so I thought a cross-core join with a click metadata collection could be
a decent trick, so that we can consolidate this quickly-changing data in a
separate collection without needing to worry about merging this information
into multiple document collections.
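
As a sketch, the kind of query I have in mind (field names illustrative;
note that fromIndex joins are local, so the clicks core would have to be
co-located and unsharded):

  q=text:foo&fq={!join from=doc_id to=id fromIndex=clicks}user_id:foo

and the not-clicked case can be expressed by negating it, e.g.
fq=-_query_:"{!join from=doc_id to=id fromIndex=clicks}user_id:foo".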

I am trying to write some unit tests that simply stand up multiple
collections (doc + click) in SolrCloud via the collections CREATE API,
though this assigns unique core names, so a query of {!join ...
fromIndex=clicks} user=foo doesn't work since no core is actually named
clicks but rather clicks_shard1_replica1, clicks_shard2_replica1, etc.
The CREATE request doesn't allow you to specify core names, so I attempted
to rename the cores via the cores rename API, and it failed on various
rename operations saying Leader Election - Fatal Error, SolrCore not
found: clicks_shard1_replica1 in [clicks]. Since that didn't seem to work,
I tried the collections DELETEREPLICA followed by a subsequent ADDREPLICA;
unfortunately, when I attempt to set properties for the core
(property.name=value) on the request, they don't appear to actually get
set, since the core refuses to load due to a solrconfig.xml property
substitution failure.

Long story short, I have been banging my head against the wall trying to get
a consistent core name via API calls and keep running into gotchas. I would
like to take a step back and ask if this approach (cross-core metadata
join) is even reasonable in the SolrCloud architecture. If it is still
reasonable, does anyone have ideas on how a common core name can be achieved
via API calls? If it isn't an advised approach, are there suggestions on an
optimal indexing strategy for this particular scenario?

Thanks for the help,

-Steve


Re: Solr Cloud Cross-Core Joins

2014-11-05 Thread Walter Underwood
I am curious why you are trying to do this with Solr. This is straightforward 
with other systems. I would use HBase for this. This could be really hard with 
Solr.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Nov 5, 2014, at 5:08 PM, Steve Davids sdav...@gmail.com wrote:

 I have a use-case where I would like to capture click events for individual
 users so I can answer questions like show me everything with x text and
 that I have clicked before + the inverse of show me everything with x text
 that I have *not* clicked. I am currently doing this by sticking the event
 into the main index which resides with the rest of the document.
 
 We have recently made some modifications to make a smaller sub collection
 of the main document index but still would like to ask the same questions,
 so I thought a cross-core join with a click metadata collection could be
 a decent trick so that we can consolidate this quickly changing data in a
 separate collection without needing to worry about merging this information
 into multiple document collections.
 
 I am trying to write some unit tests to simply stand up multiple
 collections (doc + click) in SolrCloud via the collections CREATE API
 though this assigns unique core names so a query of {!join ...
 fromIndex=clicks} user=foo doesn't work since no core name is actually
 called clicks but rather clicks_shard1_replica1,
 clicks_shard2_replica1, etc. The CREATE request doesn't allow you to
 specify core names, so I attempted to rename the cores via the cores rename
 API and it failed
 on various rename operations saying Leader Election - Fatal Error, SolrCore
 not found: clicks_shard1_replica1  in [clicks]. So since that didn't seem
 to work I tried the collections DELETEREPLICA followed by a subsequent
 ADDREPLICA, unfortunately when I attempt to set properties for the core (
 property.name=value) on the request it doesn't appear to actually get set
 since the core refuses to load due to a solrconfig.xml property
 substitution failure.
 
 Long story short, I have been banging my head against the wall to get a
 consistent core name via API calls and keep running into gotchas. I would
 like to take a step back and ask if this approach (cross-core metadata
 join) is even reasonable in the SolrCloud architecture. If it is still
 reasonable does anyone have ideas on how a common core name can be achieved
 via API calls? If it isn't an advised approach are there suggestions on an
 optimal indexing strategy for this particular scenario?
 
 Thanks for the help,
 
 -Steve



Re: Is there a way to stop some hyphenated terms from being tokenized

2014-11-05 Thread Michael Sokolov
You didn't describe your analysis chain, but maybe you are using
WordDelimiterFilter to break up hyphenated words? If so, it has a
protected words feature (protwords.txt) that lets you specify exceptions.
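
A minimal sketch, assuming a protwords.txt that lists e-cigarette, e-cig
and I-pad one per line (the other attribute values are illustrative):

  <filter class="solr.WordDelimiterFilterFactory"
          protected="protwords.txt"
          generateWordParts="1"
          catenateWords="1"
          preserveOriginal="1"/>

Terms in the protected list pass through the filter untouched.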


-Mike

On 11/5/2014 5:36 PM, Michael Della Bitta wrote:

Pretty sure what you need is called KeywordMarkerFilterFactory.

<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>


On 11/5/14 17:24, Tang, Rebecca wrote:

Hi there,

For some hyphenated terms, I want them to stay as-is instead of being
tokenized. For example: e-cigarette, e-cig, I-pad. I don't want
them to be split into e and cig or I and pad, because the single
letters e and I produce too many false-positive matches.


Is there a way to tell the standard tokenizer to skip tokenizing some 
terms?


Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Librarylegacy.library.ucsf.edu/
E: rebecca.t...@ucsf.edu








Re: solr security patch

2014-11-05 Thread Shawn Heisey
On 11/5/2014 5:04 PM, kuttan palliyalil wrote:
 I am trying to apply the security patch (SOLR-4470.patch) to the solr 4.10.1 tag:
 SOLR-4470.patch  14/Mar/14 16:15  278 kB
 
 I am getting an error with a hunk failure. Could anyone confirm whether this is
 the right patch for 4.10.1?

The latest patch is almost 8 months old.  The pace of change in the
Lucene/Solr codebase is extremely fast, so this is VERY outdated.

That patch will successfully apply to trunk revision 1577540, but you
won't be able to use svn up to bring the code up to date.

The patch will not apply successfully to any up-to-date branch or tag.
It's very likely that you will need to examine any patch hunks that
don't apply, and make the changes manually.  There is no automated way
to handle this.

Thanks,
Shawn



Re: solr security patch

2014-11-05 Thread kuttan palliyalil
Got it. Thank you Shawn.
RegardsRaj 

 On Wednesday, November 5, 2014 10:39 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 11/5/2014 5:04 PM, kuttan palliyalil wrote:
  I am trying to apply the security patch (SOLR-4470.patch) to the solr 4.10.1 tag:
  SOLR-4470.patch  14/Mar/14 16:15  278 kB
 
  I am getting an error with a hunk failure. Could anyone confirm whether this is
  the right patch for 4.10.1?

The latest patch is almost 8 months old.  The pace of change in the
Lucene/Solr codebase is extremely fast, so this is VERY outdated.

That patch will successfully apply to trunk revision 1577540, but you
won't be able to use svn up to bring the code up to date.

The patch will not apply successfully to any up-to-date branch or tag.
It's very likely that you will need to examine any patch hunks that
don't apply, and make the changes manually.  There is no automated way
to handle this.

Thanks,
Shawn



   

Re: SolrCloud shard distribution with Collections API

2014-11-05 Thread CTO직속IsabellePhan
Thanks for the advice, Erick.

Would you know what the underlying logic for shard distribution is?
Does it depend on the order in which each node joined the cluster, or does
the collections API logic actually check the node host IPs to ensure even
distribution?

Best Regards,

Isabelle


On Wed, Nov 5, 2014 at 3:10 PM, Erick Erickson erickerick...@gmail.com
wrote:

 They should be pretty well distributed by default, but if you want to
 take manual control, you can use the createNodeSet param on CREATE
 (with replication factor of 1) and then ADDREPLICA with the node param
 to put replicas for shards exactly where you want.

 Best,
 Erick



Re: A bad idea to store core data directory over NAS?

2014-11-05 Thread Toke Eskildsen
On Wed, 2014-11-05 at 23:04 +0100, Gili Nachum wrote:
 Update: Turns out that someone did find 50TB over SAN lying around the
 data center for me to use, so I won't find out for myself how life is with
 NFS/NAS in the near future.

There seem to be issues especially with NFS that you need to consider.
The thread "Effectiveness MMapDirectory on NFS Mounted indexes" on the
lucene-users mailing list is about that.

- Toke Eskildsen, State and University Library, Denmark




Re: SolrCloud shard distribution with Collections API

2014-11-05 Thread Lee Chunki
Hi Isabelle,

If I understood your question correctly, you can check the shard
distribution status on the admin page:
http://localhost:8983/solr/#/~cloud

If you started solr using a command like
$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
(
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
)

then the nodes are assigned in this order:
1-shard leader - 2-shard leader - 1-shard replica - 2-shard replica

Of course, from the second time on, the node assignment does not change.

Best,
Chunki


 On Nov 6, 2014, at 2:46 PM, CTO직속IsabellePhan ip...@coupang.com wrote:
 
  Thanks for the advice, Erick.
  
  Would you know what the underlying logic for shard distribution is?
  Does it depend on the order in which each node joined the cluster, or does
  the collections API logic actually check the node host IPs to ensure even
  distribution?
 
 Best Regards,
 
 Isabelle
 
 
 On Wed, Nov 5, 2014 at 3:10 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 They should be pretty well distributed by default, but if you want to
 take manual control, you can use the createNodeSet param on CREATE
 (with replication factor of 1) and then ADDREPLICA with the node param
 to put replicas for shards exactly where you want.
 
 Best,
 Erick
 



What's the most efficient way to sort by number of terms matched?

2014-11-05 Thread Trey Grainger
Just curious if there are some suggestions here. The use case is fairly
simple:

Given a query like "python OR solr OR hadoop", I want to sort results first
by the number of keywords matched, and then by relevancy.

I can think of ways to do this, but not efficiently. For example, I could
do:
q=python OR solr OR hadoop
  p1=python
  p2=solr
  p3=hadoop
  sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0)) desc, score desc

Other than the obvious downside that this requires me to pre-parse the
user's query, it's also somewhat inefficient to run the query function once
for each term in the original query, since it re-executes multiple
queries and loops through every document in the index during scoring.

Ideally, I would be able to do something like the below, which could just
pull the count of unique matched terms from the main query (q parameter)
execution:
q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc

I don't think anything like this exists, but would love some suggestions if
anyone else has solved this before.

Thanks,

-Trey


Re: Faceting return value of a function query?

2014-11-05 Thread Tom
Turns out that update processors perfectly suit my needs. I ended up using
the StatelessScriptUpdateProcessor with a simple JS script :-)
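
For the archives, a minimal sketch of what such a script can look like
(script and derived-field names illustrative, not my actual files):

  // timespan.js, run via solr.StatelessScriptUpdateProcessorFactory
  function processAdd(cmd) {
    var doc = cmd.solrDoc;
    var start = doc.getFieldValue("startDate");
    var end = doc.getFieldValue("endDate");
    if (start != null && end != null) {
      // assumes the client sends ISO-8601 date strings,
      // e.g. 2014-11-05T00:00:00Z
      var span = Date.parse(String(end)) - Date.parse(String(start));
      doc.addField("timeSpan", span); // in ms; range-facet on this field
    }
  }
  // no-op handlers for the other update events
  function processDelete(cmd) { }
  function processCommit(cmd) { }
  function processRollback(cmd) { }
  function processMergeIndexes(cmd) { }
  function finish() { }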

On Mon, Nov 03, 2014 at 10:40:52 PM, Yubing (Tom) Dong 董玉冰
tom.tung@gmail.com wrote:

 I see. Thank you! :-)

 Sent from my Android phone
 On Nov 3, 2014 9:35 PM, Erick Erickson erickerick...@gmail.com wrote:

 Yep. It's almost always easier and faster if you can pre-compute as
 much as possible at indexing time. It'll take longer to index, of
 course, but the ratio of writing to the index versus searching is usually
 hugely in favor of doing the work during indexing.

 Best,
 Erick

 On Mon, Nov 3, 2014 at 8:52 PM, Yubing (Tom) Dong 董玉冰
 tom.tung@gmail.com wrote:
  Hi Erik,
 
  Thanks for the reply! Do you mean parse and modify the documents before
  sending them to Solr?
 
  Cheers,
  Yubing
 
  On Mon, Nov 3, 2014 at 8:48 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  Wouldn't it be easiest to compute the span at index time? Then it's
  very straight-forward.
 
  Best,
  Erick
 
  On Mon, Nov 3, 2014 at 8:18 PM, Yubing (Tom) Dong 董玉冰
  tom.tung@gmail.com wrote:
   Hi,
  
   I'm new to Solr, and I'm having a problem with faceting. I would
 really
   appreciate it if you could help :)
  
   I have a set of documents in JSON format, which I could post to my
 Solr
   core using the post.jar tool. Each document contains two fields,
 namely
   startDate and endDate, both of which are of type date.
  
   Conceptually, I would like to have a third field timeSpan that is
   automatically generated from the return value of function query
   ms(endDate, startDate), and do range facet on it, i.e. compute the
   distribution of timeSpan, among either all of or a filtered subset
 of
  the
   documents.
  
   I have tried to find ways of both directly faceting the function
 return
   values and automatically generate the timeSpan field during
 indexing,
  but
   without luck yet.
  
   Suggestions are greatly appreciated!
  
   Best,
   Yubing
 




Re: create new core based on named config set using the admin page

2014-11-05 Thread Andreas Hubold

Hi,

Solr 4.8 introduced named config sets with 
https://issues.apache.org/jira/browse/SOLR-4478. You can create a new 
core based on a config set with the CoreAdmin API as described in 
https://cwiki.apache.org/confluence/display/solr/Config+Sets
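
For example, a minimal sketch (core and config set names illustrative):

  curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&configSet=myConfigSet'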


The Solr Admin page allows the creation of new cores as well. There's a 
Add Core button in the Core Admin tab. This will open a dialog where 
you can enter name, instanceDir, dataDir and the names of solrconfig.xml 
/ schema.xml. It would be cool and consistent if one could create a core 
based on a named config set here as well.


I'm asking because I might have overlooked something or maybe somebody 
is already working on this. But probably I should just create a JIRA 
issue, right?


Regards,
Andreas

Ramzi Alqrainy wrote on 11/05/2014 08:24 PM:

Sorry, I did not get your point. Can you please elaborate more?







--
Andreas Hubold
Software Architect

tel +49.40.325587.519
fax +49.40.325587.999
andreas.hub...@coremedia.com

CoreMedia AG
content | context | conversion

Ludwig-Erhard-Str. 18
20459 Hamburg, Germany
www.coremedia.com

Executive Board: Gerrit Kolb (CEO), Dr. Klemens Kleiminger (CFO)
Supervisory Board: Prof. Dr. Florian Matthes (Chairman)
Trade Register: Amtsgericht Hamburg, HR B 76277