DIH deleting documents

2013-02-21 Thread cveres
I am adding documents with the Data Import Handler from a MySQL database. I
create a unique id for each document by concatenating a couple of fields in
the database. Every id is unique.

After the import, over half of the documents which were imported are deleted
again, leaving me with fewer than half of the documents in the database
ending up in the Solr index.

Is there a way to get a list of the deleted documents, so that I can start
troubleshooting what went wrong? 

thanks,

Csaba



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041809.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Slaves always replicate entire index Index versions

2013-02-21 Thread Amit Nithian
A few others have posted about this too, and SOLR-4413 appears to be the
root problem. Basically, what I am seeing is that if your index directory is
not index/ but rather an index.timestamp directory set in index.properties, a new
index will be downloaded every time, because the download expects
your index to be at solr_data_dir/index. A quick workaround might be to
rename your index directory to just index and see if the problem goes away.
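For illustration only, the directory-resolution logic at issue can be sketched like this (a Python sketch, not Solr's actual Java; the file name index.properties and the index= key follow the convention described above):

```python
import os

def resolve_index_dir(data_dir):
    """Resolve the live index directory the way a replication slave should:
    honor index.properties if present, else fall back to <dataDir>/index."""
    props = os.path.join(data_dir, "index.properties")
    if os.path.exists(props):
        with open(props) as f:
            for line in f:
                line = line.strip()
                if line.startswith("index="):
                    return os.path.join(data_dir, line.split("=", 1)[1])
    # Default layout: plain <dataDir>/index
    return os.path.join(data_dir, "index")
```

A slave that always assumes the default branch will see a "different" index whenever index.properties points at an index.timestamp directory, and will re-download everything.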

To confirm, look at line 728 in the SnapPuller.java file (in
downloadIndexFiles)

I am hoping that the patch and a more unified getIndexDir can be added to
the next release of Solr as this is a fairly significant bug to me.

Cheers
Amit

On Thu, Feb 21, 2013 at 12:56 AM, Amit Nithian anith...@gmail.com wrote:

 So the diff in generation numbers is due, I believe, to the commits that
 Solr does when it has the new index files. But the fact that it's
 downloading a new index each time is baffling, and I just noticed that too
 (hit the replicate button and noticed a full index download). I'm going to
 pop into the source to see what's going on, unless there's a
 known bug filed about this?


 On Tue, Feb 19, 2013 at 1:48 AM, Raúl Grande Durán 
 raulgrand...@hotmail.com wrote:


 Hello.
 We have recently updated our Solr from 3.5 to 4.1 and everything is
 running perfectly, except the replication between nodes. We have a
 master-repeater-2slaves architecture and we have seen some things that
 weren't happening before:
 When a slave (repeater or slaves) starts to replicate, it needs to
 download the entire index, even when only small changes have been made to
 the index at the master. This takes a long time, since our index is more
 than 20 GB. After a replication cycle we have different index generations in
 master, repeater and slaves. For example:
 Master: gen. 64590
 Repeater: gen. 64591
 Both slaves: gen. 64592
 My replicationHandler configuration is like this:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">startup</str>
     <str name="confFiles">schema.xml,stopwords.txt</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <str name="masterUrl">${solr.master.url:http://localhost/solr}</str>
     <str name="pollInterval">00:03:00</str>
   </lst>
 </requestHandler>
 Our problems are very similar to those explained here:
 http://lucene.472066.n3.nabble.com/Problem-with-replication-td2294313.html
 Any ideas?? Thanks





Re: Slaves always replicate entire index Index versions

2013-02-21 Thread raulgrande83
Hi Amit,

I have come across some JIRAs that may be useful for this issue:
https://issues.apache.org/jira/browse/SOLR-4471
https://issues.apache.org/jira/browse/SOLR-4354
https://issues.apache.org/jira/browse/SOLR-4303
https://issues.apache.org/jira/browse/SOLR-4413
https://issues.apache.org/jira/browse/SOLR-2326

Please, let us know if you find any solution.

Regards.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4041817.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Slaves always replicate entire index Index versions

2013-02-21 Thread Amit Nithian
Thanks for the links. I have updated SOLR-4471 with a proposed solution
that I hope can be incorporated or amended so we can get a clean fix into
the next version; our operations and network staff will be happier
without gigs of data flying around the network :-)


On Thu, Feb 21, 2013 at 1:24 AM, raulgrande83 raulgrand...@hotmail.com wrote:

 Hi Amit,

 I have come across some JIRAs that may be useful for this issue:
 https://issues.apache.org/jira/browse/SOLR-4471
 https://issues.apache.org/jira/browse/SOLR-4354
 https://issues.apache.org/jira/browse/SOLR-4303
 https://issues.apache.org/jira/browse/SOLR-4413
 https://issues.apache.org/jira/browse/SOLR-2326

 Please, let us know if you find any solution.

 Regards.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4041817.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr UIMA

2013-02-21 Thread Tommaso Teofili
Hi Bart,

I think the only way you can do that is by reindexing, or maybe by just
doing a dummy atomic update [1] on each of the documents that weren't
tagged by UIMA before (e.g. adding or changing a field of type 'ignored',
or something like that).

Regards,
Tommaso

[1] : http://wiki.apache.org/solr/Atomic_Updates
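To make the suggestion concrete, here is one way the payload for such a dummy atomic update could look (a sketch in Python; the document id and the dynamic field name dummy_ignored_s are invented, and the payload shape follows the Atomic Updates wiki page linked above):

```python
import json

def dummy_atomic_update(doc_id):
    # A "set" on a throwaway field makes Solr rewrite the stored document,
    # re-running the update chain (including a UIMA processor) on it.
    return json.dumps([{"id": doc_id, "dummy_ignored_s": {"set": "reprocess"}}])

payload = dummy_atomic_update("doc-42")
# POST this to /solr/update?commit=true with Content-Type: application/json
print(payload)
```

Note this assumes the UIMA processor is configured in the update chain that handles the request, as in Bart's setup.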


2013/2/21 jazzsalsa jazzsa...@me.com

 Reposted because it did not arrive at the list (I didn't see it)


 On Feb 20, 2013, at 12:42 PM, jazz jazzsa...@me.com wrote:

 Hi,

 I managed to get Solr and UIMA working together. When I send a document to
 Solr, it annotates the field contents and adds the result of the UIMA
 annotations to, e.g., a location field. My question is: how do I annotate
 the contents of an already existing Solr database without triggering an
 /update? My UIMA processor runs on the /update command by default.
 I was thinking about exporting the contents and re-importing them, but that
 seems too complex using the DIH. Is there a smarter way?

 Regards Bart




Re: Slaves always replicate entire index Index versions

2013-02-21 Thread raulgrande83
Thanks for the patch. We'll try to install these fixes and post whether
replication works or not.

I renamed 'index.timestamp' folders to just 'index' but it didn't work.
These lines appeared in the log:
INFO: Master's generation: 64594 
21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex 
INFO: Slave's generation: 64593 
21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex 
INFO: Starting replication process 
21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchFileList 
SEVERE: No files to download for index generation: 64594 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4041827.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: If we Open Source our platform, would it be interesting to you?

2013-02-21 Thread David Quarterman
Hi Marcelo,

Looked through your site and the framework looks very powerful as an 
aggregator. We do a lot of data aggregation from many different sources in many 
different formats (XML, JSON, text, CSV, etc) using RDBMS as the main 
repository for eventual SOLR indexing. A 'one-stop-shop' for all this would be 
very appealing.

Have you looked at products like Talend & Jitterbit? These offer transformation 
from almost anything to almost anything using graphical interfaces (Jitterbit 
is better) and a PHP-like coding format for trickier work. If you (or somebody) 
could add a graphical interface, the world would beat a path to your door!

Regards,

DQ

-Original Message-
From: Marcelo Elias Del Valle [mailto:marc...@s1mbi0se.com.br] 
Sent: 20 February 2013 18:18
To: solr-user@lucene.apache.org
Subject: If we Open Source our platform, would it be interesting to you?

Hello All,

I’m sending this email because I think it may be interesting for Solr users, as 
this project makes strong use of the Solr platform.

We are strongly considering opening the source of our DMP (Data Management 
Platform), if it proves to be technically interesting to other developers / 
companies.

More details: http://www.s1mbi0se.com/s1mbi0se_DMP.html

All comments, questions and critiques are happening at HN:
http://news.ycombinator.com/item?id=5251780

Please feel free to send questions, comments and critiques... We will try to 
reply to them all.

Regards,
Marcelo


How to retrive all terms with their frequency in that website.

2013-02-21 Thread search engn dev
I have indexed data from 10 websites in Solr. Now I want to dump the data of
each website in the following format: [term, frequency of the term in that
website, IDF]

Can I do this with the Solr admin, or do I need to write a script for that?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-retrive-all-terms-with-their-frequency-in-that-website-tp4041848.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to retrive all terms with their frequency in that website.

2013-02-21 Thread Miguel

Hi

 Look at the Luke page in the Solr admin: /admin/luke?show=index
That page shows the top terms, so I suppose it is possible to get the
frequency of all terms.




El 21/02/2013 12:58, search engn dev escribió:

I have indexed data of 10 websites in solr. Now i want to dump data of each
website with following format : [Terms,Frequency of terms in that website
,IDF]

Can i do this with solr admin, or i need to write any script for that?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-retrive-all-terms-with-their-frequency-in-that-website-tp4041848.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: How to retrive all terms with their frequency in that website.

2013-02-21 Thread Alexander Golubowitsch
I guess the Term Vector Component might satisfy all or most of what 
you're trying to do: http://wiki.apache.org/solr/TermVectorComponent



On 21.02.2013 12:58, search engn dev wrote:

I have indexed data of 10 websites in solr. Now i want to dump data of each
website with following format : [Terms,Frequency of terms in that website
,IDF]

Can i do this with solr admin, or i need to write any script for that?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-retrive-all-terms-with-their-frequency-in-that-website-tp4041848.html
Sent from the Solr - User mailing list archive at Nabble.com.




SolrCloud vs. distributed suggester

2013-02-21 Thread AlexeyK
In the pre-cloud version of Solr it was necessary to pass the shards and
shards.qt parameters in order to make a /suggest handler work standalone.
How should it work in SolrCloud?
SpellCheckComponent skips the distributed stage of processing, and thus I get
suggestions only when I force distrib=false mode.
Setting the parameters as in previous releases doesn't work either.
The only way that has worked so far is forcing a 'query' component on the
/suggest handler. Is there any other (better) way?

Thanks,
Alexey



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-vs-distributed-suggester-tp4041859.html
Sent from the Solr - User mailing list archive at Nabble.com.


multiple facet.prefix for the same facet.field VS multiple facet.query

2013-02-21 Thread Bill Au
There have been requests for supporting multiple facet.prefix for the same
facet.field.  There is an open JIRA with a patch:

https://issues.apache.org/jira/browse/SOLR-1351

Wouldn't using multiple facet.query achieve the same result?  I mean
something like:

facet.query=lastName:A*&facet.query=lastName:B*&facet.query=lastName:C*
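Repeated parameters like these are easiest to build programmatically; a small sketch (the field name lastName comes from the example above, everything else is illustrative):

```python
from urllib.parse import urlencode

# Repeating the facet.query key yields one facet count per query.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.query", "lastName:A*"),
    ("facet.query", "lastName:B*"),
    ("facet.query", "lastName:C*"),
]
query_string = urlencode(params)
print(query_string)
```

Appending query_string to the select handler URL produces the request shown above, with the reserved characters percent-encoded.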


Bill


Re: multiple facet.prefix for the same facet.field VS multiple facet.query

2013-02-21 Thread Bill Au
Never mind.  I just realized the difference between the two.  Sorry for the
noise.

Bill


On Thu, Feb 21, 2013 at 8:42 AM, Bill Au bill.w...@gmail.com wrote:

 There have been requests for supporting multiple facet.prefix for the same
 facet.field.  There is an open JIRA with a patch:

 https://issues.apache.org/jira/browse/SOLR-1351

 Wouldn't using multiple facet.query achieve the same result?  I mean
 something like:

 facet.query=lastName:A*&facet.query=lastName:B*&facet.query=lastName:C*


 Bill




Re: DIH deleting documents

2013-02-21 Thread cveres
Thanks Gora,

Sorry I might not have been sufficiently clear.

I start with an empty index, then add documents.
9000 are added and 6000 are immediately deleted again, leaving 3000.
I assume this can only happen with duplicate IDs, but that should not be
possible! So I wanted to get a list of the deleted documents so that I could
try to figure out why they were deleted immediately.
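One classic way to get surprise duplicates from concatenated keys, even when every source row is unique, is concatenating without a delimiter. A quick illustration (pure sketch; the field values are invented):

```python
def naive_id(field_a, field_b):
    # No delimiter: distinct field pairs can collide.
    return str(field_a) + str(field_b)

def delimited_id(field_a, field_b):
    # A separator that cannot appear in either field keeps pairs distinct.
    return "{}|{}".format(field_a, field_b)

# ("ab", "c") and ("a", "bc") are different rows but collide:
print(naive_id("ab", "c") == naive_id("a", "bc"))          # True
print(delimited_id("ab", "c") == delimited_id("a", "bc"))  # False
```

Since adding a document with an existing uniqueKey overwrites it (which Solr counts as a delete plus an add), collisions like this would match the "added then deleted" numbers.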

thanks,

Csaba



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041811p4041887.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud vs. distributed suggester

2013-02-21 Thread Mark Miller
It's not really any different in SolrCloud than pre-cloud - distributed search is 
still the same code, done the same way by and large.

shards.qt should be just as valid an option as forcing a query component.

- Mark

On Feb 21, 2013, at 7:56 AM, AlexeyK lex.kudi...@gmail.com wrote:

 In pre-cloud version of SOLR it was necessary to pass shards and shards.qt
 parameters in order to make /suggest handler work standalone.
 How should it work in SolrCloud?
 SpellCheckComponent skips the distributed stage of processing and thus I get
 suggestions only when I force distrib=false mode.
 Setting parameters like in previous releases doesn't work either.
 The only way that worked so far is forcing a 'query' component on the
 /suggest handler. Is there any other (better) way?
 
 Thanks,
 Alexey
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-vs-distributed-suggester-tp4041859.html
 Sent from the Solr - User mailing list archive at Nabble.com.



How to change the index dir in Solr 4.1

2013-02-21 Thread chamara
I have 5 shards on one machine, using the new one-collection-multiple-cores
method. I am trying to change the index directory, but if I hard-code it
in solrconfig.xml, the index dir does not change for the other cores, and
each core fights over it, ending up in a deadlock. Is there any way to
suffix the index directory with the shard replica name, so that each shard
will have a different index directory?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-change-the-index-dir-in-Solr-4-1-tp4041891.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR4 SAN vs Local Disk?

2013-02-21 Thread chamara
Thanks Shawn for the Input, I could actually get RAID10's. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR4-SAN-vs-Local-Disk-tp4041299p4041895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr splitting my words

2013-02-21 Thread scallawa
Let me start out by saying that I am just learning Solr now.  Solr is
splitting a word and I am not sure why.  The word is "mcmurdo".  If I do a
search for "McMurdo" it picks it up.  If I do a search for just "murdo" it
will also pick it up.  If I search for "mcmurdo", I get nothing.

"womens-mcmurdo-ii-boots" - that is the data in the name field that is
getting copied to the name_search field, without the quotes. This is what
we are feeding into Solr.

The data comes from a field called name_search, which is copied from a
field called name.  Below is the description of name_search in the
schema_browser.

Field Type: TEXT

Properties: Indexed, Tokenized, Omit Norms

Schema: Indexed, Tokenized, Omit Norms

Index: (unstored field)

Copied From: NAME

Position Increment Gap: 100

Index Analyzer: org.apache.solr.analysis.TokenizerChain DETAILS

Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
ignoreCase: true enablePositionIncrements: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange:
1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll:
0 catenateNumbers: 1 }
org.apache.solr.analysis.SynonymFilterFactory args:{synonyms:
index_synonyms.txt expand: false ignoreCase: true }
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:
protwords.txt }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
Query Analyzer: org.apache.solr.analysis.TokenizerChain DETAILS

Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt
expand: true ignoreCase: true }
org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange:
1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll:
0 catenateNumbers: 0 }
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:
protwords.txt }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}

Any help would be greatly appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-splitting-my-words-tp4041913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr splitting my words

2013-02-21 Thread Timothy Potter
Feed your data into the Analysis form to see the transformations
taking place. Navigate to the Solr admin console, select your
collection name on the left (e.g. collection1). Click on Analysis
link. I suspect it's the WordDelimiterFilterFactory that is not doing
what you expect, which you can fine-tune with the various attributes
on that factory.

Cheers,
Tim

On Thu, Feb 21, 2013 at 8:47 AM, scallawa dami...@altrec.com wrote:
 Let me start out by saying that I am just learning Solr now.  Solr is
 splitting a word and I am not sure why.  The word is mcmurdo.  If I do a
 search for McMurdo it picks it up.  If I do a search for just murdo it will
 also pick it up.  If I search for mcmurdo, I get nothing.

 womens-mcmurdo-ii-boots  that is the data in the name field that is
 getting copied to the name_search field without the quotes.  This is what we
 are feeding into solr

 The data is coming from a filed called name_search which is copied from a
 field called name.  Below is the description for name_search in the
 schema_browser.

 Field Type: TEXT

 Properties: Indexed, Tokenized, Omit Norms

 Schema: Indexed, Tokenized, Omit Norms

 Index: (unstored field)

 Copied From: NAME

 Position Increment Gap: 100

 Index Analyzer: org.apache.solr.analysis.TokenizerChain DETAILS

 Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

 Filters:

 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
 ignoreCase: true enablePositionIncrements: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange:
 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll:
 0 catenateNumbers: 1 }
 org.apache.solr.analysis.SynonymFilterFactory args:{synonyms:
 index_synonyms.txt expand: false ignoreCase: true }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:
 protwords.txt }
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
 Query Analyzer: org.apache.solr.analysis.TokenizerChain DETAILS

 Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

 Filters:

 org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt
 expand: true ignoreCase: true }
 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
 ignoreCase: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange:
 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll:
 0 catenateNumbers: 0 }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:
 protwords.txt }
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}

 Any help would be greatly appreciated.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-splitting-my-words-tp4041913.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr splitting my words

2013-02-21 Thread Jack Krupansky
The word splitting is caused by splitOnCaseChange: 1. Change that "1" to 
"0" and completely reindex your data.
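A rough simulation of what splitOnCaseChange=1 does to a mixed-case token (this mimics only that single WordDelimiterFilter option, nothing else in the analysis chain):

```python
import re

def split_on_case_change(token):
    # Split at lower->upper boundaries, the way splitOnCaseChange=1 does:
    # "McMurdo" -> ["Mc", "Murdo"]; an all-lowercase token is left whole.
    parts = re.findall(r"[A-Z]+[a-z]*|[a-z]+|[0-9]+", token)
    return parts or [token]

print(split_on_case_change("McMurdo"))   # ['Mc', 'Murdo']
print(split_on_case_change("mcmurdo"))   # ['mcmurdo']
```

Whether the split parts then match at query time depends on the generateWordParts/catenateWords settings on each side of the chain, which is why the index-time and query-time filter configurations shown earlier differ.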


-- Jack Krupansky

-Original Message- 
From: scallawa

Sent: Thursday, February 21, 2013 7:47 AM
To: solr-user@lucene.apache.org
Subject: Solr splitting my words

Let me start out by saying that I am just learning Solr now.  Solr is
splitting a word and I am not sure why.  The word is mcmurdo.  If I do a
search for McMurdo it picks it up.  If I do a search for just murdo it will
also pick it up.  If I search for mcmurdo, I get nothing.

womens-mcmurdo-ii-boots  that is the data in the name field that is
getting copied to the name_search field without the quotes.  This is what we
are feeding into solr

The data is coming from a filed called name_search which is copied from a
field called name.  Below is the description for name_search in the
schema_browser.

Field Type: TEXT

Properties: Indexed, Tokenized, Omit Norms

Schema: Indexed, Tokenized, Omit Norms

Index: (unstored field)

Copied From: NAME

Position Increment Gap: 100

Index Analyzer: org.apache.solr.analysis.TokenizerChain DETAILS

Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
ignoreCase: true enablePositionIncrements: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange:
1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll:
0 catenateNumbers: 1 }
org.apache.solr.analysis.SynonymFilterFactory args:{synonyms:
index_synonyms.txt expand: false ignoreCase: true }
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:
protwords.txt }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
Query Analyzer: org.apache.solr.analysis.TokenizerChain DETAILS

Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt
expand: true ignoreCase: true }
org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange:
1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll:
0 catenateNumbers: 0 }
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:
protwords.txt }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}

Any help would be greatly appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-splitting-my-words-tp4041913.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: SolrCloud as my primary data store

2013-02-21 Thread Timothy Potter
With Solr's atomic updates, optimistic locking, update log,
openSearcher=false on commits, etc. you can definitely do this.

Biggest question in my mind is whether you're willing to accept Solr's
emphasis on consistency vs. write-availability? With a db like
Cassandra, you can achieve better write-availability by giving up a
little on the consistency side. With Solr, you don't have that choice
- writes must succeed on the shard leader and replicas. With the tlog,
Solr still does pretty good here. The other concern is how frequently
(and how many) are you updating data in existing docs? Solr has to
delete and re-index the entire doc after updating a single field. We
abuse Solr with millions of atomic updates daily but it's not anywhere
near as fast as you get with database updates.

Lastly, have you seen Yonik's slides from Apache Eurocon - great read
if not: 
http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55387447

Cheers,
Tim

On Wed, Feb 20, 2013 at 10:02 PM, jimtronic jimtro...@gmail.com wrote:
 Now that I've been running Solr Cloud for a couple months and gotten
 comfortable with it, I think it's time to revisit this subject.

 When I search for the topic of using Solr as a primary db online, I get lots
 of discussions from 2-3 years ago and usually they point out a lot of
 hurdles that have now largely been eliminated with the release of Solr
 Cloud.

 I've stopped using the standard method of writing to my db and pushing out
 periodically to solr. Instead, I'm writing simultaneously to solr and the db
 with less frequent syncs from the database just to be safe. I find this to
 be much faster and easier than doing delta imports via the DIH handler. In
 fact, it's gone so smoothly, I'm really wondering why I need to keep writing
 it to the db at all.

 I've always got several nodes running and launching new ones takes only
 minutes to be fully operational. I'm taking frequent snapshots and my test
 restores have been painless and quick.

 So, if I'm looking at other NoSQL solutions like MongoDB or Cassandra, why
 wouldn't I just use Solr? It's distributed, fast, and stable. It has a great
 http api and it's nearly schema-less using dynamic fields. And, most
 importantly, it offers the most powerful query language available.

 I'd really like to hear from someone who has made the leap.

 Cheers, Jim



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-as-my-primary-data-store-tp4041774.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is their a way in which I can make spell suggestion dictionary build on specific fileds

2013-02-21 Thread Jack Krupansky
Yes, each spellchecker (or dictionary) in your spellcheck search component 
has a field parameter to specify the field to be used to generate the 
dictionary index for that spellchecker:


<str name="field">spell</str>

See the Solr example solrconfig.xml and search for <lst name="spellchecker">.


Also see:
http://wiki.apache.org/solr/SpellCheckComponent

-- Jack Krupansky

-Original Message- 
From: Rohan Thakur

Sent: Thursday, February 21, 2013 2:34 AM
To: solr-user@lucene.apache.org
Subject: Is their a way in which I can make spell suggestion dictionary 
build on specific fileds


hi all

I wanted to know: is there a way in which I can select which indexed
field I want to build the spell suggestions dictionary on?

thanks
regards
Rohan 



Re: Document update question

2013-02-21 Thread Timothy Potter
Hi Jack,

There was a bug for this fixed for 4.1 - which version are you on? I
remember this b/c I was on 4.0 and had to upgrade for this exact
reason.

https://issues.apache.org/jira/browse/SOLR-4134

Tim

On Wed, Feb 20, 2013 at 9:16 PM, Jack Park jackp...@topicquests.org wrote:
 From what I can read about partial updates, they will only work for
 singleton fields, where you can set them to something else, or
 multi-valued fields, where you can add something. I am testing on 4.1.

 I ran some tests to prove to myself that you cannot do anything else to a
 multi-valued field, like remove a value and do a partial update on the
 whole list. When I remove a value, it flattens the result to a
 comma-delimited string, from

"details": [
    "here & there",
    "Hello there",
    "Oh Fudge"
],

 to this

"details": [
    ["here & there", "Oh Fudge"]
],

 Does this mean that I must remove the entire document and re-index it?

 Many thanks in advance
 Jack


Re: Is their a way in which I can make spell suggestion dictionary build on specific fileds

2013-02-21 Thread Alexandre Rafalovitch
AnalyzingSuggester might also be worth having a look at (requires some
Googling and SO reading to get it right for now).

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Feb 21, 2013 at 11:11 AM, Jack Krupansky j...@basetechnology.com wrote:

 Yes, each spellchecker (or dictionary) in your spellcheck search
 component has a field parameter to specify the field to be used to
 generate the dictionary index for that spellchecker:

 <str name="field">spell</str>

 See the Solr example solrconfig.xml and search for <lst
 name="spellchecker">.

 Also see:
 http://wiki.apache.org/solr/SpellCheckComponent

 -- Jack Krupansky

 -Original Message- From: Rohan Thakur
 Sent: Thursday, February 21, 2013 2:34 AM
 To: solr-user@lucene.apache.org
 Subject: Is their a way in which I can make spell suggestion dictionary
 build on specific fileds


 hi all

 I wanted to know: is there a way in which I can select which indexed
 field I want to build the spell suggestions dictionary on?

 thanks
 regards
 Rohan



Re: Document update question

2013-02-21 Thread Jack Park
I am using 4.1. I was not aware of that link. In the absence of being
able to do partial updates to multi-valued fields, I just punted to
delete and reindex. I'd like to see otherwise.

Many thanks
Jack

On Thu, Feb 21, 2013 at 8:13 AM, Timothy Potter thelabd...@gmail.com wrote:
 Hi Jack,

 There was a bug for this fixed for 4.1 - which version are you on? I
 remember this b/c I was on 4.0 and had to upgrade for this exact
 reason.

 https://issues.apache.org/jira/browse/SOLR-4134

 Tim

 On Wed, Feb 20, 2013 at 9:16 PM, Jack Park jackp...@topicquests.org wrote:
 From what I can read about partial updates, it will only work for
 singleton fields where you can set them to something else, or
 multi-valued fields where you can add something. I am testing on 4.1

 I ran some tests to prove to me that you cannot do anything else to a
 multi-valued field, like remove a value and do a partial update on the
 whole list. It flattens the result to a comma delimited String when I
 remove a value, from
details: [
   here  there,
   Hello there,
   Oh Fudge
 ],
 to this
details: [
   [here  there, Oh Fudge]
 ],

 Does this meant that I must remove the entire document and re-index it?

 Many thanks in advance
 Jack


Re: How to change the index dir in Solr 4.1

2013-02-21 Thread Timothy Potter
Have you tried leaving <dataDir>${solr.data.dir:}</dataDir> in
solrconfig.xml and then setting the data dir for each core in
solr.xml, i.e.

<core schema="schema.xml" loadOnStartup="true" instanceDir="someCore/"
  transient="false" name="justSomeCore" config="solrconfig.xml"
  dataDir="PATH_TO_DATA_DIR"/>


On Thu, Feb 21, 2013 at 7:13 AM, chamara chama...@gmail.com wrote:
 I am having 5 shards in one machine using the new one collection multiple
 cores method. I am trying to change the index directory, but if i hard code
 that in the SolrConfig.xml , the index dir does not change for other cores
 and each core tries to fight over it and ends up as  a deadlock. Is there
 anyway to suffix the index directory with the shard replica name so that for
 each shard i will have a different index directory?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-change-the-index-dir-in-Solr-4-1-tp4041891.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Document update question

2013-02-21 Thread Timothy Potter
Weird - the only difference I see is that we use XML vs. JSON, but
otherwise, doing the following works for us:

<field update="set" name="someMultiValuedField">VALU1</field>
<field update="set" name="someMultiValuedField">VALU2</field>

Result would be:

<arr name="someMultiValuedField">
  <str>VALU1</str>
  <str>VALU2</str>
</arr>
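For comparison, the same multi-valued "set" can be expressed as a JSON atomic update document (a sketch; the id "doc-1" is invented):

```python
import json

# JSON form of a multi-valued "set" atomic update.
update = {"id": "doc-1", "someMultiValuedField": {"set": ["VALU1", "VALU2"]}}
payload = json.dumps([update])
print(payload)
```

Sending this payload to /update with Content-Type application/json should be equivalent to the XML above.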


On Thu, Feb 21, 2013 at 9:44 AM, Jack Park jackp...@topicquests.org wrote:
 I am using 4.1. I was not aware of that link. In the absence of being
 able to do partial updates to multi-valued fields, I just punted to
 delete and reindex. I'd like to see otherwise.

 Many thanks
 Jack

 On Thu, Feb 21, 2013 at 8:13 AM, Timothy Potter thelabd...@gmail.com wrote:
 Hi Jack,

 There was a bug for this fixed for 4.1 - which version are you on? I
 remember this b/c I was on 4.0 and had to upgrade for this exact
 reason.

 https://issues.apache.org/jira/browse/SOLR-4134

 Tim

 On Wed, Feb 20, 2013 at 9:16 PM, Jack Park jackp...@topicquests.org wrote:
 From what I can read about partial updates, it will only work for
 singleton fields where you can set them to something else, or
 multi-valued fields where you can add something. I am testing on 4.1

 I ran some tests to prove to me that you cannot do anything else to a
 multi-valued field, like remove a value and do a partial update on the
 whole list. It flattens the result to a comma delimited String when I
 remove a value, from
"details": [
   "here & there",
   "Hello there",
   "Oh Fudge"
 ],
 to this
"details": [
   ["here & there", "Oh Fudge"]
 ],

 Does this mean that I must remove the entire document and re-index it?

 Many thanks in advance
 Jack


Re: Document update question

2013-02-21 Thread Jack Park
Interesting you should say that.  Here is my solrj code:

public Solr3Client(String solrURL) throws Exception {
server = new HttpSolrServer(solrURL);
//  server.setParser(new XMLResponseParser());
}

I cannot recall why I commented out the setParser line; something
about someone saying in another thread it's not important. I suppose I
should revisit my unit tests with that line uncommented. Or, did I
miss something?

The JSON results I painted earlier were from reading the document
online in the admin query panel.

Many thanks
Jack

On Thu, Feb 21, 2013 at 8:52 AM, Timothy Potter thelabd...@gmail.com wrote:
 Weird - the only difference I see is that we use XML vs. JSON, but
 otherwise, doing the following works for us:

 <field update="set" name="someMultiValuedField">VALU1</field>
 <field update="set" name="someMultiValuedField">VALU2</field>

 Result would be:

 <arr name="someMultiValuedField">
   <str>VALU1</str>
   <str>VALU2</str>
 </arr>


 On Thu, Feb 21, 2013 at 9:44 AM, Jack Park jackp...@topicquests.org wrote:
 I am using 4.1. I was not aware of that link. In the absence of being
 able to do partial updates to multi-valued fields, I just punted to
 delete and reindex. I'd like to see otherwise.

 Many thanks
 Jack

 On Thu, Feb 21, 2013 at 8:13 AM, Timothy Potter thelabd...@gmail.com wrote:
 Hi Jack,

 There was a bug for this fixed for 4.1 - which version are you on? I
 remember this b/c I was on 4.0 and had to upgrade for this exact
 reason.

 https://issues.apache.org/jira/browse/SOLR-4134

 Tim

 On Wed, Feb 20, 2013 at 9:16 PM, Jack Park jackp...@topicquests.org wrote:
 From what I can read about partial updates, it will only work for
 singleton fields where you can set them to something else, or
 multi-valued fields where you can add something. I am testing on 4.1

 I ran some tests to prove to me that you cannot do anything else to a
 multi-valued field, like remove a value and do a partial update on the
 whole list. It flattens the result to a comma delimited String when I
 remove a value, from
"details": [
   "here & there",
   "Hello there",
   "Oh Fudge"
 ],
 to this
"details": [
   ["here & there", "Oh Fudge"]
 ],

 Does this mean that I must remove the entire document and re-index it?

 Many thanks in advance
 Jack


Re: How to change the index dir in Solr 4.1

2013-02-21 Thread chamara
Yes, that is what I am doing now. I thought this solution was not elegant
for a deployment. Is there any other way to do this from solrconfig.xml?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-change-the-index-dir-in-Solr-4-1-tp4041891p4041950.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to change the index dir in Solr 4.1

2013-02-21 Thread Mingfeng Yang
How about passing -Dsolr.data.dir=/ur/data/dir on the java command line
when you start the Solr service?


On Thu, Feb 21, 2013 at 9:05 AM, chamara chama...@gmail.com wrote:

 Yes, that is what I am doing now. I thought this solution was not elegant
 for a deployment. Is there any other way to do this from solrconfig.xml?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-change-the-index-dir-in-Solr-4-1-tp4041891p4041950.html
 Sent from the Solr - User mailing list archive at Nabble.com.



synonym replacement in AnalyzingSuggester?

2013-02-21 Thread Sebastian Saip
I'm using the new AnalyzingSuggester (my code is available on
http://pastebin.com/tN9yXHB0)
and I got the synonyms whisky,whiskey (they are bi-directional)

So whether the user searches for whiskey or whisky, I want to retrieve all
documents that have any of them.

However, for autosuggest, I would like to prefer (better said: only show!)
"whisky".
E.g. I have the document "Whiskey Bottle",
but autosuggest for "whi" should return "Whisky Bottle".

The only way I can think of is replacing "Whiskey" with "Whisky" at feeding
time, but that would also mean an additional field in Solr (since I do want
to keep "Whiskey" in the original field).

Is there any way to do some kind of synonym replacement on-the-fly for
these suggestions?
Has anyone ever done that or has an idea how to do that?

Cheers.
Sebastian
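One approach worth sketching (hedged: the field type name, the synonyms file name, and the assumption that the suggester's source field can carry its own analysis chain are all illustrative, not confirmed by this thread) is a one-way synonym rule applied only to the field feeding the suggester, so "whiskey" is normalized to "whisky" at index time while the search field keeps both spellings:

```xml
<!-- suggest_synonyms.txt would contain the one-way rule:
     whiskey => whisky
-->
<fieldType name="text_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="suggest_synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```

Because the rule is one-directional and scoped to this field type, the bi-directional whisky,whiskey mapping on the main search field is unaffected.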


Re: Threads running while querrying

2013-02-21 Thread Ido Kissos
I get a 2-second response time on average.
Any config/hardware change suggestions for my use case (low qps rate)?
I would say more shards on the same node, but that has the disadvantage
of diminishing the caches.
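The single-threaded-client point raised in this thread can be sketched concretely: only concurrent client requests let the server use more than one searcher thread. A stdlib-only sketch (doQuery here is a stand-in for the HTTP POST to the shard, not real Solr client code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelQueryClient {

    // Stand-in for posting one query to the shard and returning a timing.
    static long doQuery(int i) {
        return i; // a real client would POST and return elapsed millis
    }

    // Fire `total` queries from `threads` client threads so the server
    // sees concurrent requests instead of a strict one-at-a-time stream.
    static List<Long> run(int threads, int total) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            final int id = i;
            futures.add(pool.submit(() -> doQuery(id)));
        }
        List<Long> results = new ArrayList<>();
        try {
            for (Future<Long> f : futures) results.add(f.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100).size()); // 100
    }
}
```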


On Wednesday, February 20, 2013, Walter Underwood wrote:

 In production, you should have requests arriving at Solr simultaneously.
 Those simultaneous requests will be processed in parallel.

 For each query, there are many ways to improve response time. It depends
 on the query and the schema.

 What query response time are you seeing?

 wunder

 On Feb 20, 2013, at 7:39 AM, Manuel Le Normand wrote:

  Thanks for the reply, Erick! To make sure I understand: each query
  request runs on a single thread of the shard.
  My searcher thread is CPU-bound. Does it mean my only possibility to
  shorten my query time, assuming a low qps rate, is to split my collection
  into many shards on different nodes? (And that multiple CPU cores are
  good only for a high qps rate?)
  Thanks in advance
 
  On Wednesday, February 20, 2013, Erick Erickson wrote:
 
  Well, it matters because your single-threaded client is firing one
 request,
  waiting for the response, then firing another. There's no opportunity
 for
  Solr to use more than one thread for queries if there's only a single
  thread on a single client ever making requests
 
  Or I misunderstand what you've set up completely.
 
  Best
  Erick
 
 
  On Wed, Feb 20, 2013 at 8:37 AM, Manuel Le Normand 
  manuel.lenorm...@gmail.com javascript:; javascript:; wrote:
 
  Yes, i made a one threaded script which sends a querry by a post
 request
  to
  the shard's url, gets back the response and posts the next querry.
  How can it matter?
  Manuel
 
  On Wednesday, February 20, 2013, Erick Erickson wrote:
 
  Silly question perhaps, but are you feeding queries  at Solr with a
  single
  thread? Because Solr uses multiple threads to search AFAIK.
 
  Best
  Erick
 
 
  On Wed, Feb 20, 2013 at 4:01 AM, Manuel Le Normand 
  manuel.lenorm...@gmail.com javascript:; javascript:;
 javascript:; wrote:
 
   More to it, I do see 75 more threads under the tomcat6 process, but
   only a single one is working while querying.
 
  On Wednesday, February 20, 2013, Manuel Le Normand wrote:
 
   Hello,
   I created a single collection on a Linux server with 8M docs, Solr 4.1.
   While making performance tests, I see that my quad-core server makes
   full use of a single core while the 3 others are idle.
   Is there a possibility of making a single-sharded collection available
   for multi-threaded query?
   P.S.: I'm not indexing while querying
 
 
 







Re: Index optimize takes more than 40 minutes for 18M documents

2013-02-21 Thread Walter Underwood
That seems fairly fast. We index about 3 million documents in about half that 
time. We are probably limited by the time it takes to get the data from MySQL.

Don't optimize. Solr automatically merges index segments as needed. Optimize 
forces a full merge. You'll probably never notice the difference, either in 
disk space or speed.

It might make sense to force merge (optimize) if you reindex everything once 
per day and have no updates in between. But even then it may be a waste of time.

You need lots of free disk space for merging, whether a forced merge or 
automatic. Free space equal to the size of the index is usually enough, but 
worst case can need double the size of the index.

wunder

On Feb 21, 2013, at 9:20 AM, Yandong Yao wrote:

 Hi Guys,
 
 I am using Solr 4.1 and have indexed 18M documents using solrj
 ConcurrentUpdateSolrServer (each document contains 5 fields, and average
 length is less than 1k).
 
 1) It takes 70 minutes to index those documents without optimize on my mac
 10.8, how is the performance, slow, fast or common?
 
 2) It takes about 40 minutes to optimize those documents, following is top
 output, and there are lots of FAULTS, what does this means?
 
 Processes: 118 total, 2 running, 8 stuck, 108 sleeping, 719 threads
 
   00:56:52
 Load Avg: 1.48, 1.56, 1.73  CPU usage: 6.63% user, 6.40% sys, 86.95% idle
 SharedLibs: 31M resident, 0B data, 6712K linkedit.
 MemRegions: 34734 total, 5801M resident, 39M private, 638M shared. PhysMem:
 982M wired, 3600M active, 3567M inactive, 8150M used, 38M free.
 VM: 254G vsize, 1285M framework vsize, 1469887(368) pageins, 1095550(0)
 pageouts.  Networks: packets: 14842595/9661M in, 14777685/9395M out.
 Disks: 820048/43G read, 523814/53G written.
 
 PID   COMMAND  %CPU  TIME #TH  #WQ  #POR #MRE RPRVT  RSHRD  RSIZE
 VPRVT  VSIZE  PGRP PPID STATE   UID  FAULTS   COW  MSGSENT  MSGRECV SYSBSD
   SYSMACH
 4585  java 11.7  02:52:01 32   1483  342  3866M+ 6724K  3856M+
 4246M  6908M  4580 4580 sleepin 501  1490340+ 402  3000781+ 231785+
 15044055+ 10033109+
 
 3) If I don't run optimize, what is the impact? bigger disk size or slow
 query performance?
 
 Following is my index config in  solrconfig.xml:
 
 <ramBufferSizeMB>100</ramBufferSizeMB>
 <mergeFactor>10</mergeFactor>
 <autoCommit>
   <maxDocs>100000</maxDocs><!-- 100K docs -->
   <maxTime>300000</maxTime><!-- 5 minutes -->
   <openSearcher>false</openSearcher>
 </autoCommit>
 
 Thanks very much in advance!
 
 Regards,
 Yandong






Re: DIH deleting documents

2013-02-21 Thread Gora Mohanty
On 21 February 2013 19:30, cveres csabave...@me.com wrote:
 Thanks Gora,

 Sorry I might not have been sufficiently clear.

 I start with an empty index, then add documents.
 9000 are added and 6000 immediately deleted again, leaving 3000.
 I assume this can only happen with duplicate IDs, but that should not be
 possible! So I wanted to get a list of deleted documents so that I could try
 and figure out why they were deleted immediately.
[...]

What do you mean by "9000 are added and 6000
immediately deleted again"? How are you getting
the number added, and the number deleted? How
many documents does DIH report on the final screen
after the full-import completes?

From what you describe, it is most likely duplicate IDs.
Could you do a SELECT from the database outside of
Solr, create the IDs as you do with DIH, and see what is
going wrong there?

Regards,
Gora
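Gora's duplicate-ID suspicion can be checked before Solr is involved at all: recreate the concatenated IDs from the source rows and count collisions. A stdlib sketch with made-up rows (no MySQL or DIH here; the title/chapter fields are illustrative):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateIdCheck {

    // Build the id the same way the import would (field1 + "-" + field2)
    // and count how many rows collapse onto each id.
    static Map<String, Integer> idCounts(List<String[]> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            String id = row[0] + "-" + row[1];
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"Intro", "1"},
            new String[]{"Intro", "1"},   // same title and chapter: collision
            new String[]{"Methods", "2"});
        idCounts(rows).forEach((id, n) -> {
            if (n > 1) System.out.println("duplicate id: " + id + " x" + n);
        });
    }
}
```

Any id with a count above 1 would overwrite (delete and re-add) an earlier document during the import, which matches the deleted-docs symptom described above.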


Re: Solr splitting my words

2013-02-21 Thread scallawa
I tried playing with the analyzer before posting and wasn't sure how to
interpret it.  
Field type: text
Field value (index): womens-mcmurdo-ii-boots (this is based on the info that
is in the field)
Field value (query): mcmurdo

results
I only got one match in the index analyzer
org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
catenateNumbers=1}
term position:    1       2        3      4
term text:        womens  mcmurdo  ii     boots  womensmcmurdoiiboots
term type:        word    word     word   word   word
source start,end: 0,6     7,14     15,17  18,23  0,23
payload 


Jack,
The field that I am expecting to be indexed is not sending the data in caps. 
Which is why I am puzzled.  I am wondering if the indexed data is not coming
from the field I expect.  I will try your change in dev once I get data
generated there.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-splitting-my-words-tp4041913p4041963.html
Sent from the Solr - User mailing list archive at Nabble.com.


splitting big, existing index into shards

2013-02-21 Thread zqzuk
Hi

I have built a 300GB index using lucene 4.1 and now it is too big to do
queries efficiently. I wonder if it is possible to split it into shards,
then use SolrCloud configuration?

I have looked around the forum but was unable to find any tips on this. Any
help please?

Many thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/splitting-big-existing-index-into-shards-tp4041964.html
Sent from the Solr - User mailing list archive at Nabble.com.


Matching an exact word

2013-02-21 Thread Van Tassell, Kristian
I'm trying to match the word "created". Given that it is surrounded by quotes, 
I would expect an exact match, but instead results are returned for all 
stemmed variants, such as create, creates, created, etc.

q="created"&wt=xml&rows=1000&qf=text&defType=edismax

If I copy the text field to a new one that does not stem words, text_exact 
for example, I get the expected results:

q="created"&wt=xml&rows=1000&qf=text_exact&defType=edismax

I would like the decision whether to match exact or not to be determined by the 
quotes rather than the qf parameter (eg, not have to use it at all). What topic 
do I need to look into more to understand this? Thanks in advance!



Re: splitting big, existing index into shards

2013-02-21 Thread Upayavira
You can split an index using the MultiPassIndexSplitter, which is in
Lucene contrib. However, it won't use the same algorithm for assigning
documents to shards, which means the indexes won't work with a SolrCloud
setup.

A splitter that uses the same split technique but uses the shard
assignment algorithm from SolrCloud could be a useful thing.

But I have to say, I suspect it will be quicker/easier to just re-index.
Make sure you choose the right number of shards; with SolrCloud as it
is, you cannot change the number of shards without reindexing. This may
change soon with newer releases of Solr, though.

Upayavira

On Thu, Feb 21, 2013, at 06:09 PM, zqzuk wrote:
 Hi
 
 I have built a 300GB index using lucene 4.1 and now it is too big to do
 queries efficiently. I wonder if it is possible to split it into shards,
 then use SolrCloud configuration?
 
 I have looked around the forum but was unable to find any tips on this.
 Any
 help please?
 
 Many thanks!
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/splitting-big-existing-index-into-shards-tp4041964.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Matching an exact word

2013-02-21 Thread Upayavira
Solr will only match on the terms as they are in the index. If a term is
stemmed in the index, the query matches the stemmed form; if it isn't, it
matches the original form.

All term matches are (by default at least) exact matches; with stemming you
are simply doing an exact match against the stemmed term.
Therefore, there really is no way to do what you are looking for within
Solr. I'd suggest you'll need to do some parsing at your side and, if
you find quotes, do the query against a different field.

Upayavira

On Thu, Feb 21, 2013, at 06:17 PM, Van Tassell, Kristian wrote:
 I'm trying to match the word "created". Given that it is surrounded by
 quotes, I would expect an exact match, but instead results are returned
 for all stemmed variants, such as create, creates, created, etc.
 
 q="created"&wt=xml&rows=1000&qf=text&defType=edismax
 
 If I copy the text field to a new one that does not stem words,
 text_exact for example, I get the expected results:
 
 q="created"&wt=xml&rows=1000&qf=text_exact&defType=edismax
 
 I would like the decision whether to match exact or not to be determined
 by the quotes rather than the qf parameter (eg, not have to use it at
 all). What topic do I need to look into more to understand this? Thanks
 in advance!
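The client-side parsing Upayavira suggests can be as small as checking for surrounding quotes and routing to the unstemmed field. A hedged sketch (field names borrowed from the thread; a real client would also URL-encode the query string):

```java
public class ExactFieldRouter {

    // Pick the qf parameter based on whether the user quoted the whole query.
    static String chooseQf(String q) {
        boolean quoted = q.length() >= 2 && q.startsWith("\"") && q.endsWith("\"");
        return quoted ? "text_exact" : "text";
    }

    public static void main(String[] args) {
        System.out.println(chooseQf("\"created\"")); // text_exact
        System.out.println(chooseQf("created"));     // text
    }
}
```

The quoted form would then be sent with qf=text_exact (and the quotes stripped or kept as a phrase query, as preferred), the unquoted form with qf=text.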
 


Re: Solr UIMA

2013-02-21 Thread Chris Hostetter

: Subject: Solr UIMA
: References: 5123b218.7050...@juntadeandalucia.es
: In-reply-to: 5123b218.7050...@juntadeandalucia.es

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.


-Hoss


Re: DIH deleting documents

2013-02-21 Thread Arcadius Ahouansou
Hi Csaba.

Would you mind posting your DIH config/data-config.xml and the command
you use for the import?

Thanks.

Arcadius.


On 21 February 2013 17:55, Gora Mohanty g...@mimirtech.com wrote:
 On 21 February 2013 19:30, cveres csabave...@me.com wrote:
 Thanks Gora,

 Sorry I might not have been sufficiently clear.

 I start with an empty index, then add documents.
 9000 are added and 6000 immediately deleted again, leaving 3000.
 I assume this can only happen with duplicate IDs, but that should not be
 possible! So I wanted to get a list of deleted documents so that I could try
 and figure out why they were deleted immediately.
 [...]

 What do you mean by "9000 are added and 6000
 immediately deleted again"? How are you getting
 the number added, and the number deleted? How
 many documents does DIH report on the final screen
 after the full-import completes?

 From what you describe, it is most likely duplicate IDs.
 Could you do a SELECT from the database outside of
 Solr, create the IDs as you do with DIH, and see what is
 going wrong there?

 Regards,
 Gora


Re: get content is put in the index queue but is not committed

2013-02-21 Thread Chris Hostetter

:  Anybody know how-to get content is put in the index queue but is not
: committed?

i'm guessing you are refering to uncommited documents in the transaction 
log?  Take a look at the UpdateLog class, and how it's used by the 
RealTimeGetComponent.

If you provide more details as to what you end goal is, we might be able 
to provide more specific (or alternative) suggestions on how to achieve 
your goal...


https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss


Re: Is their a way to remove the unwanted characters from solr index

2013-02-21 Thread Chris Hostetter

: I have a field in which I have strings with unwanted character like
: \n\r\n\n   these kind, I wanted to know is their any why I can remove
: these...actually I had data stored in html format in the sql database
: column which I had to index in solr...using HTML stripe I had removed the
: HTML tags but leaving these unwanted characters in between any one knows
: how to remove them.

https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html

See the parent class for an in depth description of how to configure which 
fields it will be applied to...

https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html


-Hoss
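A sketch of how that processor might be wired into an update chain to collapse the stray \n and \r runs (the chain name and parameter values here are illustrative assumptions; check the linked javadocs for the exact configuration options):

```xml
<updateRequestProcessorChain name="strip-control-chars">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldRegex">.*</str>
    <str name="pattern">[\n\r]+</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected on the update handler (e.g. via the update.chain parameter) so the replacement runs before documents are indexed.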


Re: Slaves always replicate entire index Index versions

2013-02-21 Thread Amit Nithian
Sounds good. I am trying the combination of my patch and SOLR-4413 now to
see how it works, and will have to see if I can put unit tests around them,
as some of what I thought may not be true with respect to the commit
generation numbers.

For your issue above in your last post, is it possible that there was a
commit on the master in that slight window after solr checks for the latest
generation of the master but before it downloads the actual files? How
frequent are the commits on your master?


On Thu, Feb 21, 2013 at 2:00 AM, raulgrande83 raulgrand...@hotmail.com wrote:

 Thanks for the patch, we'll try to install these fixes and post if
 replication works or not.

 I renamed 'index.timestamp' folders to just 'index' but it didn't work.
 These lines appeared in the log:
 INFO: Master's generation: 64594
 21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave's generation: 64593
 21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Starting replication process
 21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchFileList
 SEVERE: No files to download for index generation: 64594



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4041827.html
 Sent from the Solr - User mailing list archive at Nabble.com.
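Roughly what the SOLR-4413 area is about: a replica's real index directory may be named in index.properties rather than being dataDir/index. A hedged stdlib sketch of that resolution (illustrative only, not the actual SnapPuller code):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.util.Properties;

public class IndexDirResolver {

    // If dataDir/index.properties names an index.<timestamp> directory,
    // use it; otherwise fall back to the plain dataDir/index.
    static File resolveIndexDir(File dataDir) {
        File props = new File(dataDir, "index.properties");
        if (props.isFile()) {
            Properties p = new Properties();
            try (InputStream in = new FileInputStream(props)) {
                p.load(in);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            String name = p.getProperty("index");
            if (name != null) return new File(dataDir, name);
        }
        return new File(dataDir, "index");
    }

    public static void main(String[] args) throws IOException {
        File dataDir = new File(System.getProperty("java.io.tmpdir"),
                "solr-data-demo-" + System.nanoTime());
        dataDir.mkdirs();
        try (Writer w = new FileWriter(new File(dataDir, "index.properties"))) {
            w.write("index=index.20130221000000\n");
        }
        System.out.println(resolveIndexDir(dataDir).getName());
    }
}
```

A replication check that always compares against dataDir/index, ignoring index.properties, would conclude the local index is stale and re-download it every time, which matches the symptom in this thread.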



can i install new SOLR 4.1 as slave (3.3 Master)

2013-02-21 Thread michaelweica
Hi ,

our Solr master version is 3.3. Can I install a new box with Solr 4.1 as a
slave, and replicate data from the master?

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-i-install-new-SOLR-4-1-as-slaver-3-3-Master-tp4041976.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Vaillancourt, Tim
Thanks Mark,

The real driver for me wanting to promote a different leader is when I create a 
new Collection via the Collections API across a multi-server SolrCloud, the 
leader of each shard is always the same host, so you're right that I'm tackling 
the wrong problem with this request, although it would fix it for me.

If I create the cores manually via the cores API, one by one, I am able to get 
what I expect, but when running this Collections API call on a 3-instance Solr 
4.1, 3-shard setup, 1 server becomes the leader of all 3 shards, meaning it 
will get all the writes for everything (correct me if I am wrong). If so, this 
will not scale well with all writes going to one node.

curl -v 
'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=1&maxShardsPerNode=2'

Currently on my 3 instance SOLR 4.1 setup, the above call creates the following:

- ServerA is the leader of all 3 shards (the problem I want to address).
- ServerB + ServerC are automagically replicas of the 3 leader shards on 
ServerA.

So again, my issue is one server gets all the writes. Does anyone else 
encounter this? If so, I should spawn a separate thread on my specific issue.

Cheers,

Tim

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, February 19, 2013 8:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to manually select a shard leader in a running 
SolrCloud?

You can't easily do it the way it's implemented in ZooKeeper. We would probably 
internally have to do the same thing - elect a new leader and drop him until 
the one we wanted came up. The main thing doing it internally would gain is 
that you could skip the elected guy from becoming the actual leader and just 
move on to the next candidate.
Still some tricky corner cases to deal with and such as well.

I think for most things you would use this to solve, there is probably an 
alternate thing that should be addressed.

- Mark

On Mon, Feb 18, 2013 at 4:15 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote:
 Hey all,

 I feel having to unload the leader core to force an election is hacky, and 
 as far as I know would still leave which node becomes the Leader to chance, 
 ie I cannot guarantee NodeX becomes Leader 100% in all cases.

 Also, this imposes additional load temporarily.

 Is there a way to force the winner of the Election, and if not, is there a 
 known feature-request for this?

 Cheers,

 Tim Vaillancourt

 -Original Message-
 From: Joseph Dale [mailto:joey.d...@gmail.com]
 Sent: Sunday, February 03, 2013 7:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Is it possible to manually select a shard leader in a running 
 SolrCloud?

 With solrclound all cores are collections. The collections API it just a 
 wrapper to call the core api a million times with one command.

to /solr/admin/cores?action=CREATE&name=core1&collection=core1&shard=1

 Basically your creating the shard again, after leader props have gone out. 
 Solr will check ZK and find a core meeting that description, then simply get 
 a copy of the index from the leader of that shard.


 On Feb 3, 2013, at 10:37 AM, Brett Hoerner br...@bretthoerner.com wrote:

 What is the inverse I'd use to re-create/load a core on another 
 machine but make sure it's also known to SolrCloud/as a shard?


 On Sat, Feb 2, 2013 at 4:01 PM, Joseph Dale joey.d...@gmail.com wrote:


 To be more clear, let's say bob is the leader of core1. On bob, do a 
 /admin/cores?action=unload&name=core1. This removes the core/shard 
 from bob, giving the other servers a chance to grab leader props.

 -Joey

 On Feb 2, 2013, at 11:27 AM, Brett Hoerner br...@bretthoerner.com wrote:

 Hi,

 I have a 5 server cluster running 1 collection with 20 shards,
 replication
 factor of 2.

 Earlier this week I had to do a rolling restart across the cluster, 
 this worked great and the cluster stayed up the whole time. The 
 problem is
 that
 the last node I restarted is now the leader of 0 shards, and is 
 just holding replicas.

 I've noticed this node has abnormally high load average, while the 
 other nodes (who have the same number of shards, but more leaders 
 on
 average)
 are
 fine.

 First, I'm wondering if that loud could be related to being a 5x 
 replica and 0x leader?

 Second, I was wondering if I could somehow flag single shards to
 re-elect a
 leader (or force a leader) so that I could more evenly distribute 
 how
 many
 leader shards each physical server has running?

 Thanks.







--
- Mark



Re: Combining Solr score with customized user ratings for a document

2013-02-21 Thread Chris Hostetter

: With this approach now I can boost (i.e. multiply Solr's score by a factor)
: the results of any query by doing something like this:
: http://localhost:8080/solr/Prueba/select_test?q={!boost
: b=rating(usuario1)}text:grapa&fl=score
: 
: Where 'rating' is the name of my function.
: 
: Unfortunately, I still can't see which differences are between doing this or
: making the product of both scores as the value for the query's sort
: parameter... :(

I'm not sure i understand your question.  With the example query above, 
your score -- both returned, and used for sorting by score -- is the 
mathematical result of multiplying your function by the relevancy score of 
text:grapa

Perhaps what you are refering to is the idea that if you wnat the score 
to remain purely about relevancy, you can still opionally sort on the 
results of this function, by using the function solely in your sort -- the 
only thing that tends to confuse people here is how you refer back to the 
original query in that sort by function command...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3Calpine.DEB.2.00.1206111242260.17925@bester%3E

or in your case, something like this would return the both the raw 
score, and your custom rating, but it would sort on the product of those 
two values...

?q=text:grapa&fl=id,score,rating(usuario1)&sort=product(rating(usuario1),query($q))

: Which is the best place to do it? I think I would query the DB/cache just
: when the custom ValueSource is created in the ValueSourceParser's parse

That might make sense, but be careful where you put this cache data -- 
if it's part of the ValueSource, then whenever that ValueSource is used in 
a FunctionQuery (ie: {!boost b=rating(usuario1)}text:grapa) it will be 
part of the cache key for the queryResultCache or filterCache -- so having 
large data structures in your ValueSource could eat up a lot of RAM.  Take 
a look at the source/docs for the differences between the ValueSource class 
and the FunctionValues class.

-Hoss
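The distinction Hoss describes can be seen locally: the returned score stays the relevancy score, while the ordering uses the product. A stdlib sketch with made-up scores and ratings (nothing here calls Solr):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProductSort {

    // docs maps id -> {relevancyScore, userRating}; the scores are left
    // untouched, only the ordering uses the product of the two values.
    static List<String> rankByProduct(Map<String, double[]> docs) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> docs.get(id)[0] * docs.get(id)[1]).reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, double[]> docs = new HashMap<>();
        docs.put("a", new double[]{0.9, 1.0}); // product 0.9
        docs.put("b", new double[]{0.5, 3.0}); // product 1.5 -> ranked first
        System.out.println(rankByProduct(docs)); // [b, a]
    }
}
```

This mirrors the sort=product(rating(usuario1),query($q)) form: "b" wins the ordering even though "a" has the higher raw relevancy score.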


Re: If we Open Source our platform, would it be interesting to you?

2013-02-21 Thread Marcelo Elias Del Valle
Hello David,

 First of all, thanks for answering!

2013/2/21 David Quarterman da...@corexe.com

 Looked through your site and the framework looks very powerful as an
 aggregator. We do a lot of data aggregation from many different sources in
 many different formats (XML, JSON, text, CSV, etc) using RDBMS as the main
 repository for eventual SOLR indexing. A 'one-stop-shop' for all this would
 be very appealing.


Actually, just to clarify, it uses Cassandra as the repository, not an
RDBMS. We want to use it at large scale, so you could import entire company
databases into the repo and relate the data to one another. However, if I
understood you right, you got the idea: an intermediate repo before
indexing, so you could postpone decisions about what to index and how...


 Have you looked at products like Talend  Jitterbit? These offer
 transformation from almost anything to almost anything using graphical
 interfaces (Jitterbit is better) and a PHP-like coding format for trickier
 work. If you (or somebody) could add a graphical interface, the world would
 beat a path to your door!


 This is very interesting, actually! We considered using Talend when we
started our business, but decided to go ahead with the development of a
new product. The reason was: Talend is great, but it limits a good
programmer who is more agile writing code than using graphical interfaces.
Having user interfaces as a possibility is nice, but having something you
HAVE TO use is awful. Besides, it has a learning curve, seems to run better
if you hire their own platform, and we wanted to control the fine grain of
our platform.
  However, your question made me think a lot about it. Do you think
integrating with Jitterbit or Talend could be interesting? Or did you mean
developing a new user interface? The bad thing I see in integrating with a
Talend-like program is that you start to depend on the graphical
interface; I feel it's hard to use my own Java code... I might be wrong.
  Anyway, I will consider this possibility, but if you could explain
better why you think one or the other could be such a good idea, it would
help us a lot. Would you be interested in using such a tool yourself?

Best regards,
Marcelo.


Re: DIH deleting documents

2013-02-21 Thread cveres
Hi Gora and Arcadius,

Thanks for your help. I'll try and answer both your questions here.

I am interested in three database tables. Book contains information about
books, page has the content of each book page by page, and chapter
contains the title of each chapter in every book, and the page on which the
chapter begins. It is a bit of a mess because I need the contents of each
chapter in every book, but I have to infer which pages each chapter contains
by its page number. So there is quite a complex query.

There are 8764 rows in the chapter table .. so 8764 unique chapter headings
.. and 6870 books. 

When I import, I get 

Num Docs:
2784
Max Doc:
9488
Deleted Docs:
6704

Here is the config file (the relevant part):

   <entity name="book_chapter" pk="ID" rootEntity="false"
           query="select id as b_id,title,type_id from book">
     <entity name="chapter"
             query="SELECT CONCAT(CAST('${book_chapter.title}' AS
CHAR),'-',CAST(chapter AS CHAR)) as solr_id, book_id,'chapter' as
entityType,GROUP_CONCAT(content_raw) from (select id as page_id, book_id,
page_no, content_raw, (select title from chapter ch where
(ch.begin_page_no &lt; p.page_no OR ch.begin_page_no = p.page_no) and
ch.book_id = p.book_id and ch.parent_id = 0 order by begin_page_no desc
LIMIT 1) as chapter from page p where book_id = '${book_chapter.b_id}') a
group by chapter">
       <field column="solr_id" name="id"/>
       <field column="title" name="title"/>
       <field column="GROUP_CONCAT(content_raw)" name="pageText"/>
       <field column="entityType" name="entityType"/>
       <entity name="book-type2" query="select name,id from book_type
where id='${book_chapter.type_id}'">
         <field column="name" name="contentType"/>
       </entity>
     </entity>
   </entity>

thanks,

Csaba
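The index stats above are consistent with id collisions: 2784 live docs + 6704 deleted docs = 9488 max docs, which is exactly what happens when later rows generate the same solr_id (title + '-' + chapter) and overwrite earlier documents. A hypothetical Java sketch of checking the generated ids for duplicates before indexing — the sample rows are made up:

```java
import java.util.HashMap;
import java.util.Map;

public class SolrIdCheck {
    // Mirrors the id the DIH query builds: CONCAT(title, '-', chapter)
    static String solrId(String title, String chapter) {
        return title + "-" + chapter;
    }

    public static void main(String[] args) {
        // Stand-ins for the rows the book/chapter join would return
        String[][] rows = {
            {"Some Book", "Introduction"},
            {"Some Book", "Summary"},
            {"Other Book", "Introduction"},
            {"Some Book", "Introduction"},  // collision: same title and chapter
        };
        Map<String, Integer> counts = new HashMap<>();
        for (String[] r : rows) {
            counts.merge(solrId(r[0], r[1]), 1, Integer::sum);
        }
        // Every id seen more than once overwrites an earlier document and
        // shows up as a deleted doc in the index statistics
        counts.forEach((id, n) -> {
            if (n > 1) System.out.println("duplicate id: " + id + " (" + n + " rows)");
        });
    }
}
```

Running the same GROUP BY query directly against MySQL and counting duplicate CONCAT values would confirm whether this is the cause.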



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041811p4041996.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: If we Open Source our platform, would it be interesting to you?

2013-02-21 Thread Jack Park
Marcelo

In some sense, it sounds like you are aiming at building a topic map
of all your resources.

Jack

On Thu, Feb 21, 2013 at 11:54 AM, Marcelo Elias Del Valle
marc...@s1mbi0se.com.br wrote:
 Hello David,

  First of all, thanks for answering!

 2013/2/21 David Quarterman da...@corexe.com

 Looked through your site and the framework looks very powerful as an
 aggregator. We do a lot of data aggregation from many different sources in
 many different formats (XML, JSON, text, CSV, etc) using RDBMS as the main
 repository for eventual SOLR indexing. A 'one-stop-shop' for all this would
 be very appealing.


 Actually, just to clarify, it uses Cassandra as repository, not an
 RDBMS. We want to use it for large scale, so you could import entire company
 databases into the repo and relate the data from one another. However, If I
 understood you right, you got the idea, an intermediate repo before
 indexing, so you could postpone decisions about what to index and how...


 Have you looked at products like Talend & Jitterbit? These offer
 transformation from almost anything to almost anything using graphical
 interfaces (Jitterbit is better) and a PHP-like coding format for trickier
 work. If you (or somebody) could add a graphical interface, the world would
 beat a path to your door!


  This is very interesting, actually! We considered using Talend when we
 started our business, but we decided to go ahead with the development of a
 new product. The reason was: Talend is great, but it limits a good
 programmer who is more agile writing code than using graphical interfaces.
 Having user interfaces as an option is nice, but having something you HAVE TO
 use is awful. Besides, it has a learning curve and seems to run better if
 you use their own platform, and we wanted fine-grained control over our
 platform.
   However, your question made me think a lot about it. Do you think
 integrating to jitterbit or talend could be interesting? Or did you mean
 developing a new user interface? The bad thing I see in integrating with a
 Talend-like program is that you start to depend on the graphical
 interface; I feel it's hard to use my own Java code... I might be wrong.
   Anyway, I will consider this possibility, but if you could explain
 better why you think one or the other would be such a good idea, that would
 help us a lot. Would you be interested in using such a tool yourself?

 Best regards,
 Marcelo.


RE: Matching an exact word

2013-02-21 Thread Van Tassell, Kristian
Thank you.

So essentially I need to write a custom query parser (extending upon something 
like the QParser)?

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Thursday, February 21, 2013 12:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Matching an exact word

Solr will only match on the terms as they are in the index. If it is stemmed in 
the index, it will match that. If it isn't, it'll match that.

All term matches are (by default at least) exact matches. Only with stemming 
you are doing an exact match against the stemmed term.
Therefore, there really is no way to do what you are looking for within Solr. 
I'd suggest you'll need to do some parsing at your side and, if you find 
quotes, do the query against a different field.

Upayavira

On Thu, Feb 21, 2013, at 06:17 PM, Van Tassell, Kristian wrote:
 I'm trying to match the word "created". Given that it is surrounded by 
 quotes, I would expect an exact match to occur, but instead the entire 
 stemming results show for words such as create, creates, created, etc.
 
 q=created&wt=xml&rows=1000&qf=text&defType=edismax
 
 If I copy the text field to a new one that does not stem words, 
 text_exact for example, I get the expected results:
 
 q=created&wt=xml&rows=1000&qf=text_exact&defType=edismax
 
 I would like the decision whether to match exact or not to be 
 determined by the quotes rather than the qf parameter (eg, not have to 
 use it at all). What topic do I need to look into more to understand 
 this? Thanks in advance!
 


Re: Matching an exact word

2013-02-21 Thread SUJIT PAL
You could also do this outside Solr, in your client. If your query is 
surrounded by quotes, then strip away the quotes and make 
q=text_exact_field:your_unquoted_query. Probably better to do outside Solr in 
general keeping in mind the upgrade path.

-sujit
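Sujit's client-side approach can be a few lines of Java — a sketch, not Solr API code; the field names text and text_exact are assumptions, and quotes are kept on the exact-field query so multi-word input stays a phrase:

```java
public class ExactMatchQuery {
    // If the user wrapped the query in double quotes, target the unstemmed
    // field and keep the quotes so multi-word input stays a phrase query;
    // otherwise use the stemmed field. Field names are assumptions.
    static String buildQuery(String userInput) {
        String q = userInput.trim();
        boolean quoted = q.length() >= 2 && q.startsWith("\"") && q.endsWith("\"");
        return quoted ? "text_exact:" + q : "text:(" + q + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("\"created\""));   // text_exact:"created"
        System.out.println(buildQuery("created"));       // text:(created)
        System.out.println(buildQuery("\"two words\"")); // text_exact:"two words"
    }
}
```

The built string can then be passed as the q parameter; this keeps the quote-handling logic in the client, which survives Solr upgrades unchanged.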

On Feb 21, 2013, at 12:20 PM, Van Tassell, Kristian wrote:

 Thank you.
 
 So essentially I need to write a custom query parser (extending upon 
 something like the QParser)?
 
 -Original Message-
 From: Upayavira [mailto:u...@odoko.co.uk] 
 Sent: Thursday, February 21, 2013 12:22 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Matching an exact word
 
 Solr will only match on the terms as they are in the index. If it is stemmed 
 in the index, it will match that. If it isn't, it'll match that.
 
 All term matches are (by default at least) exact matches. Only with stemming 
 you are doing an exact match against the stemmed term.
 Therefore, there really is no way to do what you are looking for within Solr. 
 I'd suggest you'll need to do some parsing at your side and, if you find 
 quotes, do the query against a different field.
 
 Upayavira
 
 On Thu, Feb 21, 2013, at 06:17 PM, Van Tassell, Kristian wrote:
  I'm trying to match the word "created". Given that it is surrounded by 
 quotes, I would expect an exact match to occur, but instead the entire 
 stemming results show for words such as create, creates, created, etc.
 
  q=created&wt=xml&rows=1000&qf=text&defType=edismax
 
 If I copy the text field to a new one that does not stem words, 
 text_exact for example, I get the expected results:
 
  q=created&wt=xml&rows=1000&qf=text_exact&defType=edismax
 
 I would like the decision whether to match exact or not to be 
 determined by the quotes rather than the qf parameter (eg, not have to 
 use it at all). What topic do I need to look into more to understand 
 this? Thanks in advance!
 



Re: Matching an exact word

2013-02-21 Thread Sebastian Saip
And keep in mind you do need quotes around your search term if it consists
of multiple words - q=text_exact_field:"your unquoted query" -
otherwise Solr will interpret two words as: exact_field:two
defaultfield:words

(Maybe not directly applicable for your problem Kristian, but I just want
to mention that there are a few StemFilters available, maybe another one
acts differently!)


On 21 February 2013 21:52, SUJIT PAL sujit@comcast.net wrote:

 You could also do this outside Solr, in your client. If your query is
 surrounded by quotes, then strip away the quotes and make
 q=text_exact_field:your_unquoted_query. Probably better to do outside Solr
 in general keeping in mind the upgrade path.

 -sujit

 On Feb 21, 2013, at 12:20 PM, Van Tassell, Kristian wrote:

  Thank you.
 
  So essentially I need to write a custom query parser (extending upon
 something like the QParser)?
 
  -Original Message-
  From: Upayavira [mailto:u...@odoko.co.uk]
  Sent: Thursday, February 21, 2013 12:22 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Matching an exact word
 
  Solr will only match on the terms as they are in the index. If it is
 stemmed in the index, it will match that. If it isn't, it'll match that.
 
  All term matches are (by default at least) exact matches. Only with
 stemming you are doing an exact match against the stemmed term.
  Therefore, there really is no way to do what you are looking for within
 Solr. I'd suggest you'll need to do some parsing at your side and, if you
 find quotes, do the query against a different field.
 
  Upayavira
 
  On Thu, Feb 21, 2013, at 06:17 PM, Van Tassell, Kristian wrote:
   I'm trying to match the word "created". Given that it is surrounded by
  quotes, I would expect an exact match to occur, but instead the entire
  stemming results show for words such as create, creates, created, etc.
 
   q=created&wt=xml&rows=1000&qf=text&defType=edismax
 
  If I copy the text field to a new one that does not stem words,
  text_exact for example, I get the expected results:
 
   q=created&wt=xml&rows=1000&qf=text_exact&defType=edismax
 
  I would like the decision whether to match exact or not to be
  determined by the quotes rather than the qf parameter (eg, not have to
  use it at all). What topic do I need to look into more to understand
  this? Thanks in advance!
 




RE: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Vaillancourt, Tim
Correction, I used this curl:

curl -v 
'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2'

So 3 instances, 3 shards, 2 replicas per shard. ServerA becomes leader of all 3 
shards in 4.1 with this call.

Tim Vaillancourt

-Original Message-
From: Vaillancourt, Tim [mailto:tvaillanco...@ea.com] 
Sent: Thursday, February 21, 2013 11:27 AM
To: solr-user@lucene.apache.org; markrmil...@gmail.com
Subject: RE: Is it possible to manually select a shard leader in a running 
SolrCloud?

Thanks Mark,

The real driver for me wanting to promote a different leader is when I create a 
new Collection via the Collections API across a multi-server SolrCloud, the 
leader of each shard is always the same host, so you're right that I'm tackling 
the wrong problem with this request, although it would fix it for me.

If I create the cores manually via the cores API, one-by-one, I am able to get 
what I expect, but when running this Collections API call on a 3 SOLR 4.1 
instance, 3 shard setup, 1 server becomes the leader of all 3 shards, meaning 
it will get all the writes for everything (correct me if I am wrong). If so, 
this will not scale well with all writes to one node (or correct me if I am 
wrong)?

curl -v 
'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=1&maxShardsPerNode=2'

Currently on my 3 instance SOLR 4.1 setup, the above call creates the following:

- ServerA is the leader of all 3 shards (the problem I want to address).
- ServerB + ServerC are automagically replicas of the 3 leader shards on 
ServerA.

So again, my issue is one server gets all the writes. Does anyone else 
encounter this? If so, I should spawn a separate thread on my specific issue.

Cheers,

Tim

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Tuesday, February 19, 2013 8:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to manually select a shard leader in a running 
SolrCloud?

You can't easily do it the way it's implemented in ZooKeeper. We would probably 
internally have to do the same thing - elect a new leader and drop him until 
the one we wanted came up. The main thing doing it internally would gain is 
that you could skip the elected guy from becoming the actual leader and just 
move on to the next candidate.
Still some tricky corner cases to deal with and such as well.

I think for most things you would use this to solve, there is probably an 
alternate thing that should be addressed.

- Mark

On Mon, Feb 18, 2013 at 4:15 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote:
 Hey all,

 I feel having to unload the leader core to force an election is hacky, and 
 as far as I know would still leave which node becomes the Leader to chance, 
 ie I cannot guarantee NodeX becomes Leader 100% in all cases.

 Also, this imposes additional load temporarily.

 Is there a way to force the winner of the Election, and if not, is there a 
 known feature-request for this?

 Cheers,

 Tim Vaillancourt

 -Original Message-
 From: Joseph Dale [mailto:joey.d...@gmail.com]
 Sent: Sunday, February 03, 2013 7:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Is it possible to manually select a shard leader in a running 
 SolrCloud?

 With SolrCloud all cores are collections. The collections API is just a 
 wrapper to call the core API a million times with one command.

 to /solr/admin/cores?action=CREATE&name=core1&collection=core1&shard=1

 Basically you're creating the shard again, after leader props have gone out. 
 Solr will check ZK and find a core meeting that description, then simply get 
 a copy of the index from the leader of that shard.


 On Feb 3, 2013, at 10:37 AM, Brett Hoerner br...@bretthoerner.com wrote:

 What is the inverse I'd use to re-create/load a core on another 
 machine but make sure it's also known to SolrCloud/as a shard?


 On Sat, Feb 2, 2013 at 4:01 PM, Joseph Dale joey.d...@gmail.com wrote:


 To be more clear, let's say Bob is the leader of core 1. On Bob do a 
 /admin/cores?action=unload&name=core1. This removes the core/shard 
 from Bob, giving the other servers a chance to grab leader props.

 -Joey

 On Feb 2, 2013, at 11:27 AM, Brett Hoerner br...@bretthoerner.com wrote:

 Hi,

 I have a 5 server cluster running 1 collection with 20 shards,
 replication
 factor of 2.

 Earlier this week I had to do a rolling restart across the cluster, 
 this worked great and the cluster stayed up the whole time. The 
 problem is
 that
 the last node I restarted is now the leader of 0 shards, and is 
 just holding replicas.

 I've noticed this node has abnormally high load average, while the 
 other nodes (who have the same number of shards, but more leaders 
 on
 average)
 are
 fine.

 First, I'm wondering if that loud could be related to being a 5x 
 replica and 0x leader?

 Second, I was wondering if I could somehow flag single shards to
 re-elect a
 leader (or force a 

Re: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Upayavira
Which of your three hosts did you point this request at?

Upayavira

On Thu, Feb 21, 2013, at 09:13 PM, Vaillancourt, Tim wrote:
 Correction, I used this curl:
 
 curl -v
 'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2'
 
 So 3 instances, 3 shards, 2 replicas per shard. ServerA becomes leader of
 all 3 shards in 4.1 with this call.
 
 Tim Vaillancourt
 
 -Original Message-
 From: Vaillancourt, Tim [mailto:tvaillanco...@ea.com] 
 Sent: Thursday, February 21, 2013 11:27 AM
 To: solr-user@lucene.apache.org; markrmil...@gmail.com
 Subject: RE: Is it possible to manually select a shard leader in a
 running SolrCloud?
 
 Thanks Mark,
 
 The real driver for me wanting to promote a different leader is when I
 create a new Collection via the Collections API across a multi-server
 SolrCloud, the leader of each shard is always the same host, so you're
 right that I'm tackling the wrong problem with this request, although it
 would fix it for me.
 
 If I create the cores manually via the cores API, one-by-one, I am able
 to get what I expect, but when running this Collections API call on a 3
 SOLR 4.1 instance, 3 shard setup, 1 server becomes the leader of all 3
 shards, meaning it will get all the writes for everything (correct me if
 I am wrong). If so, this will not scale well with all writes to one node
 (or correct me if I am wrong)?
 
 curl -v
 'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=1&maxShardsPerNode=2'
 
 Currently on my 3 instance SOLR 4.1 setup, the above call creates the
 following:
 
 - ServerA is the leader of all 3 shards (the problem I want to address).
 - ServerB + ServerC are automagically replicas of the 3 leader shards on
 ServerA.
 
 So again, my issue is one server gets all the writes. Does anyone else
 encounter this? If so, I should spawn a separate thread on my specific
 issue.
 
 Cheers,
 
 Tim
 
 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Tuesday, February 19, 2013 8:44 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Is it possible to manually select a shard leader in a
 running SolrCloud?
 
 You can't easily do it the way it's implemented in ZooKeeper. We would
 probably internally have to do the same thing - elect a new leader and
 drop him until the one we wanted came up. The main thing doing it
 internally would gain is that you could skip the elected guy from
 becoming the actual leader and just move on to the next candidate.
 Still some tricky corner cases to deal with and such as well.
 
 I think for most things you would use this to solve, there is probably an
 alternate thing that should be addressed.
 
 - Mark
 
 On Mon, Feb 18, 2013 at 4:15 PM, Vaillancourt, Tim tvaillanco...@ea.com
 wrote:
  Hey all,
 
  I feel having to unload the leader core to force an election is hacky, 
  and as far as I know would still leave which node becomes the Leader to 
  chance, ie I cannot guarantee NodeX becomes Leader 100% in all cases.
 
  Also, this imposes additional load temporarily.
 
  Is there a way to force the winner of the Election, and if not, is there a 
  known feature-request for this?
 
  Cheers,
 
  Tim Vaillancourt
 
  -Original Message-
  From: Joseph Dale [mailto:joey.d...@gmail.com]
  Sent: Sunday, February 03, 2013 7:42 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Is it possible to manually select a shard leader in a running 
  SolrCloud?
 
  With SolrCloud all cores are collections. The collections API is just a 
  wrapper to call the core API a million times with one command.
 
  to /solr/admin/cores?action=CREATE&name=core1&collection=core1&shard=1
 
  Basically you're creating the shard again, after leader props have gone 
  out. Solr will check ZK and find a core meeting that description, then 
  simply get a copy of the index from the leader of that shard.
 
 
  On Feb 3, 2013, at 10:37 AM, Brett Hoerner br...@bretthoerner.com wrote:
 
  What is the inverse I'd use to re-create/load a core on another 
  machine but make sure it's also known to SolrCloud/as a shard?
 
 
  On Sat, Feb 2, 2013 at 4:01 PM, Joseph Dale joey.d...@gmail.com wrote:
 
 
  To be more clear, let's say Bob is the leader of core 1. On Bob do a 
  /admin/cores?action=unload&name=core1. This removes the core/shard 
  from Bob, giving the other servers a chance to grab leader props.
 
  -Joey
 
  On Feb 2, 2013, at 11:27 AM, Brett Hoerner br...@bretthoerner.com wrote:
 
  Hi,
 
  I have a 5 server cluster running 1 collection with 20 shards,
  replication
  factor of 2.
 
  Earlier this week I had to do a rolling restart across the cluster, 
  this worked great and the cluster stayed up the whole time. The 
  problem is
  that
  the last node I restarted is now the leader of 0 shards, and is 
  just holding replicas.
 
  I've noticed this node has abnormally high load average, while the 
  other nodes (who have the same number of 

Re: can i install new SOLR 4.1 as slaver(3.3 Master)

2013-02-21 Thread Mingfeng Yang
I cannot give an affirmative answer, but I am thinking it could be a
problem, as the index formats in 3.3 and 4.1 are slightly
different.

Why don't you upgrade to 4.1?  The only thing you need to do is
1. install solr 4.1
2.1 copy all related config files from 3.3
2.2 back up the index data folder
3. shutdown solr 3.3
4 start solr 4.1 with solr.data.dir pointing to the old dir




On Thu, Feb 21, 2013 at 10:54 AM, michaelweica m...@hipdigital.com wrote:

 Hi ,

  our Solr master version is 3.3. Can I install a new box with Solr 4.1 as a
  slave, and replicate data from the master?

 thanks



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/can-i-install-new-SOLR-4-1-as-slaver-3-3-Master-tp4041976.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: can i install new SOLR 4.1 as slaver(3.3 Master)

2013-02-21 Thread michaelweica
thanks

We have 1 master and 5 slave servers, and we use the slaves as production
servers.
We just update the master index when we have new content.

Our index is now almost 88G; the server has just 1 core and 8G RAM (JVM:
-Xmx60964M -Xms1024M), so it easily runs out of memory.

So I plan to deploy a new server with Solr 4.1; it is easy to keep the master
updated and just replicate to the new Solr, but I don't know whether the
process for indexing new content on Solr 4.1 is the same as on Solr 3.3.

Hope we can find the best solution for this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-i-install-new-SOLR-4-1-as-slaver-3-3-Master-tp4041976p4042037.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

2013-02-21 Thread Chris Hostetter

: Hi everyone, i am new to solr technology and not getting a way to get back
: the original HTML document with Hits highlighted into it. what
: configuration and where i can do to instruct SolrCell/ Tika so that it does
: not strips down the tags of HTML document in the content field.

I _think_ what you want is simply to ensure that you have a content 
field in your schema which is stored="true" (and indexed="true" if you 
want to search on it directly) ... and then ExtractingRequestHandler will 
put the entire XHTML it generates from the documents you index into that 
field.

http://wiki.apache.org/solr/ExtractingRequestHandler

If that isn't what you had in mind, then you need to provide us with more 
details about what you've tried, what results you get, and how exactly 
those results differ from what you want to get.


-Hoss
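For reference, the stored content field described above might be declared like this in schema.xml — a sketch; the field type name is an assumption:

```xml
<!-- Catch-all field for the XHTML that ExtractingRequestHandler/Tika
     generates; stored="true" so the markup comes back in results and can
     be highlighted, indexed="true" so it is searchable directly -->
<field name="content" type="text_general" indexed="true" stored="true"/>
```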


Re: Document update question

2013-02-21 Thread Shawn Heisey

On 2/21/2013 10:00 AM, Jack Park wrote:

Interesting you should say that.  Here is my solrj code:

public Solr3Client(String solrURL) throws Exception {
server = new HttpSolrServer(solrURL);
//  server.setParser(new XMLResponseParser());
}

I cannot recall why I commented out the setParser line; something
about someone saying in another thread it's not important. I suppose I
should revisit my unit tests with that line uncommented. Or, did I
miss something?

The JSON results I painted earlier were from reading the document
online in the admin query panel.


Jack,

SolrJ defaults to the javabin response parser, which offers maximum 
efficiency in the communication.  Between version 1.4.1 and 3.1.0, the 
javabin version changed and became incompatible with the old one.


The XML parser is a little bit less efficient than javabin, but is the 
only way to get Solr/SolrJ to talk when one side is using a different 
javabin version than the other side.  If you are not mixing 1.x with 
later versions, you do not need to worry about changing the response parser.


Thanks,
Shawn



Re: Solr splitting my words

2013-02-21 Thread Jack Krupansky
The issue may simply be that your indexed data has the mixed case and your 
query has only lower case. So, the suggested change won't affect the query 
itself, but will cause the indexed data to be indexed differently.


-- Jack Krupansky

-Original Message- 
From: scallawa

Sent: Thursday, February 21, 2013 9:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr splitting my words

I tried playing with the analyzer before posting and wasn't sure how to
interpret it.
Field type: text
Field value (index): womens-mcmurdo-ii-boots  (this is based on the info that
is in the field)
Field value query: mcmurdo

results
I only got one match in the index analyzer
org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
catenateNumbers=1}
term position     1        2        3       4
term text         womens   mcmurdo  ii      boots | womensmcmurdoiiboots
term type         word     word     word    word  | word
source start,end  0,6      7,14     15,17   18,23 | 0,23
payload


Jack,
The field that I am expecting to be indexed is not sending the data in caps,
which is why I am puzzled. I am wondering if the indexed data is not coming
from the field I expect.  I will try your change in dev once I get data
generated there.
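For context, the analyzer chain behind that output is typically declared in schema.xml roughly like this — a hypothetical sketch (the tokenizer and the query-side settings are assumptions; the index-side WordDelimiterFilterFactory parameters are the ones printed above):

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Parameters as reported by the analysis page above: this splits
         womens-mcmurdo-ii-boots into womens/mcmurdo/ii/boots and, via
         catenateWords, also emits womensmcmurdoiiboots -->
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping the index- and query-time settings aligned (apart from the catenate options, which are usually index-only) is what makes a query term like mcmurdo line up with the indexed word parts.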



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-splitting-my-words-tp4041913p4041963.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Index optimize takes more than 40 minutes for 18M documents

2013-02-21 Thread Yandong Yao
Thans Walter for info, we will disable optimize then and do more testing.

Regards,
Yandong

2013/2/22 Walter Underwood wun...@wunderwood.org

 That seems fairly fast. We index about 3 million documents in about half
 that time. We are probably limited by the time it takes to get the data
 from MySQL.

 Don't optimize. Solr automatically merges index segments as needed.
 Optimize forces a full merge. You'll probably never notice the difference,
 either in disk space or speed.

 It might make sense to force merge (optimize) if you reindex everything
 once per day and have no updates in between. But even then it may be a
 waste of time.

 You need lots of free disk space for merging, whether a forced merge or
 automatic. Free space equal to the size of the index is usually enough, but
 worst case can need double the size of the index.

 wunder

 On Feb 21, 2013, at 9:20 AM, Yandong Yao wrote:

  Hi Guys,
 
  I am using Solr 4.1 and have indexed 18M documents using solrj
  ConcurrentUpdateSolrServer (each document contains 5 fields, and average
  length is less than 1k).
 
  1) It takes 70 minutes to index those documents without optimize on my Mac
  (OS X 10.8); is that slow, fast, or typical?
 
  2) It takes about 40 minutes to optimize those documents, following is
 top
  output, and there are lots of FAULTS, what does this mean?
 
  Processes: 118 total, 2 running, 8 stuck, 108 sleeping, 719 threads      00:56:52
  Load Avg: 1.48, 1.56, 1.73  CPU usage: 6.63% user, 6.40% sys, 86.95% idle
  SharedLibs: 31M resident, 0B data, 6712K linkedit.
  MemRegions: 34734 total, 5801M resident, 39M private, 638M shared.
  PhysMem: 982M wired, 3600M active, 3567M inactive, 8150M used, 38M free.
  VM: 254G vsize, 1285M framework vsize, 1469887(368) pageins, 1095550(0) pageouts.
  Networks: packets: 14842595/9661M in, 14777685/9395M out.
  Disks: 820048/43G read, 523814/53G written.

  PID   COMMAND  %CPU  TIME      #TH  #WQ   #POR  #MRE  RPRVT   RSHRD  RSIZE   VPRVT  VSIZE  PGRP  PPID  STATE    UID  FAULTS    COW  MSGSENT   MSGRECV  SYSBSD     SYSMACH
  4585  java     11.7  02:52:01  32   1483  342   3866M+ 6724K  3856M+  4246M  6908M  4580  4580  sleepin  501  1490340+  402  3000781+  231785+  15044055+  10033109+
 
  3) If I don't run optimize, what is the impact? bigger disk size or slow
  query performance?
 
  Following is my index config in  solrconfig.xml:
 
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
  <autoCommit>
    <maxDocs>100000</maxDocs><!-- 100K docs -->
    <maxTime>300000</maxTime><!-- 5 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
 
  Thanks very much in advance!
 
  Regards,
  Yandong







RE: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Vaillancourt, Tim
I sent this request to ServerA in this case, which became the leader of all 
shards. As far as I know you're supposed to issue this call to just one server 
as it issues the calls to the other leaders/replicas in the background, right?

I am expecting the single collections API call to spread the leaders evenly 
across SOLR instances.

Hopefully I am just doing/expecting something wrong :).

Tim Vaillancourt

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Thursday, February 21, 2013 1:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to manually select a shard leader in a running 
SolrCloud?

Which of your three hosts did you point this request at?

Upayavira

On Thu, Feb 21, 2013, at 09:13 PM, Vaillancourt, Tim wrote:
 Correction, I used this curl:
 
 curl -v
  'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2'
 
 So 3 instances, 3 shards, 2 replicas per shard. ServerA becomes leader 
 of all 3 shards in 4.1 with this call.
 
 Tim Vaillancourt
 
 -Original Message-
 From: Vaillancourt, Tim [mailto:tvaillanco...@ea.com]
 Sent: Thursday, February 21, 2013 11:27 AM
 To: solr-user@lucene.apache.org; markrmil...@gmail.com
 Subject: RE: Is it possible to manually select a shard leader in a 
 running SolrCloud?
 
 Thanks Mark,
 
 The real driver for me wanting to promote a different leader is when I 
 create a new Collection via the Collections API across a multi-server 
 SolrCloud, the leader of each shard is always the same host, so you're 
 right that I'm tackling the wrong problem with this request, although 
 it would fix it for me.
 
 If I create the cores manually via the cores API, one-by-one, I am 
 able to get what I expect, but when running this Collections API call 
 on a 3 SOLR 4.1 instance, 3 shard setup, 1 server becomes the leader 
 of all 3 shards, meaning it will get all the writes for everything 
 (correct me if I am wrong). If so, this will not scale well with all 
 writes to one node (or correct me if I am wrong)?
 
 curl -v
  'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=1&maxShardsPerNode=2'
 
 Currently on my 3 instance SOLR 4.1 setup, the above call creates the
 following:
 
 - ServerA is the leader of all 3 shards (the problem I want to address).
 - ServerB + ServerC are automagically replicas of the 3 leader shards 
 on ServerA.
 
 So again, my issue is one server gets all the writes. Does anyone else 
 encounter this? If so, I should spawn a separate thread on my specific 
 issue.
 
 Cheers,
 
 Tim
 
 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Tuesday, February 19, 2013 8:44 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Is it possible to manually select a shard leader in a 
 running SolrCloud?
 
 You can't easily do it the way it's implemented in ZooKeeper. We would 
 probably internally have to do the same thing - elect a new leader and 
 drop him until the one we wanted came up. The main thing doing it 
 internally would gain is that you could skip the elected guy from 
 becoming the actual leader and just move on to the next candidate.
 Still some tricky corner cases to deal with and such as well.
 
 I think for most things you would use this to solve, there is probably 
 an alternate thing that should be addressed.
 
 - Mark
 
 On Mon, Feb 18, 2013 at 4:15 PM, Vaillancourt, Tim 
 tvaillanco...@ea.com
 wrote:
  Hey all,
 
  I feel having to unload the leader core to force an election is hacky, 
  and as far as I know would still leave which node becomes the Leader to 
  chance, ie I cannot guarantee NodeX becomes Leader 100% in all cases.
 
  Also, this imposes additional load temporarily.
 
  Is there a way to force the winner of the Election, and if not, is there a 
  known feature-request for this?
 
  Cheers,
 
  Tim Vaillancourt
 
  -Original Message-
  From: Joseph Dale [mailto:joey.d...@gmail.com]
  Sent: Sunday, February 03, 2013 7:42 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Is it possible to manually select a shard leader in a running 
  SolrCloud?
 
   With SolrCloud all cores are collections. The collections API is just a 
   wrapper to call the core API a million times with one command.
 
  to 
 /solr/admin/cores?action=CREATE&name=core1&collection=core1&shard=1
 
 Basically you're creating the shard again, after leader props have gone 
 out. Solr will check ZK and find a core meeting that description, then 
 simply get a copy of the index from the leader of that shard.
 
 
  On Feb 3, 2013, at 10:37 AM, Brett Hoerner br...@bretthoerner.com wrote:
 
  What is the inverse I'd use to re-create/load a core on another 
  machine but make sure it's also known to SolrCloud/as a shard?
 
 
  On Sat, Feb 2, 2013 at 4:01 PM, Joseph Dale joey.d...@gmail.com wrote:
 
 
 To be more clear, let's say bob is the leader of core 1. On bob do a 
  

Re: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-21 Thread Mark Miller
The leader doesn't really do a lot more work than any of the replicas, so I 
don't think it's likely that important. If someone starts running into 
problems, that's usually when we start looking for solutions.

- Mark

On Feb 21, 2013, at 10:20 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote:

 I sent this request to ServerA in this case, which became the leader of all 
 shards. As far as I know you're supposed to issue this call to just one 
 server as it issues the calls to the other leaders/replicas in the 
 background, right?
 
 I am expecting the single collections API call to spread the leaders evenly 
 across SOLR instances.
 
 Hopefully I am just doing/expecting something wrong :).
 
 Tim Vaillancourt
 
 -Original Message-
 From: Upayavira [mailto:u...@odoko.co.uk] 
 Sent: Thursday, February 21, 2013 1:44 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Is it possible to manually select a shard leader in a running 
 SolrCloud?
 
 Which of your three hosts did you point this request at?
 
 Upayavira
 
 On Thu, Feb 21, 2013, at 09:13 PM, Vaillancourt, Tim wrote:
 Correction, I used this curl:
 
 curl -v
 'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2'
 
 So 3 instances, 3 shards, 2 replicas per shard. ServerA becomes leader 
 of all 3 shards in 4.1 with this call.
 
 Tim Vaillancourt
 
 -Original Message-
 From: Vaillancourt, Tim [mailto:tvaillanco...@ea.com]
 Sent: Thursday, February 21, 2013 11:27 AM
 To: solr-user@lucene.apache.org; markrmil...@gmail.com
 Subject: RE: Is it possible to manually select a shard leader in a 
 running SolrCloud?
 
 Thanks Mark,
 
 The real driver for me wanting to promote a different leader is when I 
 create a new Collection via the Collections API across a multi-server 
 SolrCloud, the leader of each shard is always the same host, so you're 
 right that I'm tackling the wrong problem with this request, although 
 it would fix it for me.
 
  If I create the cores manually via the cores API, one by one, I am 
  able to get what I expect, but when running this Collections API call 
  on a 3-instance SOLR 4.1, 3-shard setup, 1 server becomes the leader 
  of all 3 shards, meaning it will get all the writes for everything 
  (correct me if I am wrong). If so, this will not scale well with all 
  writes going to one node?
 
 curl -v
  'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=1&maxShardsPerNode=2'
 
 Currently on my 3 instance SOLR 4.1 setup, the above call creates the
 following:
 
 - ServerA is the leader of all 3 shards (the problem I want to address).
 - ServerB + ServerC are automagically replicas of the 3 leader shards 
 on ServerA.
 
 So again, my issue is one server gets all the writes. Does anyone else 
 encounter this? If so, I should spawn a separate thread on my specific 
 issue.
 
 Cheers,
 
 Tim
 
 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Tuesday, February 19, 2013 8:44 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Is it possible to manually select a shard leader in a 
 running SolrCloud?
 
 You can't easily do it the way it's implemented in ZooKeeper. We would 
 probably internally have to do the same thing - elect a new leader and 
 drop him until the one we wanted came up. The main thing doing it 
 internally would gain is that you could skip the elected guy from 
 becoming the actual leader and just move on to the next candidate.
 Still some tricky corner cases to deal with and such as well.
 
 I think for most things you would use this to solve, there is probably 
 an alternate thing that should be addressed.
 
 - Mark
 
 On Mon, Feb 18, 2013 at 4:15 PM, Vaillancourt, Tim 
 tvaillanco...@ea.com
 wrote:
 Hey all,
 
 I feel having to unload the leader core to force an election is hacky, 
 and as far as I know would still leave which node becomes the Leader to 
 chance, ie I cannot guarantee NodeX becomes Leader 100% in all cases.
 
 Also, this imposes additional load temporarily.
 
 Is there a way to force the winner of the Election, and if not, is there a 
 known feature-request for this?
 
 Cheers,
 
 Tim Vaillancourt
 
 -Original Message-
 From: Joseph Dale [mailto:joey.d...@gmail.com]
 Sent: Sunday, February 03, 2013 7:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Is it possible to manually select a shard leader in a running 
 SolrCloud?
 
  With SolrCloud all cores are collections. The collections API is just a 
  wrapper that calls the core API many times with one command.
 
 to 
  /solr/admin/cores?action=CREATE&name=core1&collection=core1&shard=1
 
  Basically you're creating the shard again, after leader props have gone 
  out. Solr will check ZK and find a core meeting that description, then 
  simply get a copy of the index from the leader of that shard.
 
 
 On Feb 3, 2013, at 10:37 AM, Brett Hoerner br...@bretthoerner.com wrote:
 
 What is 

How do I create two collections on the same cluster?

2013-02-21 Thread Shankar Sundararaju
I am using Solr 4.1.

I created collection1 consisting of 2 leaders and 2 replicas (2 shards) at
boot time.

After the cluster is up, I am trying to create collection2 with 2 leaders
and 2 replicas just like collection1. I am using following collections API
for that:

http://localhost:7575/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2&collection.configName=myconf&createNodeSet=localhost:8983_solr,localhost:7574_solr,localhost:7575_solr,localhost:7576_solr

Yes, collection2 does get created. But I see a problem - the createNodeSet
parameter is not being honored. Not all 4 nodes are being used to create
collection2; only 3 are. Is this a bug, or do I not understand how this
parameter should be used?

What is the best way to create collection2? Can I specify both collections
in solr.xml in the solr home dir in all nodes and launch them? Do I have to
get the configs for collection2 uploaded to zookeeper before I launch the
nodes?

Thanks in advance.

-Shankar

-- 
Regards,
Shankar Sundararaju
Sr. Software Architect
ebrary, a ProQuest company
410 Cambridge Avenue, Palo Alto, CA 94306 USA
shan...@ebrary.com | www.ebrary.com | 650-475-8776 (w) | 408-426-3057 (c)


Re: How do I create two collections on the same cluster?

2013-02-21 Thread Shawn Heisey

On 2/21/2013 9:50 PM, Shankar Sundararaju wrote:

I am using Solr 4.1.

I created collection1 consisting of 2 leaders and 2 replicas (2 shards) at
boot time.

After the cluster is up, I am trying to create collection2 with 2 leaders
and 2 replicas just like collection1. I am using following collections API
for that:

http://localhost:7575/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2&collection.configName=myconf&createNodeSet=localhost:8983_solr,localhost:7574_solr,localhost:7575_solr,localhost:7576_solr

Yes, collection2 does get created. But I see a problem - the createNodeSet
parameter is not being honored. Not all 4 nodes are being used to create
collection2; only 3 are. Is this a bug, or do I not understand how this
parameter should be used?

What is the best way to create collection2? Can I specify both collections
in solr.xml in the solr home dir in all nodes and launch them? Do I have to
get the configs for collection2 uploaded to zookeeper before I launch the
nodes?


Is your cluster comprised of only those four Solr nodes, or do you have 
others?  If it's just those four, you should not need to tell it which 
ones to use; it should use all of them.  You could try adding 
maxShardsPerNode=1 just to be sure that it won't try to put more than 
one shard on any one node.
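
As a side note, building the request programmatically sidesteps the shell-quoting mistakes that can silently eat the '&' separators between parameters. A sketch in Python (the hosts, ports, and node-name suffixes here are made-up placeholders; substitute the node names or IPs shown in your own cloud graph):

```python
from urllib.parse import urlencode

# Hypothetical nodes -- replace with the entries from your cloud graph.
nodes = ["10.0.0.1:8983_solr", "10.0.0.2:7574_solr",
         "10.0.0.3:7575_solr", "10.0.0.4:7576_solr"]

# urlencode joins and escapes the parameters, so no '&' can be lost to
# shell quoting and the comma-separated createNodeSet survives intact.
params = {
    "action": "CREATE",
    "name": "collection2",
    "numShards": 2,
    "replicationFactor": 2,
    "maxShardsPerNode": 1,              # at most one shard per node
    "collection.configName": "myconf",
    "createNodeSet": ",".join(nodes),
}
url = "http://10.0.0.1:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

The printed URL can then be passed to curl in single quotes, or fetched directly from the script.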


I did find an email thread saying that hostnames won't work in 
createNodeSet with Solr 4.1, because 4.1 defaults to IP addresses when 
each node registers with Zookeeper.  Check your SolrCloud graph in the 
admin UI.  If you see IP addresses there, you will probably have to use 
IP addresses in the createNodeSet parameter.  You can force hostnames by 
including host=myhostname in the cores parameter of solr.xml and 
restarting Solr on that node.
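
If it helps, in the Solr 4.x legacy solr.xml format that change is a one-attribute edit on the cores element, roughly like this ("myhostname" is a placeholder; verify the attributes against your own file):

```xml
<!-- solr.xml (Solr 4.x legacy format): the host attribute overrides the
     IP address Solr would otherwise register with ZooKeeper -->
<solr persistent="true">
  <cores adminPath="/solr/admin/cores" host="myhostname" hostPort="8983">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```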


I'm relatively new to SolrCloud, but I'm learning.

Thanks,
Shawn



solr 4 fragmentsBuilder and highlightMultiTerm

2013-02-21 Thread cmd.ares
How do I configure solrconfig.xml to enable fragmentsBuilder and
highlightMultiTerm on 4.0 and 4.1? I read the document on the wiki:

<fragmentsBuilder name="colored"
    class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"></str>
    <str name="hl.tag.post"></str>
  </lst>
</fragmentsBuilder>

but I don't know where the snippet should be placed, or how to call it by
URL.
thanks
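
For what it's worth, the wiki examples place that snippet inside the highlighting section of the standard highlight search component in solrconfig.xml, roughly like this (the pre/post tag values are illustrative, and the element names are from the wiki, so verify against your version):

```xml
<!-- solrconfig.xml: fragmentsBuilder entries live inside the highlighting
     section of the highlight search component -->
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <fragmentsBuilder name="colored"
        class="solr.highlight.ScoreOrderFragmentsBuilder">
      <lst name="defaults">
        <!-- illustrative markup; use whatever pre/post tags you want -->
        <str name="hl.tag.pre"><![CDATA[<b style="background:yellow">]]></str>
        <str name="hl.tag.post"><![CDATA[</b>]]></str>
      </lst>
    </fragmentsBuilder>
  </highlighting>
</searchComponent>
```

It is then selected per request with parameters along the lines of &hl=true&hl.useFastVectorHighlighter=true&hl.fragmentsBuilder=colored (the highlighted field needs termVectors, termPositions, and termOffsets for the FastVectorHighlighter), and &hl.highlightMultiTerm=true enables multi-term highlighting with the standard highlighter.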





--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-4-fragmentsBuilder-and-highlightMultiTerm-tp4042128.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH deleting documents

2013-02-21 Thread cveres
I should also add that some of the books don't have chapters, so the query
won't succeed for those books. But in that case I expected that the document
wouldn't be added at all, rather than first added and then deleted (which I
now suspect is the case).
It would be very helpful if I could see a list of the deleted documents! I
was trying to look in the terminal window (Jetty) but that did not help. I
don't know where else Solr might put logs. I looked in /var/log but did not
find anything useful.
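
One thing worth ruling out (a guess, not a diagnosis): if the concatenated ids collide after all, Solr does not reject the second add - it overwrites the first document, and the overwritten copy is counted as a deleted document until segments merge. A toy sketch of that arithmetic in plain Python (the ids are made up, and this is not Solr code):

```python
# Toy model of overwrite-by-uniqueKey: adding a document whose id already
# exists replaces the old copy, and the old copy is flagged deleted rather
# than the add being rejected. If "unique" concatenated ids collide, the
# deleted-document count climbs exactly like the mystery deletions above.

def index_docs(docs):
    """Return (live_docs, deleted_count) after adding docs in order."""
    live = {}
    deleted = 0
    for doc in docs:
        if doc["id"] in live:
            deleted += 1               # old copy flagged deleted, not rejected
        live[doc["id"]] = doc
    return live, deleted

docs = [{"id": "book1-ch1"}, {"id": "book2-ch1"}, {"id": "book1-ch1"}]
live, deleted = index_docs(docs)
assert len(live) == 2 and deleted == 1
```

Comparing the row count of the source query against the index's numDocs and maxDoc would show whether collisions account for the missing documents.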



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041811p4042149.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr as local service for .NET desktop app

2013-02-21 Thread Knacktus
I need some advanced search features for a desktop application. The
application is a .NET (C#) application, so I can't use Lucene directly, and
as I'm not sure about the future of Lucene.NET I'm considering Solr (with
SolrNet).

As I need a cache for the desktop app anyway, it seems like a good
opportunity to solve two problems at once. Also, we will use Solr on the
server, so we need to build know-how anyway.

I'm currently making my way through the Solr 3 book (from Packt), so I'm a
newbie to Solr.

Has anyone experience with this? Are there pitfalls I should be aware of?
Deployment will be a challenge, I guess. How about configuration? Can I
leave Solr on a client's machine with reasonable default settings?

Many thanks,

Jan


Re: get content is put in the index queue but is not committed

2013-02-21 Thread Miguel

Thanks Chris

I'm going to look at both the UpdateLog and RealTimeGetComponent classes, 
but I'm not sure I can use them because I'm working with Apache Solr 
version 1.4.1 (I know it's old).
Anyway, I'll describe my problem. I am developing a custom class extending 
UpdateRequestProcessorFactory. This class must save all modifications from 
the Solr server (add, update, and delete actions) to a database, but the 
database write must always happen when the commit event occurs.
My problem is that clients of the Solr server do an explicit commit, so I 
first receive the update event and afterwards the commit event, and in the 
latter I have to recover the docs from the update event; I wanted to know 
if that is possible.


For now, I am going to go another way and use a status field in the 
database. The status field lets me save docs to the database at the update 
event, and my other process does not use them until I change the value of 
the status field on the commit event.
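
That buffer-and-flip pattern is easy to reason about in isolation; here is a minimal sketch in plain Python (the class and method names are made up for illustration, not the Solr 1.4 API):

```python
# Sketch of the status-field pattern: rows are written as PENDING on each
# update event and flipped to COMMITTED only when the commit event fires,
# so downstream readers never see uncommitted docs.

class MirrorStore:
    def __init__(self):
        self.rows = {}                 # doc id -> (doc, status)

    def on_update(self, doc):
        """Called per update event: persist the doc as pending."""
        self.rows[doc["id"]] = (doc, "PENDING")

    def on_commit(self):
        """Called on the commit event: make all pending docs visible."""
        for doc_id, (doc, _) in self.rows.items():
            self.rows[doc_id] = (doc, "COMMITTED")

    def visible(self):
        """What other processes are allowed to read."""
        return [d for d, s in self.rows.values() if s == "COMMITTED"]

store = MirrorStore()
store.on_update({"id": "1"})
assert store.visible() == []           # nothing visible before commit
store.on_commit()
assert [d["id"] for d in store.visible()] == ["1"]
```

A rollback event would map naturally onto deleting the still-PENDING rows.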


Thanks very much. I am learning a lot about Solr from this list.

El 21/02/2013 19:34, Chris Hostetter escribió:

:  Anybody know how-to get content is put in the index queue but is not
: committed?

i'm guessing you are refering to uncommited documents in the transaction
log?  Take a look at the UpdateLog class, and how it's used by the
RealTimeGetComponent.

If you provide more details as to what you end goal is, we might be able
to provide more specific (or alternative) suggestions on how to achieve
your goal...


https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss