Searching on multiple cores using MultiSearcher

2013-08-05 Thread Zhang, Lisheng
Hi,
 
At the Lucene level we have MultiSearcher to search a few cores at the same time
with the same query; at the Solr level, can we perform such a search (if using the
same config/schema)? Here I do not mean searching across shards of the same
collection, but across independent collections.
 
Thanks very much for helps, Lisheng
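
One possibility, as a hedged sketch (assuming the cores share a schema, live in
the same container, and their unique keys do not collide): Solr's
distributed-search "shards" parameter can fan a single query out over
independent cores, much like MultiSearcher does at the Lucene level:

///
http://localhost:8080/solr/core1/select?q=title:foo&shards=localhost:8080/solr/core1,localhost:8080/solr/core2
///

The shards values are host:port/context/core entries without the http:// prefix;
the hostname and core names above are made up for illustration.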


Solr 4.3 opens a lot more files than Solr 3.6

2013-07-18 Thread Zhang, Lisheng
Hi,
 
After upgrading Solr from 3.6 to 4.3, we found that Solr opens a lot more files
than Solr 3.6 did (when a core is open). Since we have many cores (more than 2K
and still growing), we would like to reduce the number of open files.

We already used shareSchema and sharedLib, we also shared SolrConfig across all
cores, and we commented out autoSoftCommit in solrconfig.xml.

In Solr 3.6, it seems that the IndexWriter was opened only when an indexing
request came in and was closed immediately after the request was done, but in
Solr 4.3 the IndexWriter is kept open. Is there an easy way to go back to the
3.6 behavior (we do not need Near Real Time search)? Can we change the code to
disable keeping the IndexWriter open (if there is no better way)?

Any guidance on reducing open files would be very helpful.
 
Thanks very much for helps, Lisheng
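
Two knobs that are commonly suggested for cutting per-core file handles, as a
hedged solrconfig.xml sketch (the values are illustrative assumptions, not
tested recommendations):

///
<indexConfig>
  <!-- pack each segment into one compound (.cfs) file instead of many files -->
  <useCompoundFile>true</useCompoundFile>
  <!-- fewer, larger segments mean fewer files, at some indexing cost -->
  <mergeFactor>4</mergeFactor>
</indexConfig>
///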


Usage of "luceneMatchVersion" when upgrading from solr 3.6 to solr 4.3

2013-07-16 Thread Zhang, Lisheng
Hi,
 
We are upgrading Solr from 3.6 to 4.3, but we have a large amount of indexed
data and cannot afford to reindex it all at once.

We wish Solr 4.3 could do the following:

1/ still be able to search the data indexed by Solr 3.6
2/ whenever a new document is indexed, convert it to the 4.3 format (this may
not happen all at once)

In this case, should we use LUCENE_36 or LUCENE_43 for luceneMatchVersion? It
is suggested that we should reindex all data if using LUCENE_43, so I think we
should use LUCENE_36, since we cannot reindex all at once, true?
 
Thanks very much for helps, Lisheng
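
For reference, the setting in question is a single solrconfig.xml element, e.g.:

///
<luceneMatchVersion>LUCENE_36</luceneMatchVersion>
///

A hedged note: luceneMatchVersion mainly governs analyzer and query-parser
behavior; the on-disk format is handled separately (Lucene 4.x can read 3.x
segments and rewrites them into the new format as they are merged), so the two
goals above are not decided by this flag alone.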
 
 


RE: solr 4.3.0 cloud in Tomcat, link many collections to Zookeeper

2013-07-12 Thread Zhang, Lisheng
Thanks very much for all the helps!

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, July 12, 2013 7:31 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 4.3.0 cloud in Tomcat, link many collections to
Zookeeper


On 7/12/2013 7:29 AM, Zhang, Lisheng wrote:
> Sorry I might not have asked clearly: our issue is that we have
> a few thousand collections (it can be much more), so running that
> command is rather tedious. Is there a simpler way (all collections
> share the same schema/config)?

When you create each collection with the Collections API (http calls),
you tell it the name of a config set stored in zookeeper.  You can give
all your collections the same config set if you like.

If you manually create collections with the CoreAdmin API instead, you
must use the zkcli script included in Solr to link the collection to the
config set, which can be done either before or after the collection is
created.  The zkcli script provides some automation for the java command
that you were given by Furkan.

Thanks,
Shawn
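
A hedged sketch of scripting this so that thousands of collections reuse one
config set (the zkhost, hostname, collection names, and the "sharedconf"
config-set name are assumptions for illustration):

///
# upload the shared config to ZooKeeper once
./zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir ./conf -confname sharedconf

# create every collection against the same config set
for n in $(seq 1 2000); do
  curl "http://localhost:8080/solr/admin/collections?action=CREATE&name=col$n&numShards=1&collection.configName=sharedconf"
done
///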



RE: solr 4.3.0 cloud in Tomcat, link many collections to Zookeeper

2013-07-12 Thread Zhang, Lisheng
Sorry I might not have asked clearly: our issue is that we have
a few thousand collections (it can be much more), so running that
command is rather tedious. Is there a simpler way (all collections
share the same schema/config)?

Thanks very much for helps, Lisheng

-Original Message-
From: Furkan KAMACI [mailto:furkankam...@gmail.com]
Sent: Friday, July 12, 2013 1:17 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 4.3.0 cloud in Tomcat, link many collections to
Zookeeper


If you have one collection you just need to define hostnames of Zookeeper
ensembles and run that command once.


2013/7/11 Zhang, Lisheng 

> Hi,
>
> We are testing solr 4.3.0 in Tomcat (considering upgrading solr 3.6.1 to
> 4.3.0), in WIKI page
> for solrCloud in Tomcat:
>
> http://wiki.apache.org/solr/SolrCloudTomcat
>
> we need to link each collection explicitly:
>
> ///
> 8) Link uploaded config with target collection
> java -classpath .:/home/myuser/solr-war-lib/* org.apache.solr.cloud.ZkCLI
> -cmd linkconfig -collection mycollection -confname ...
> ///
>
> But our application has many cores (a few thousand, which all share the same
> schema/config); is there a more convenient way?
>
> Thanks very much for helps, Lisheng
>


RE: What happens in indexing request in solr cloud if Zookeepers are all dead?

2013-07-12 Thread Zhang, Lisheng
Thanks very much for your clear explanation!

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, July 11, 2013 1:55 PM
To: solr-user@lucene.apache.org
Subject: Re: What happens in indexing request in solr cloud if
Zookeepers are all dead?


Sorry, no updates if no Zookeepers. There would be no way to assure that any 
node knows the proper configuration. Queries are a little safer using most 
recent configuration without zookeeper, but update consistency requires 
accurate configuration information.

-- Jack Krupansky

-Original Message- 
From: Zhang, Lisheng
Sent: Thursday, July 11, 2013 2:59 PM
To: solr-user@lucene.apache.org
Subject: RE: What happens in indexing request in solr cloud if Zookeepers 
are all dead?

Yes, I should not have used the words master/slave for SolrCloud!

So if all Zookeepers are dead, could indexing requests be
handled properly (could solr remember the setting for indexing)?

Thanks very much for helps, Lisheng

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, July 11, 2013 10:46 AM
To: solr-user@lucene.apache.org
Subject: Re: What happens in indexing request in solr cloud if
Zookeepers are all dead?


There are no masters or slaves in SolrCloud - it is fully distributed and
"master-free". Leaders are temporary and can vary over time.

The basic idea for quorum is to prevent "split brain" - two (or more)
distinct sets of nodes (zookeeper nodes, that is) each thinking they
constitute the authoritative source for access to configuration information.
The trick is to require (N/2)+1 nodes for quorum. For n=3, quorum would be
(3/2)+1 = 1+1 = 2, so one node can be down. For n=1, quorum = (1/2)+1 = 0 +
1 = 1. For n=2, quorum would be (2/2)+1 = 1 + 1 = 2, so no nodes can be
down. IOW, for n=2 no nodes can be down for the cluster to do updates.

-- Jack Krupansky

-Original Message- 
From: Zhang, Lisheng
Sent: Thursday, July 11, 2013 9:28 AM
To: solr-user@lucene.apache.org
Subject: What happens in indexing request in solr cloud if Zookeepers are
all dead?

Hi,

In the latest SolrCloud doc, it is mentioned that if all Zookeepers are dead, a
distributed query still works because Solr remembers the cluster state.

How about indexing request handling if all Zookeepers are dead: does Solr need
Zookeeper to know which box is master and which is slave for indexing to work?
Could Solr remember master/slave relations without Zookeeper?

Also the doc says the Zookeeper quorum needs a majority, so we must have 3
Zookeepers to handle the case where one instance has crashed. What would happen
if we have two instances in the quorum and one instance has crashed (or the
quorum has 3 instances but two of them have crashed)? I would have thought the
last one should take over?

Thanks very much for helps, Lisheng
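
For concreteness, a hedged sketch of a 3-node ZooKeeper ensemble definition
(hostnames are assumptions); with N=3 the quorum from Jack's formula is
(3/2)+1 = 2, so exactly one node may be down:

///
# zoo.cfg, identical on all three nodes
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
///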




RE: What happens in indexing request in solr cloud if Zookeepers are all dead?

2013-07-11 Thread Zhang, Lisheng
Yes, I should not have used the words master/slave for SolrCloud!

So if all Zookeepers are dead, could indexing requests be
handled properly (could solr remember the setting for indexing)?

Thanks very much for helps, Lisheng

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Thursday, July 11, 2013 10:46 AM
To: solr-user@lucene.apache.org
Subject: Re: What happens in indexing request in solr cloud if
Zookeepers are all dead?


There are no masters or slaves in SolrCloud - it is fully distributed and 
"master-free". Leaders are temporary and can vary over time.

The basic idea for quorum is to prevent "split brain" - two (or more) 
distinct sets of nodes (zookeeper nodes, that is) each thinking they 
constitute the authoritative source for access to configuration information. 
The trick is to require (N/2)+1 nodes for quorum. For n=3, quorum would be 
(3/2)+1 = 1+1 = 2, so one node can be down. For n=1, quorum = (1/2)+1 = 0 + 
1 = 1. For n=2, quorum would be (2/2)+1 = 1 + 1 = 2, so no nodes can be 
down. IOW, for n=2 no nodes can be down for the cluster to do updates.

-- Jack Krupansky

-Original Message- 
From: Zhang, Lisheng
Sent: Thursday, July 11, 2013 9:28 AM
To: solr-user@lucene.apache.org
Subject: What happens in indexing request in solr cloud if Zookeepers are 
all dead?

Hi,

In the latest SolrCloud doc, it is mentioned that if all Zookeepers are dead, a
distributed query still works because Solr remembers the cluster state.

How about indexing request handling if all Zookeepers are dead: does Solr need
Zookeeper to know which box is master and which is slave for indexing to work?
Could Solr remember master/slave relations without Zookeeper?

Also the doc says the Zookeeper quorum needs a majority, so we must have 3
Zookeepers to handle the case where one instance has crashed. What would happen
if we have two instances in the quorum and one instance has crashed (or the
quorum has 3 instances but two of them have crashed)? I would have thought the
last one should take over?

Thanks very much for helps, Lisheng




Solr 4.3.0 memory usage is higher than solr 3.6.1?

2013-07-11 Thread Zhang, Lisheng
Hi,
 
We are testing solr 4.3.0 in Tomcat (considering upgrading solr 3.6.1 to 
4.3.0), we have
many cores (a few thousands).
 
We have noticed that solr 4.3.0 memory usage is much higher than solr 3.6.1
(without using SolrCloud yet). With 2K cores, solr 3.6.1 uses 1.5G, but solr
4.3.0 uses close to 3G of memory when Tomcat is initially started.

We used shareSchema and sharedLib, and we also disabled searcher warm-up during
startup.

We are still debugging the issue; we would appreciate any guidance you could
provide.
 
Thanks very much for helps, Lisheng
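
A hedged sketch of the 4.3-era solr.xml options that can cap memory when there
are thousands of cores (attribute values are assumptions; see the LotsOfCores
wiki page referenced elsewhere in these threads):

///
<cores adminPath="/admin/cores" shareSchema="true" transientCacheSize="128">
  <core name="core0001" instanceDir="core0001"
        loadOnStartup="false" transient="true"/>
  ...
</cores>
///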
 
 


What happens in indexing request in solr cloud if Zookeepers are all dead?

2013-07-11 Thread Zhang, Lisheng
Hi,

In the latest SolrCloud doc, it is mentioned that if all Zookeepers are dead, a
distributed query still works because Solr remembers the cluster state.

How about indexing request handling if all Zookeepers are dead: does Solr need
Zookeeper to know which box is master and which is slave for indexing to work?
Could Solr remember master/slave relations without Zookeeper?

Also the doc says the Zookeeper quorum needs a majority, so we must have 3
Zookeepers to handle the case where one instance has crashed. What would happen
if we have two instances in the quorum and one instance has crashed (or the
quorum has 3 instances but two of them have crashed)? I would have thought the
last one should take over?

Thanks very much for helps, Lisheng
 
 


solr 4.3.0 cloud in Tomcat, link many collections to Zookeeper

2013-07-11 Thread Zhang, Lisheng
Hi,
 
We are testing solr 4.3.0 in Tomcat (considering upgrading solr 3.6.1 to 
4.3.0), in WIKI page
for solrCloud in Tomcat:
 
http://wiki.apache.org/solr/SolrCloudTomcat
 
we need to link each collection explicitly:
 
///
8) Link uploaded config with target collection
java -classpath .:/home/myuser/solr-war-lib/* org.apache.solr.cloud.ZkCLI -cmd 
linkconfig -collection mycollection -confname ...
///
 
But our application has many cores (a few thousand, which all share the same
schema/config); is there a more convenient way?
 
Thanks very much for helps, Lisheng


RE: solr 4.3: write.lock is not removed

2013-05-30 Thread Zhang, Lisheng
I did more tests and it seems that this is still a bug (previous issue 3/):

1/ Create a core by CURL command with dataDir=; the core is created OK
   and later indexing worked OK also.

2/ But in solr.xml, dataDir is not defined in the core element:
dataDir=/data/new_collection_name

In Solr 3.6.1 we do not need to define schema/config because the conf folder is
not inside each collection.

1/ Indexing works OK but write.lock is not removed (we use
"/update?commit=true..")
2/ Shutdown Tomcat: I saw write.lock is gone
3/ Restart Tomcat: indexed data was created at the instanceDir/data level, with
   some warning messages. It seems that in solr.xml, dataDir is not defined?

Thanks very much for helps, Lisheng




-Original Message-
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Thursday, May 30, 2013 10:57 AM
To: solr-user@lucene.apache.org
Subject: RE: solr 4.3: write.lock is not removed


Hi,

We just use CURL from PHP code to submit indexing requests, like:

/update?commit=true..

This worked well in Solr 3.6.1. I saw the link you showed and really appreciate
it (if there is no other choice I will change the Java source code, but I hope
there is a better way).

Thanks very much for helps, Lisheng

-Original Message-
From: bbarani [mailto:bbar...@gmail.com]
Sent: Thursday, May 30, 2013 9:45 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 4.3: write.lock is not removed


How are you indexing the documents? Are you using an indexing program?

The below post discusses the same issue..

http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: solr starting time takes too long

2013-05-30 Thread Zhang, Lisheng
Hi Eric,

Thanks very much for helps (I should have responded sooner):

1/ My problem in 3.6 turned out to be largely related to the fact that I did not
   share the schema; after using shareSchema, the start time was reduced by up
   to 80% (to my great surprise; previously I thought the burden was mostly in
   solrconfig).

2/ I just upgraded to solr 4.3, but somehow I did not see all the fixes
   mentioned in the WIKI (like shareConfig); I saw the resolution is "Won't
   fix". Do you have a plan to put the fix into the next release?

Thanks and best regards, Lisheng

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, May 22, 2013 4:57 AM
To: solr-user@lucene.apache.org
Subject: Re: solr starting time takes too long


Zhang:

In 3.6, there's really no choice except to load all the cores on
startup. 10 minutes still seems excessive, do you perhaps have a
heavy-weight firstSearcher query?

Yes, soft commits are 4.x only, so that's not your problem.

There's a shareSchema option that tries to load only one copy of the
schema, which should help, but that doesn't help with loading
solrconfig.xml.

Also in the 4.3+ world there's the option to lazily-load cores, see:
http://wiki.apache.org/solr/LotsOfCores for the overview. Perhaps not
an option, but I thought I'd mention it.

But I'm afraid you're stuck. You might be able to run bigger hardware
(perhaps you're memory-starved). Other than that, you may need to use
more than one machine to get fast enough startup times.

Best,
Erick

On Wed, May 22, 2013 at 3:27 AM, Zhang, Lisheng
 wrote:
> Thanks very much for quick helps! I searched but it seems that
> autoSoftCommit is a Solr 4.x feature and we are still using 3.6.1?
>
> Best regards, Lisheng
>
> -Original Message-
> From: Carlos Bonilla [mailto:carlosbonill...@gmail.com]
> Sent: Wednesday, May 22, 2013 12:17 AM
> To: solr-user@lucene.apache.org
> Subject: Re: solr starting time takes too long
>
>
> Hi Lisheng,
> I had the same problem when I enabled "autoSoftCommit" in
> solrconfig.xml. If you have it enabled, disabling it could fix your problem.
>
> Cheers.
> Carlos.
>
>
> 2013/5/22 Zhang, Lisheng 
>
>>
>> Hi,
>>
>> We are using solr 3.6.1, and our application has many cores (more than 1K);
>> the problem is that Solr startup takes a long time (>10m). Examining the log
>> file and code, we found that for each core we load many resources, but
>> in our app we are sure we always use the same solrconfig.xml and
>> schema.xml for all cores. While we can configure schema.xml to be shared,
>> we cannot share the SolrConfig object. And looking inside the SolrConfig
>> code, we do not use any of the caches.
>>
>> Could we somehow change the config (or source code) to share resources
>> between cores to reduce Solr startup time?
>>
>> Thanks very much for helps, Lisheng
>>


RE: solr 4.3: write.lock is not removed

2013-05-30 Thread Zhang, Lisheng
Hi,

Thanks very much for the explanation! Could we configure Solr to get the old
behavior? I ask about this option because our app has many small cores, so we
prefer to create/close the writer on the fly (otherwise we may run into memory
issues quickly).

We also do not need NRT for now.

Thanks very much for helps, Lisheng

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Thursday, May 30, 2013 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 4.3: write.lock is not removed



: I recently upgraded solr from 3.6.1 to 4.3, it works well, but I noticed that
: after finishing indexing
:
: write.lock
:
: is NOT removed. Later if I index again it still works OK. Only after I shut
: down Tomcat is write.lock removed. This behavior caused some problems, like
: I could not use Luke to observe the indexed data.

IIRC, this was an intentional change.  In older versions of Solr the
IndexWriter was only opened if/when updates needed to be made, but that
made it impossible to safely take advantage of some internal optimizations
related to NRT IndexReader reloading, so the logic was modified to always
keep the IndexWriter open as long as the SolrCore is loaded.

In general, your past behavior of pointing Luke at a live Solr index could
have also produced problems if updates came into Solr while Luke had the
write lock active.


-Hoss
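
A hedged side note: the lock implementation is configurable in solrconfig.xml,
and with the native OS lock a leftover write.lock file does not by itself block
another writer after a crash. This does not restore the 3.6 open/close behavior
described above, which was removed deliberately:

///
<indexConfig>
  <lockType>native</lockType>
</indexConfig>
///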


RE: solr 4.3: write.lock is not removed

2013-05-30 Thread Zhang, Lisheng
I did more tests and got more info: the basic setting is that we created the
core from the PHP CURL API, where we define:

schema
config
instanceDir=
dataDir=/data/new_collection_name

In Solr 3.6.1 we do not need to define schema/config because the conf folder is
not inside each collection.

1/ Indexing works OK but write.lock is not removed (we use
"/update?commit=true..")
2/ Shutdown Tomcat: I saw write.lock is gone
3/ Restart Tomcat: indexed data was created at the instanceDir/data level, with
   some warning messages. It seems that in solr.xml, dataDir is not defined?

Thanks very much for helps, Lisheng




-Original Message-
From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
Sent: Thursday, May 30, 2013 10:57 AM
To: solr-user@lucene.apache.org
Subject: RE: solr 4.3: write.lock is not removed


Hi,

We just use CURL from PHP code to submit indexing requests, like:

/update?commit=true..

This worked well in Solr 3.6.1. I saw the link you showed and really appreciate
it (if there is no other choice I will change the Java source code, but I hope
there is a better way).

Thanks very much for helps, Lisheng

-Original Message-
From: bbarani [mailto:bbar...@gmail.com]
Sent: Thursday, May 30, 2013 9:45 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 4.3: write.lock is not removed


How are you indexing the documents? Are you using an indexing program?

The below post discusses the same issue..

http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: solr 4.3: write.lock is not removed

2013-05-30 Thread Zhang, Lisheng
Hi,

We just use CURL from PHP code to submit indexing requests, like:

/update?commit=true..

This worked well in Solr 3.6.1. I saw the link you showed and really appreciate
it (if there is no other choice I will change the Java source code, but I hope
there is a better way).

Thanks very much for helps, Lisheng

-Original Message-
From: bbarani [mailto:bbar...@gmail.com]
Sent: Thursday, May 30, 2013 9:45 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 4.3: write.lock is not removed


How are you indexing the documents? Are you using an indexing program?

The below post discusses the same issue..

http://lucene.472066.n3.nabble.com/removing-write-lock-file-in-solr-after-indexing-td3699356.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-4-3-write-lock-is-not-removed-tp4066908p4067101.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr 4.3: write.lock is not removed

2013-05-29 Thread Zhang, Lisheng
Hi,
 
I recently upgraded solr from 3.6.1 to 4.3; it works well, but I noticed that
after finishing indexing

write.lock

is NOT removed. Later if I index again it still works OK. Only after I shut
down Tomcat is write.lock removed. This behavior caused some problems, like I
could not use Luke to observe the indexed data.

I did not see any error/warning messages.

Is this the designed behavior? Can I get the old behavior (write.lock removed
after commit) through configuration?
 
Thanks very much for helps, Lisheng


RE: solr starting time takes too long

2013-05-22 Thread Zhang, Lisheng
Very sorry about hijacking an existing thread (I thought it would be OK
if I just changed the title and content, but that was still wrong).

It will never happen again.

Lisheng

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Wednesday, May 22, 2013 11:58 AM
To: solr-user@lucene.apache.org
Subject: Re: solr starting time takes too long



: Subject: solr starting time takes too long
: In-Reply-To: <519c6cd6.90...@smartbit.be>
: Thread-Topic: shard splitting

https://people.apache.org/~hossman/#threadhijack


-Hoss


RE: solr starting time takes too long

2013-05-22 Thread Zhang, Lisheng
Thanks very much for quick helps! I searched but it seems that
autoSoftCommit is a Solr 4.x feature and we are still using 3.6.1?

Best regards, Lisheng

-Original Message-
From: Carlos Bonilla [mailto:carlosbonill...@gmail.com]
Sent: Wednesday, May 22, 2013 12:17 AM
To: solr-user@lucene.apache.org
Subject: Re: solr starting time takes too long


Hi Lisheng,
I had the same problem when I enabled "autoSoftCommit" in
solrconfig.xml. If you have it enabled, disabling it could fix your problem.

Cheers.
Carlos.


2013/5/22 Zhang, Lisheng 

>
> Hi,
>
> We are using solr 3.6.1, and our application has many cores (more than 1K);
> the problem is that Solr startup takes a long time (>10m). Examining the log
> file and code, we found that for each core we load many resources, but
> in our app we are sure we always use the same solrconfig.xml and
> schema.xml for all cores. While we can configure schema.xml to be shared,
> we cannot share the SolrConfig object. And looking inside the SolrConfig
> code, we do not use any of the caches.
>
> Could we somehow change the config (or source code) to share resources
> between cores to reduce Solr startup time?
>
> Thanks very much for helps, Lisheng
>


solr starting time takes too long

2013-05-22 Thread Zhang, Lisheng

Hi,

We are using solr 3.6.1, and our application has many cores (more than 1K);
the problem is that Solr startup takes a long time (>10m). Examining the log
file and code, we found that for each core we load many resources, but
in our app we are sure we always use the same solrconfig.xml and
schema.xml for all cores. While we can configure schema.xml to be shared,
we cannot share the SolrConfig object. And looking inside the SolrConfig code,
we do not use any of the caches.

Could we somehow change the config (or source code) to share resources between
cores to reduce Solr startup time?

Thanks very much for helps, Lisheng
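
For reference, a hedged sketch of the shareSchema switch discussed in the
replies to this thread, as a <cores> attribute in solr.xml (core names are
assumptions; verify the attribute is honored by your exact 3.6.1 build):

///
<solr persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="core0001" instanceDir="core0001"/>
    ...
  </cores>
</solr>
///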


RE: SolrCloud leader to replica

2013-04-12 Thread Zhang, Lisheng
Hi Otis and Timothy,

Thanks very much for helps; sure, I will test to make sure. What I
mentioned before is a mere possibility, and likely you are correct:
the small delay may not matter in reality (yes, we do use the same
way to do pagination and no issue ever happened, even once).

Solr is enormously valuable to us and we really appreciate
your help!

Lisheng


-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Thursday, April 11, 2013 5:27 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud leader to replica


Hi,

I think Timothy is right about what Lisheng is really after, which is
consistency.

I agree with what Timothy is implying here - chances of search being
inconsistent are very, very small.  I'm guessing Lisheng is trying to
solve a problem he doesn't actually have yet?  Also, think about a
non-SolrCloud solution.  What happens when a user pages through
results?  Typically that just re-runs the same query, but with a
different page offset.  What happens if between page 1 and page 2 the
index changes and a searcher is reopened?  Same sort of problem can
happen, right?  Yet, in a few hundred client engagements involving
Solr or ElasticSearch I don't recall this ever being an issue.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Apr 11, 2013 at 8:13 PM, Timothy Potter  wrote:
> Hmmm ... I was following this discussion but then got confused when Lisheng
> said to change Solr to "compromise consistency in order to increase
> availability" when your concern is "how long replica is behind leader".
> Seems you want more consistency vs. less in this case? One of the reasons
> behind Solr's leader election approach is to achieve low-latency eventual
> consistency (Mark's term from the linked to discussion).
>
> Un-committed docs are only visible if you use real-time get, in which case
> the request is served by the shard leader (or replica) from its update log.
> I suppose there's a chance of a few millis between the leader having the
> request in its tlog and the replica having the doc in its tlog but that
> seems like the nature of the beast. Meaning that Solr never promised to be
> 100% consistent at millisecond granularity in a distributed model - any
> small time-window between what a leader has and replica are probably
> network latency which you should solve outside of Solr. I suspect you could
> direct all your real-time get requests to leaders only using some smart
> client like CloudSolrServer if it mattered that much.
>
> Otherwise, all other queries require the document to be committed to be
> visible. I suppose there is a very small window when a new searcher is open
> on the leader and the new searcher is not yet open on the replica. However,
> with soft-commits, that too seems like a milli or two based on network
> latency.
>
> @Shawn - yes, I've actually seen this work in my cluster. We lose replicas
> from time-to-time and indexing keeps on trucking.
>
>
>
>
>
> On Thu, Apr 11, 2013 at 4:51 PM, Zhang, Lisheng <
> lisheng.zh...@broadvision.com> wrote:
>
>> Hi Otis,
>>
>> Thanks very much for helps, your explanation is very clear.
>>
>> My main concern is not the return status for indexing calls (although
>> that is also important); my main concern is how long the replica is behind
>> the leader (or, putting it your way, how consistent the search picture is
>> to clients A and B).
>>
>> Our application requires clients to see the same result whether they hit
>> the leader or a replica, so it seems we do have a problem here. If there is
>> no better solution I may consider changing solr4 a little (I have not read
>> the solr4x code fully yet) to compromise consistency (C) in order to
>> increase availability (A); on a high level, do you see serious problems in
>> this approach (I am familiar with lucene/solr code to some extent)?
>>
>> Thanks and best regards, Lisheng
>>
>> -Original Message-
>> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
>> Sent: Thursday, April 11, 2013 2:50 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud leader to replica
>>
>>
>> But note that I misspoke, which I realized after re-reading the thread
>> I pointed you to.  Mark explains it nicely there:
>> * the index call returns only when (and IF!) indexing to all replicas
>> succeeds
>>
>> BUT, that should not be mixed with what search clients see!
>> Just because the indexing client sees the all or nothing situation
>> depending on whether indexing was successful on all replicas does NOT
>> mean that search clients will always see a 100% consistent picture.
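
For reference, a hedged sketch of the real-time get request Timothy mentions;
it is answered from the update log, so it can see uncommitted documents
(assuming the stock /get handler is configured; host, core, and id are made up):

///
http://localhost:8080/solr/collection1/get?id=mydoc1
///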

RE: SolrCloud leader to replica

2013-04-11 Thread Zhang, Lisheng
Hi Otis,

Thanks very much for helps, your explanation is very clear.

My main concern is not the return status for indexing calls (although that is
also important); my main concern is how long the replica is behind the leader
(or, putting it your way, how consistent the search picture is to clients A
and B).

Our application requires clients to see the same result whether they hit the
leader or a replica, so it seems we do have a problem here. If there is no
better solution I may consider changing solr4 a little (I have not read the
solr4x code fully yet) to compromise consistency (C) in order to increase
availability (A); on a high level, do you see serious problems in this approach
(I am familiar with lucene/solr code to some extent)?

Thanks and best regards, Lisheng

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Thursday, April 11, 2013 2:50 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud leader to replica


But note that I misspoke, which I realized after re-reading the thread
I pointed you to.  Mark explains it nicely there:
* the index call returns only when (and IF!) indexing to all replicas succeeds

BUT, that should not be mixed with what search clients see!
Just because the indexing client sees the all or nothing situation
depending on whether indexing was successful on all replicas does NOT
mean that search clients will always see a 100% consistent picture.
Client A could hit the leader and see a newly indexed document, while
client B could query the replica and not see that same document simply
because the doc hasn't gotten there yet, or because soft commit hasn't
happened just yet.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Apr 11, 2013 at 4:39 PM, Zhang, Lisheng
 wrote:
> Thanks very much for your helps!
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> Sent: Thursday, April 11, 2013 1:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud leader to replica
>
>
> Yes, I *think* that is the case.  Some distributed systems have the
> option to return success to caller only after data has been
> added/indexed to N other nodes, but I think Solr doesn't have this
> yet.  Somebody please correct me if I'm wrong.
>
> See: http://search-lucene.com/?q=eventually+consistent&fc_project=Solr
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Thu, Apr 11, 2013 at 12:51 PM, Zhang, Lisheng
>  wrote:
>> Hi Otis,
>>
> Thanks very much for the quick help! We are considering upgrading
> from solr 3.6 to 4x and using SolrCloud, but we are concerned about
> performance related to replicas: in this scenario it seems that the
> replica would be a few seconds behind the leader, because the replica would
> start indexing only after the leader finishes his?
>>
>> Thanks and best regards, Lisheng
>>
>> -Original Message-
>> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
>> Sent: Thursday, April 11, 2013 8:11 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud leader to replica
>>
>>
>> I believe it indexes locally on leader first.  Otherwise one could end
>> up with a situation where indexing to replica(s) succeeds and indexing
>> to leader fails, which I suspect might create a mess.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>>
>>
>>
>>
>> On Thu, Apr 11, 2013 at 2:53 AM, Zhang, Lisheng
>>  wrote:
>>> Hi,
>>>
>>> In Solr 4.x SolrCloud, suppose we have only one shard and
>>> two replicas: when the leader receives an indexing request,
>>> does it immediately forward the request to the two replicas, or
>>> does it first index the request itself and then send the request to its
>>> two replicas?
>>>
>>> Thanks very much for helps, Lisheng
>>>
>>>


RE: SolrCloud leader to replica

2013-04-11 Thread Zhang, Lisheng
Thanks very much for your helps!

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Thursday, April 11, 2013 1:23 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud leader to replica


Yes, I *think* that is the case.  Some distributed systems have the
option to return success to caller only after data has been
added/indexed to N other nodes, but I think Solr doesn't have this
yet.  Somebody please correct me if I'm wrong.

See: http://search-lucene.com/?q=eventually+consistent&fc_project=Solr

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Apr 11, 2013 at 12:51 PM, Zhang, Lisheng
 wrote:
> Hi Otis,
>
> Thanks very much for the quick help! We are considering upgrading
> from solr 3.6 to 4x and using SolrCloud, but we are concerned about
> performance related to replicas: in this scenario it seems that the
> replica would be a few seconds behind the leader, because the replica would
> start indexing only after the leader finishes his?
>
> Thanks and best regards, Lisheng
>
> -Original Message-
> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> Sent: Thursday, April 11, 2013 8:11 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud leader to replica
>
>
> I believe it indexes locally on leader first.  Otherwise one could end
> up with a situation where indexing to replica(s) succeeds and indexing
> to leader fails, which I suspect might create a mess.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Thu, Apr 11, 2013 at 2:53 AM, Zhang, Lisheng
>  wrote:
>> Hi,
>>
>> In Solr 4.x SolrCloud, suppose we have only one shard and
>> two replicas: when the leader receives an indexing request,
>> does it immediately forward the request to the two replicas, or
>> does it first index the request itself and then send the request to its
>> two replicas?
>>
>> Thanks very much for helps, Lisheng
>>
>>


RE: SolrCloud leader to replica

2013-04-11 Thread Zhang, Lisheng
Hi Otis,

Thanks very much for the quick help! We are considering upgrading
from solr 3.6 to 4x and using SolrCloud, but we are concerned about
performance related to replicas: in this scenario it seems that the
replica would be a few seconds behind the leader, because the replica would
start indexing only after the leader finishes his?

Thanks and best regards, Lisheng

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Thursday, April 11, 2013 8:11 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud leader to replica


I believe it indexes locally on leader first.  Otherwise one could end
up with a situation where indexing to replica(s) succeeds and indexing
to leader fails, which I suspect might create a mess.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Apr 11, 2013 at 2:53 AM, Zhang, Lisheng
 wrote:
> Hi,
>
> In Solr 4.x SolrCloud, suppose we have only one shard and
> two replicas: when the leader receives an indexing request,
> does it immediately forward the request to the two replicas, or
> does it first index the request itself and then send the request to its
> two replicas?
>
> Thanks very much for helps, Lisheng
>
>


SolrCloud leader to replica

2013-04-10 Thread Zhang, Lisheng
Hi,

In Solr 4.x SolrCloud, suppose we have only one shard and
two replicas: when the leader receives an indexing request,
does it immediately forward the request to the two replicas, or
does it first index the request itself and then send the request to its
two replicas?

Thanks very much for helps, Lisheng




RE: Solr language-dependent sort

2013-04-08 Thread Zhang, Lisheng
Hi,

Thanks very much for quick help!

In our case we mainly need to sort a field based on a language defined at run
time, but I understand that the principle is the same.

Thanks and best regards, Lisheng

-Original Message-
From: Sujit Pal [mailto:sujitatgt...@gmail.com]On Behalf Of SUJIT PAL
Sent: Monday, April 08, 2013 1:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr language-dependent sort


Hi Lisheng,

We did something similar in Solr using a custom handler (but I think you could
just build a custom QueryParser to do this), and you could do this in your
application as well, i.e., get the language and then rewrite your query to use
the language-specific fields. Come to think of it, the QueryParser would
probably be sufficiently general to qualify as a patch for custom functionality.

-sujit

On Apr 8, 2013, at 12:28 PM, Zhang, Lisheng wrote:

> 
> Hi,
> 
> I found that in solr we need to define a special fieldType for each
> language (http://wiki.apache.org/solr/UnicodeCollation), then point
> a field to this type.
> 
> But in our application one field (like 'title') can be used by various
> users for their languages (user1 uses it for English, user2 uses it for
> Japanese ...), so it is difficult for us even to use dynamic fields;
> we would prefer to pass in a parameter like
>
> language = 'en'
>
> at run time, and then the Solr API could use this parameter to call the
> Lucene API to sort a field. This approach would be much more flexible (we
> programmed it this way when using Lucene directly)?
> 
> Thanks very much for helps, Lisheng
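
A hedged sketch of the per-language sort fields this reply suggests, following
the UnicodeCollation wiki approach with dynamic fields (field and type names are
assumptions); the application then rewrites the request to, e.g.,
sort=title_sort_en asc:

///
<fieldType name="sort_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory"
            language="en" strength="primary"/>
  </analyzer>
</fieldType>
<dynamicField name="*_sort_en" type="sort_en" indexed="true" stored="false"/>
///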



Solr language-dependent sort

2013-04-08 Thread Zhang, Lisheng

Hi,

I found that in solr we need to define a special fieldType for each
language (http://wiki.apache.org/solr/UnicodeCollation), then point
a field to this type.

But in our application one field (like 'title') can be used by various
users for their languages (user1 uses it for English, user2 uses it for
Japanese ...), so it is difficult for us even to use dynamic fields;
we would prefer to pass in a parameter like

language = 'en'

at run time, and then the Solr API could use this parameter to call the Lucene
API to sort a field. This approach would be much more flexible (we programmed
it this way when using Lucene directly)?

Thanks very much for helps, Lisheng


RE: Solr 3.6.1 ClassNotFound Exception

2013-03-17 Thread Zhang, Lisheng
Hi Erick,

Thanks! Actually this is another person's installation and I am helping to
debug.

My guess is that in solrconfig.xml, the line:


(or a similar line) somehow does not work; I will try to look into it more.

Thanks very much for helps, Lisheng



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, March 17, 2013 7:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6.1 ClassNotFound Exception


Hmmm, you shouldn't have to go looking for this, it should just be there.
My guess is that you have some kind of classpath issue.

If you have access to a machine that has never seen Solr, or a VM, I'd try
installing a fresh copy of Solr. If that works, then you can be pretty sure
you've changed your environment (perhaps inadvertently).

Best
Erick


On Sat, Mar 16, 2013 at 10:11 AM, Zhang, Lisheng <
lisheng.zh...@broadvision.com> wrote:

>
> Hi,
>
> This is perhaps a trivial question but somehow I could not pin it down:
> when trying to index a file (using solr 3.6.1) I got the error:
>
> Caused by: org.apache.solr.common.SolrException: Error loading class
> 'solr.extraction.ExtractingRequestHandler'
>
> I know in solrconfig.xml we have defined
>
> ///
>   <requestHandler startup="lazy"
>     class="solr.extraction.ExtractingRequestHandler" >
> ///
>
> and the jar file should be:
>
> /dist/apache-solr-cell-3.6.1.jar
>
> But above jar file only have class:
>
> jar tvf apache-solr-cell-3.6.1.jar | grep ExtractingRequestHandler
>   5332 Tue Jul 17 12:45:40 PDT 2012
> org/apache/solr/handler/extraction/ExtractingRequestHandler.class
>
> Where can we find "solr.extraction.ExtractingRequestHandler" ?
>
> Thanks very much for helps, Lisheng
>


Solr 3.6.1 ClassNotFound Exception

2013-03-16 Thread Zhang, Lisheng

Hi,

This is perhaps a trivial question but somehow I could not pin it down:
when trying to index a file (using solr 3.6.1) I got the error:

Caused by: org.apache.solr.common.SolrException: Error loading class 
'solr.extraction.ExtractingRequestHandler'

I know in solrconfig.xml we have defined 

///
<requestHandler startup="lazy"
  class="solr.extraction.ExtractingRequestHandler" >
///

and the jar file should be:

/dist/apache-solr-cell-3.6.1.jar

But above jar file only have class:

jar tvf apache-solr-cell-3.6.1.jar | grep ExtractingRequestHandler
  5332 Tue Jul 17 12:45:40 PDT 2012 
org/apache/solr/handler/extraction/ExtractingRequestHandler.class

Where can we find "solr.extraction.ExtractingRequestHandler" ?

Thanks very much for helps, Lisheng


lucene merge policy in solr

2013-03-05 Thread Zhang, Lisheng
Hi,

In earlier Lucene versions, segments are merged periodically
according to the merge policy; when merge time is reached, an indexing
request may take longer to finish (in my tests it could be delayed
10-30 seconds, depending on the indexed data size).

I read the Solr 3.6 - 4.1 docs and we have entries in solrconfig.xml
to control segment merging. I am wondering if someone could give me
a very high-level confirmation: in Solr 3.6 - 4.1, could indexing
also be delayed when a big merge happens, so that before merging finishes
we cannot index (since the collection is locked)?
Thanks very much for helps, Lisheng
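
A hedged solrconfig.xml sketch of the merge knobs in question (values are
assumptions). One high-level point: with ConcurrentMergeScheduler, the default,
big merges run in background threads, so the index is not locked against new
documents during a merge, although indexing can still stall briefly when merges
pile up:

///
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
</indexConfig>
///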


RE: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-11 Thread Zhang, Lisheng
Thanks very much, it worked perfectly !!

Best regards, Lisheng

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, February 08, 2013 1:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr query parser, needs to call
setAutoGeneratePhraseQueries(true)


(Sorry for my split message)...

See the text_en_splitting field type for an example:

<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
...

-- Jack Krupansky

-Original Message- 
From: Zhang, Lisheng
Sent: Friday, February 08, 2013 3:20 PM
To: solr-user@lucene.apache.org
Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)


Hi,

In our application we need to call the method

setAutoGeneratePhraseQueries(true)

on the Lucene QueryParser; this is the way it used to work in earlier versions,
and it seems to me the most natural way?

But in current solr 3.6.1, the only way to do so is to set

<luceneMatchVersion>LUCENE_30</luceneMatchVersion>

in solrconfig.xml (if I read the source code correctly), but I do not want to
do so because this will change the whole behavior of Lucene, and I only
want to change this query parser behavior, not other Lucene features.

Please guide me if there is a better way other than changing the Solr source
code?

Thanks very much for helps, Lisheng 



RE: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-08 Thread Zhang, Lisheng
Thanks very much for your valuable help, it worked perfectly !!!

Lisheng

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, February 08, 2013 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr query parser, needs to call
setAutoGeneratePhraseQueries(true)


Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" 
to all of your "text" field types in your schema.xml.

See the text_en_splitting field type for an example:

<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
...

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky
Sent: Friday, February 08, 2013 3:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr query parser, needs to call 
setAutoGeneratePhraseQueries(true)

Simply add the "autoGeneratePhraseQueries" attribute with a value of "true"
to all of your "text" field types in your schema.xml.

See the text_




-- Jack Krupansky
-Original Message- 
From: Zhang, Lisheng
Sent: Friday, February 08, 2013 3:20 PM
To: solr-user@lucene.apache.org
Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)


Hi,

In our application we need to call the method

setAutoGeneratePhraseQueries(true)

on the Lucene QueryParser; this is the way it used to work in earlier versions,
and it seems to me the most natural way?

But in current solr 3.6.1, the only way to do so is to set

<luceneMatchVersion>LUCENE_30</luceneMatchVersion>

in solrconfig.xml (if I read the source code correctly), but I do not want to
do so because this will change the whole behavior of Lucene, and I only
want to change this query parser behavior, not other Lucene features.

Please guide me if there is a better way other than changing the Solr source
code?

Thanks very much for helps, Lisheng 



Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-08 Thread Zhang, Lisheng

Hi,

In our application we need to call the method

setAutoGeneratePhraseQueries(true)

on the Lucene QueryParser; this is the way it used to work in earlier versions,
and it seems to me the most natural way?

But in current solr 3.6.1, the only way to do so is to set

<luceneMatchVersion>LUCENE_30</luceneMatchVersion>

in solrconfig.xml (if I read the source code correctly), but I do not want to
do so because this will change the whole behavior of Lucene, and I only
want to change this query parser behavior, not other Lucene features.

Please guide me if there is a better way other than changing the Solr source
code?

Thanks very much for helps, Lisheng


RE: Solr exception when parsing XML

2013-01-16 Thread Zhang, Lisheng
Hi,

Thanks very much for helps! I checked the Solr source code; what happened is
that for XML text inside an element, Solr does not call URLDecoder (but to pass
a CTRL character through, I have to call urlencode from PHP).

So either I try to remove the CTRL character on the PHP side, or I change the
Solr XMLReader slightly to call URLDecoder on the text.

Thanks and best regards, Lisheng


-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, January 16, 2013 2:41 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr exception when parsing XML


In Apache Nutch we strip non-character code points with a simple method. Check 
the patch, the relevant part is easily ported to any language: 
https://issues.apache.org/jira/browse/NUTCH-1016

 
 
-Original message-
> From:Zhang, Lisheng 
> Sent: Wed 16-Jan-2013 20:48
> To: solr-user@lucene.apache.org
> Subject: RE: Solr exception when parsing XML
> 
> Hi Alex,
> 
> Thanks very much for helps! I switched to (I am using PHP in client side)
> 
> createTextNode(urlencode($value))
> 
> so CTRL character problem is avoided, but I noticed that somehow solr did
> not perform urldecode($value), so my initial value
> 
> abc xyz
> 
> becomes 
> 
> abc+xyz 
> 
> I have not fully read through the solr code on this part, but my guess is
> that it is a configuration issue (when using CDATA I do not have this issue)?
> 
> Thanks and best regards, Lisheng
> 
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, January 15, 2013 12:56 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr exception when parsing XML
> 
> 
> Interesting point. Looks like CDATA is more limiting than I thought:
> http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
> recommendation is to avoid CDATA and automatically encode characters such
> as yours, as well as less/more and ampersand.
> 
> Regards,
>Alex.
> 
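
Ported to the PHP side used in this thread, a hedged one-liner in the spirit of
the NUTCH-1016 patch Markus links above (the exact ranges are an assumption;
tab, LF and CR are kept because XML 1.0 allows them):

///
// strip the control characters XML 1.0 forbids, keeping \t \n \r
$clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $value);
///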


RE: Solr exception when parsing XML

2013-01-16 Thread Zhang, Lisheng
Hi Alex,

Thanks very much for helps! I switched to (I am using PHP in client side)

createTextNode(urlencode($value))

so CTRL character problem is avoided, but I noticed that somehow solr did
not perform urldecode($value), so my initial value

abc xyz

becomes 

abc+xyz 

I have not fully read through the solr code on this part, but my guess is that
it is a configuration issue (when using CDATA I do not have this issue)?

Thanks and best regards, Lisheng

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Tuesday, January 15, 2013 12:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr exception when parsing XML


Interesting point. Looks like CDATA is more limiting than I thought:
http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
recommendation is to avoid CDATA and automatically encode characters such
as yours, as well as less/more and ampersand.

Regards,
   Alex.


Solr exception when parsing XML

2013-01-15 Thread Zhang, Lisheng
Hi,
 
I got SolrException when submitting XML for indexing (using solr 3.6.1)
 

Jan 15, 2013 10:22:42 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, cod
e 31))
 at [row,col {unknown-source}]: [2,1169]
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
 
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character 
((CTRL-CHAR, code 31))
...
 at [row,col {unknown-source}]: [2,1169]
at 
com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
at 
com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:660)
at 
com.ctc.wstx.sr.BasicStreamReader.readCDataPrimary(BasicStreamReader.java:4240)
at 
com.ctc.wstx.sr.BasicStreamReader.nextFromTreeCommentOrCData(BasicStreamReader.java:3280)
at 
com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2824)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)

 
I checked the details; the data causing trouble is

word1chr(31)word2

where both word1 and word2 are normal English words, and "chr(31)" is just the
return value of the PHP function chr(31). Our XML is well constructed and the
encoding/charset are well defined.

The problem is due to chr(31): if I replace it with another UTF-8 character,
indexing is OK.

I checked the source code of com.ctc.wstx.sr.BasicStreamReader.java, and it
seems that by design no CTRL character is allowed inside CDATA text, but I am
puzzled how we could avoid CTRL characters in text in general (sure, it is not
a common occurrence, but it can still happen)?
 
Thanks very much for helps, Lisheng


RE: theory of sets

2013-01-07 Thread Zhang, Lisheng
Hi,

Just a thought on this possibility: I think the dynamic field is a Solr concept;
at the Lucene level all fields are the same, but at initial startup Lucene
should load all field information into memory (not the field data, but the
schema).

If we have too many fields (like *_my_fields, * => a1, a2, ...), does this take
too much memory and slow down performance (even if very few fields are really
used)?

Best regards, Lisheng

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Monday, January 07, 2013 2:57 PM
To: solr-user@lucene.apache.org
Subject: Re: theory of sets


Dynamic fields resulted in poor response times? How many fields did each
document have? I can't see how a dynamic field should have any
difference from any other field in terms of response time.

Or are you querying across a large number of dynamic fields
concurrently? I can imagine that slowing things down.

Upayavira

On Mon, Jan 7, 2013, at 05:18 PM, Uwe Reh wrote:
> Hi Robi,
> 
> thank you for the contribution. It's exciting to read that your index
> isn't contaminated by the number of fields. I can't exclude other
> mistakes, but my first experience with extensive use of dynamic fields
> was very poor response times.
> 
> Even though I found an other solution, I should give the straight 
> forward solution a second chance.
> 
> Uwe
> 
> Am 07.01.2013 17:40, schrieb Petersen, Robert:
> > Hi Uwe,
> >
> > We have hundreds of dynamic fields but since most of our docs only use some 
> > of them it doesn't seem to be a performance drag.  They can be viewed as a 
> > sparse matrix of fields in your indexed docs.  Then if you make the 
> > sortinfo_for_groupx an int then that could be used in a function query to 
> > perform your sorting.  See  http://wiki.apache.org/solr/FunctionQuery
> 


RE: File content indexing

2012-09-27 Thread Zhang, Lisheng
Hi Erik,

I really meant to send this message earlier. I read the code and tested;
your suggestion solved my problem. Really appreciated!

Thanks very much for helps, Lisheng

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Tuesday, September 18, 2012 5:04 PM
To: solr-user@lucene.apache.org
Subject: Re: File content indexing


Solr Cell can already do this.  See the stream.file parameter and content
stream info on the wiki.

Erik

On Sep 18, 2012, at 19:56, "Zhang, Lisheng"  
wrote:

> Hi, 
> 
> Sorry I just sent out an unfinished message!
> 
> Reading about Solr Cell: we index a file by first uploading it through HTTP
> to Solr, and in my experience it is rather expensive to pass a big file
> through HTTP.
>
> If the file is local, maybe the better way is to pass the file path to Solr
> so that Solr can use the java.io API to get the file content; maybe this can
> be much faster?
>
> I am thinking of changing Solr a little to do this; do you think this is a
> sensible thing to do (I know how to do it, but am not sure it can improve
> performance significantly)?
> 
> Thanks very much for helps, Lisheng
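
A hedged sketch of the stream.file approach Erik points to (the handler path
assumes the stock /update/extract mapping, the file path and id are made up,
and enableRemoteStreaming must be switched on in solrconfig.xml for Solr to
read local paths):

///
curl "http://localhost:8080/solr/update/extract?literal.id=doc1&commit=true&stream.file=/local/path/report.pdf"
///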


File content indexing

2012-09-18 Thread Zhang, Lisheng
Hi, 
 
Sorry I just sent out an unfinished message!
 
Reading about Solr Cell: we index a file by first uploading it through HTTP to
Solr, and in my experience it is rather expensive to pass a big file through
HTTP.

If the file is local, maybe the better way is to pass the file path to Solr so
that Solr can use the java.io API to get the file content; maybe this can be
much faster?

I am thinking of changing Solr a little to do this; do you think this is a
sensible thing to do (I know how to do it, but am not sure it can improve
performance significantly)?
 
Thanks very much for helps, Lisheng


File content indexing

2012-09-18 Thread Zhang, Lisheng
Hi,
 
Reading Solr cell, we indexing a file by first upload it through HTTP to solr, 
in my
experience it is rather expensive to pass a big file through HTTP.
 
If the file is local, maybe the better way is to pass file path to solr so that 
solr can
use java.io API to get file content, maybe this can be much faster?
 
I am thinking to change solr a little to do 


RE: In multi-core, special dataDir is not used?

2012-09-17 Thread Zhang, Lisheng
Thanks very much for your quick guidance, which is very helpful!

Lisheng

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Monday, September 17, 2012 6:30 PM
To: solr-user@lucene.apache.org
Subject: Re: In multi-core, special dataDir is not used?



: I can't reproduce the problem you are seeing -- can you please provide 
: more details..

Correction: i can reproduce this.

This was in fact some odd behavior in the 1.x and 3.x lines that has been 
changed for 4.x in SOLR-1897.

If you had no <dataDir/> in your solrconfig.xml, or if you had a *blank*
<dataDir/>, then prior to 4.x the dataDir option specified when
CREATEing a core would override the default -- but if you had any real
path specified, then it would trump anything specified at runtime.

The "workaround" I believe (but I haven't tested exhaustively) for
3.4-3.6.1 is not to specify a hardcoded dataDir in your solrconfig.xml, but
instead to specify a property with a "default" value for the dataDir, and then
use that property when issuing the CREATE command, i.e...

  <dataDir>${yourPropertyName:/some/default/path}</dataDir>

?action=CREATE&name=yourCoreName&instanceDir=yourCoreDir&property.yourPropertyName=/override/path


-Hoss


In multi-core, special dataDir is not used?

2012-09-17 Thread Zhang, Lisheng
Hi,
 
I am using solr 3.6.1; I created a new core "whatever3" dynamically, and I see
solr.xml updated as:

<cores ...>
  ...
  <core name="whatever3" instanceDir="whatever3/" dataDir="..."/>
</cores>
 
But when I update data like
http://localhost:8080/solr/whatever3/update?commit=true, the data
did not go to the newly specified dataDir (I can see that core "whatever3" is
apparently used, from the log)?

The only way to make it work is NOT to define dataDir in solrconfig.xml; is
this by design, or did I miss something?
 
Thanks very much for helps, Lisheng


RE: SolrCloud vs SolrReplication

2012-09-08 Thread Zhang, Lisheng
Hi Erick,

You mentioned that "it'll automatically use old-style
replication to do the bulk synchronization" in SolrCloud,
so it uses HTTP for replication as in 3.6; does this mean
the synchronization in SolrCloud is not real-time (it has
to have some delay)?

Thanks very much for helps, Lisheng

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, September 08, 2012 1:44 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud vs SolrReplication


See inline

On Sat, Sep 8, 2012 at 1:09 AM, thaihai  wrote:
> Hi All,
>
> I'm a little bit confused about the new cloud functionalities.
>
> some questions:
>
> 1) Is it possible to use the old-style Solr replication in Solr 4 (that is,
> not using SolrCloud, not starting with ZK params)?
>

Yes. If you use SolrCloud (the Zookeeper options), you don't need to
set up replication. But if you don't use SolrCloud it's just like it was
in 3.x.


> 2) in our production environment we use solr 3.6 with solr replication. we
> have 1 index server and 2 front (slave) servers. one webapp uses both
> front servers for searching. another application pushes index requests to the
> index server. the app has queueing, so we don't need HA here.
> if we make index (schema) changes or need to scratch and reindex the whole
> index, we do the following scenario:
>  1 remove replication for both front servers
>  2 scratch index server
>  3 reindex index server
>  4 remove front 1 server from web app (at this point the webapp uses only front2
> for searches)
>  5 scratch front 1
>  6 enable front 1 replication
>  7 test front 1 server with searches over the lucene admin ui on front 1
>  8 if all correct, enable front 1 for web app
>  9 repeat from step 4 for the second slave
>
> so, my problem is how to do the same with solr cloud?
>
> suppose i have a 2-shard cluster with replicas. how can i make a complete
> reindex with no effect on the web app during the index process? and i
> want to check the rebuild before i approve the new index for the web app?
>
> any ideas or tips ?
>
> sorry for the bad english
>
>

I'm not entirely sure about this, meaning I haven't done it personally. But
I think you can do this...

Let's take the simple 2-shard case, each with a leader and replica.
Take one machine
out of each slice (or have two other machines you can use). Make your schema
changes and re-index to these non-user-facing machines. These are now a complete
new index of two shards.

Now point your user traffic to these new indexes (they are SolrCloud machines).
Now simply scratch your old machines and bring them up in the same
cluster as the
two new machines, and SolrCloud will automatically
1> assign them as replicas of your two shards appropriately
2> synchronize the index (actually, it'll automatically use old-style
replication
 to do the bulk synchronization, you don't have to configure anything).
3> route searches to the new replicas as appropriate.

You really have to forget most of what you know about Solr replication when
moving to the Solr Cloud world, it's all magic ...

Best
Erick

>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-vs-SolrReplication-tp4006327.html
> Sent from the Solr - User mailing list archive at Nabble.com.


RE: Bulk Indexing

2012-07-27 Thread Zhang, Lisheng
Hi,

Previously I asked a similar question, and I have not fully implemented it yet.

My plan is:
1) use Solr only for search, not for indexing
2) have a separate java process to index (calling the lucene API directly; maybe
   it can call the Solr API, I need to check more details).

As other people pointed out earlier, the problem with the above plan is that Solr
does not know when to reload the IndexSearcher (namely the underlying IndexReader)
after indexing is done, since the indexer and Solr are two separate processes.

My plan is to have Solr not cache any IndexReader (each time a search is
performed, just create a new IndexSearcher, as sketched below), because:

1) our app is made of many lucene index folders (in Solr language, many
   cores), so caching an IndexSearcher per folder would be too expensive.
2) in my experience, search is still quite fast without caching (maybe
   partially because our indexed data is not large, per folder).

This is just my plan (not fully implemented yet).
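
A minimal sketch of the open-per-search idea, assuming Lucene 3.6 APIs and a
hypothetical "content" field; nothing is cached between calls, so whatever the
external indexer last committed is always picked up:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import java.io.File;

public class PerRequestSearch {
    public static TopDocs search(File indexDir, String queryText) throws Exception {
        // Open a fresh reader for every request; the latest commit from the
        // external indexer is always visible.
        IndexReader reader = IndexReader.open(FSDirectory.open(indexDir));
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query q = new QueryParser(Version.LUCENE_36, "content",
                    new StandardAnalyzer(Version.LUCENE_36)).parse(queryText);
            return searcher.search(q, 10);
        } finally {
            reader.close(); // releases file handles; nothing is cached
        }
    }
}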

Best regards, Lisheng

-Original Message-
From: Sohail Aboobaker [mailto:sabooba...@gmail.com]
Sent: Friday, July 27, 2012 6:56 AM
To: solr-user@lucene.apache.org
Subject: Bulk Indexing


Hi,

We have created a search service which is responsible for providing an
interface between Solr and the rest of our application. It basically takes one
document at a time and updates or adds it to the appropriate index.

Now, in the application, we have processes that add products (our documents are
based on products) in bulk using a bulk data load process. At this point,
we use the same search service to add the documents in a loop. This can be
up to 20,000 documents in one load.

In a recent solr user discussion, it seems like this is a no-no strategy
with red flags all around it.

What are other alternatives?

Thanks,

Regards,
Sohail Aboobaker.


RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Hi,

I really appreciate your quick helps!

1) I want to let solr not cache any IndexReader (hopefully that is possible),
because our app is made of many lucene folders, each of them not very
large; from my previous tests it seems that performance is fine if we just
create a new IndexReader each time. Hopefully this way we have no sync issue?

2) Our data is mainly in an RDB (currently in MySQL; it will move to Cassandra
later). My main concern is that by using Solr we need to pass a rather large 
amount of data through the network layer via HTTP, which could be a problem?

Best regards, Lisheng

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Thursday, July 26, 2012 12:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Bulk indexing data into solr


IIRC, a problem with such a scheme was discussed here about two months ago, but I
can't remember the exact details.
The scheme is generally correct, but you didn't say how you let solr know
that it needs to reread the new index generation after the indexer fsyncs
segments.gen.

btw, it might be a possible issue:
https://lucene.apache.org/core/old_versioned_docs//versions/3_0_1/api/all/org/apache/lucene/index/IndexWriter.html#commit()
 Note that this operation calls Directory.sync on the index files. That
call should not return until the file contents & metadata are on stable
storage. For FSDirectory, this calls the OS's fsync. But, beware: some
hardware devices may in fact cache writes even during fsync, and return
before the bits are actually on stable storage, to give the appearance of
faster performance.

you should ensure that after segments.gen is fsync'ed, all other index
files are fsync'ed for other processes too.
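
One way to handle the "reread the new index generation" part: the searcher
side can periodically check for a newer commit and swap readers. A minimal
sketch, assuming Lucene 3.5+ (for openIfChanged) and a hypothetical index
directory:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;
import java.io.File;

public class ReaderRefresher {
    private IndexReader reader;

    public ReaderRefresher(File indexDir) throws Exception {
        reader = IndexReader.open(FSDirectory.open(indexDir));
    }

    // Pick up the latest commit written (and fsync'ed) by the external indexer.
    public synchronized IndexReader refresh() throws Exception {
        IndexReader newer = IndexReader.openIfChanged(reader); // null if unchanged
        if (newer != null) {
            reader.close();
            reader = newer;
        }
        return reader;
    }
}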

Could you tell us more about your data:
what's the format?
is it located local to the indexer?
And why can't you use remote streaming via Solr's update handler, or an indexer
client app with StreamingUpdateSolrServer?

On Thu, Jul 26, 2012 at 10:47 PM, Zhang, Lisheng <
lisheng.zh...@broadvision.com> wrote:

> Hi,
>
> I think at least before lucene 4.0 we can only allow one process/thread to
> write to
> a lucene folder. Based on this fact my initial plan is:
>
> 1) There is one set of lucene index folders.
> 2) The Solr server only performs queries on those folders
> 3) Have a separate (multi-threaded) process to index those lucene folders
> (each
>folder is a separate app). Only one thread will index any given lucene
> folder.
>
> Thanks very much for helps, Lisheng
>
>
> -Original Message-
> From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
> Sent: Thursday, July 26, 2012 10:15 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Bulk indexing data into solr
>
>
> Coming back to your original question. I'm puzzled a little.
> It's not clear where you want to call the Lucene API directly from.
> If you mean that you have a standalone indexer which writes index files, then
> when it stops, these files become available to the Solr process and it will work.
> Sharing an index between processes, or using EmbeddedServer, is looking for
> problems (although Lucene has a lock mechanism, which I'm not completely aware
> of).
> I conclude that your data for indexing is collocated with the solr
> server. In this case consider
> http://wiki.apache.org/solr/ContentStream#RemoteStreaming
>
> Please give more details about your design.
>
> On Thu, Jul 26, 2012 at 1:22 PM, Zhang, Lisheng <
> lisheng.zh...@broadvision.com> wrote:
>
> >
> > Hi,
> >
> > I am starting to use solr, and now I need to index a rather large amount of
> > data. It seems
> > that calling solr to pass data through HTTP is rather inefficient; I am
> > thinking of still calling the
> > lucene API directly for bulk indexing but using solr for search. Is this
> > design OK?
> >
> > Thanks very much for helps, Lisheng
> >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 


RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Hi,

I think at least before lucene 4.0 we can only allow one process/thread to 
write to
a lucene folder. Based on this fact my initial plan is:

1) There is one set of lucene index folders.
2) The Solr server only performs queries on those folders
3) Have a separate (multi-threaded) process to index those lucene folders 
(each 
   folder is a separate app). Only one thread will index any given lucene 
folder.

Thanks very much for helps, Lisheng


-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Thursday, July 26, 2012 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Bulk indexing data into solr


Coming back to your original question. I'm puzzled a little.
It's not clear where you want to call the Lucene API directly from.
If you mean that you have a standalone indexer which writes index files, then
when it stops, these files become available to the Solr process and it will work.
Sharing an index between processes, or using EmbeddedServer, is looking for
problems (although Lucene has a lock mechanism, which I'm not completely aware
of).
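
As a small illustration of that lock (a sketch assuming Lucene 3.6 defaults and
a hypothetical index path), a second IndexWriter on the same directory fails
fast instead of corrupting the index:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;
import java.io.File;

public class WriteLockDemo {
    public static void main(String[] args) throws Exception {
        File path = new File("/tmp/demo-index");
        IndexWriter first = new IndexWriter(FSDirectory.open(path),
                new IndexWriterConfig(Version.LUCENE_36,
                        new StandardAnalyzer(Version.LUCENE_36)));
        try {
            // The second writer cannot acquire write.lock while the first holds it.
            new IndexWriter(FSDirectory.open(path),
                    new IndexWriterConfig(Version.LUCENE_36,
                            new StandardAnalyzer(Version.LUCENE_36)));
        } catch (LockObtainFailedException expected) {
            System.out.println("second writer rejected: " + expected.getMessage());
        } finally {
            first.close();
        }
    }
}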
I conclude that your data for indexing is collocated with the solr
server. In this case consider
http://wiki.apache.org/solr/ContentStream#RemoteStreaming

Please give more details about your design.

On Thu, Jul 26, 2012 at 1:22 PM, Zhang, Lisheng <
lisheng.zh...@broadvision.com> wrote:

>
> Hi,
>
> I am starting to use solr, and now I need to index a rather large amount of
> data. It seems
> that calling solr to pass data through HTTP is rather inefficient; I am
> thinking of still calling the
> lucene API directly for bulk indexing but using solr for search. Is this
> design OK?
>
> Thanks very much for helps, Lisheng
>
>


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 


RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Thanks very much; both your and Rafal's advice is very helpful!

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Thursday, July 26, 2012 8:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Bulk indexing data into solr


On 7/26/2012 7:34 AM, Rafał Kuć wrote:
> If you use Java (and I think you do, because you mention Lucene) you
> should take a look at StreamingUpdateSolrServer. It not only allows
> you to send data in batches, but also index using multiple threads.

A caveat to what Rafał said:

The streaming object has no error detection out of the box.  It queues 
everything up internally and returns immediately.  Behind the scenes, it 
uses multiple threads to send documents to Solr, but any errors 
encountered are simply sent to the logging mechanism, then ignored.  
When you use HttpSolrServer, all errors encountered will throw 
exceptions, but you have to wait for completion.  If you need both 
concurrent capability and error detection, you would have to manage 
multiple indexing threads yourself.
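
A minimal SolrJ sketch of the streaming approach (assuming SolrJ 3.6, a
hypothetical Solr URL, and made-up queue/thread sizes and field names):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // queue of 100 docs, 4 background sender threads
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
        for (int i = 0; i < 20000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "product-" + i);
            doc.addField("name", "Product " + i);
            server.add(doc); // returns quickly; background threads do the sending
        }
        server.commit(); // drains the queue, then commits
    }
}

Note the caveat above: add() failures here are only logged, not thrown back to
the caller.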

Apparently there is a method in the concurrent class that you can 
override and handle errors differently, though I have not seen how to 
write code so your program would know that an error occurred.  I filed 
an issue with a patch to solve this, but some of the developers have 
come up with an idea that might be better.  None of the ideas have been 
committed to the project.

https://issues.apache.org/jira/browse/SOLR-3284

Just an FYI, the streaming class was renamed to 
ConcurrentUpdateSolrServer in Solr 4.0 Alpha.  Both are available in 3.6.x.

Thanks,
Shawn



Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng

Hi,

I am starting to use solr, and now I need to index a rather large amount of data. 
It seems
that calling solr to pass data through HTTP is rather inefficient; I am thinking of 
still calling the 
lucene API directly for bulk indexing but using solr for search. Is this design 
OK?

Thanks very much for helps, Lisheng



RE: Could I use Solr to index multiple applications?

2012-07-18 Thread Zhang, Lisheng
Yury and Shashi,

Thanks very much for helps! I am studying the options pointed
out by you (Solr multiple cores and Elasticsearch).

Best regards, Lisheng

-Original Message-
From: Yury Kats [mailto:yuryk...@yahoo.com]
Sent: Tuesday, July 17, 2012 7:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Could I use Solr to index multiple applications?


On 7/17/2012 9:26 PM, Zhang, Lisheng wrote:
> Thanks very much for the quick help! Multicore sounds interesting.
> I roughly read the doc, so we need to put each core name into
> the Solr config XML; if we add another core and change the XML, do we
> need to restart Solr?

You can add/create cores on the fly, without restarting.
See http://wiki.apache.org/solr/CoreAdmin#CREATE
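
For example (a sketch with hypothetical core and directory names, assuming the
CoreAdmin handler at the usual /admin/cores path):

http://localhost:8080/solr/admin/cores?action=CREATE&name=app42&instanceDir=app42&config=solrconfig.xml&schema=schema.xml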


RE: Could I use Solr to index multiple applications?

2012-07-17 Thread Zhang, Lisheng
Thanks very much for the quick help! Multicore sounds interesting.
I roughly read the doc, so we need to put each core name into
the Solr config XML; if we add another core and change the XML, do we
need to restart Solr?

Best regards, Lisheng

-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com]On Behalf Of
Shashi Kant
Sent: Tuesday, July 17, 2012 5:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Could I use Solr to index multiple applications?


Look up multicore solr. Another choice could be ElasticSearch - which
is more straightforward in managing multiple indexes IMO.



On Tue, Jul 17, 2012 at 7:53 PM, Zhang, Lisheng
 wrote:
> Hi,
>
> We have an application where we index data into many different directories 
> (each directory
> corresponds to a different lucene IndexSearcher).
>
> Looking at the Solr config, it seems that Solr expects only one indexed 
> data directory;
> can we use Solr for our application?
>
> Thanks very much for helps, Lisheng
>


Could I use Solr to index multiple applications?

2012-07-17 Thread Zhang, Lisheng
Hi,
 
We have an application where we index data into many different directories 
(each directory
corresponds to a different lucene IndexSearcher).
 
Looking at the Solr config, it seems that Solr expects only one indexed 
data directory;
can we use Solr for our application?
 
Thanks very much for helps, Lisheng