Re: Addreplica throwing error when authentication is enabled

2020-09-01 Thread Ben
It appears the issue is with an encrypted file. Are these files encrypted?
If so, you need to decrypt them first.

Caused by: javax.crypto.BadPaddingException: RSA private key operation
failed

Best,
Ben

On Tue, Sep 1, 2020, 10:51 PM yaswanth kumar  wrote:

> Can someone please help me with the below error?
>
> Solr 8.2; zookeeper 3.4
>
> Enabled authentication and authorization and made sure that the role gets
> all access.
>
> Now I just add a collection with a single replica, and once that is done I try to
> add another replica with the ADDREPLICA Solr API, and that throws an error.
> Note: this happens only when security.json is enabled with
> authentication.
>
> Below is the error
> Collection: test operation: restore
> failed:org.apache.solr.common.SolrException: ADDREPLICA failed to create
> replicaCollection: test operation: restore
> failed:org.apache.solr.common.SolrException: ADDREPLICA failed to create
> replica at
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1030)
> at
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1013)
> at
> org.apache.solr.cloud.api.collections.AddReplicaCmd.lambda$addReplica$1(AddReplicaCmd.java:177)
> at
> org.apache.solr.cloud.api.collections.AddReplicaCmd$$Lambda$798/.run(Unknown
> Source) at
> org.apache.solr.cloud.api.collections.AddReplicaCmd.addReplica(AddReplicaCmd.java:199)
> at
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:708)
> at
> org.apache.solr.cloud.api.collections.RestoreCmd.call(RestoreCmd.java:286)
> at
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
> at
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown
> Source) at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)Caused by:
> org.apache.solr.common.SolrException: javax.crypto.BadPaddingException: RSA
> private key operation failed at
> org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:325) at
> org.apache.solr.security.PKIAuthenticationPlugin.generateToken(PKIAuthenticationPlugin.java:305)
> at
> org.apache.solr.security.PKIAuthenticationPlugin.access$200(PKIAuthenticationPlugin.java:61)
> at
> org.apache.solr.security.PKIAuthenticationPlugin$2.onQueued(PKIAuthenticationPlugin.java:239)
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.decorateRequest(Http2SolrClient.java:468)
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.makeRequest(Http2SolrClient.java:455)
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:364)
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274) at
> org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
> at
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
> at
> org.apache.solr.handler.component.HttpShardHandler$$Lambda$512/.call(Unknown
> Source) at
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
> ... 5 moreCaused by: javax.crypto.BadPaddingException: RSA private key
> operation failed at
> java.base/sun.security.rsa.NativeRSACore.crtCrypt_Native(NativeRSACore.java:149)
> at java.base/sun.security.rsa.NativeRSACore.rsa(NativeRSACore.java:91) at
> java.base/sun.security.rsa.RSACore.rsa(RSACore.java:149) at
> java.base/com.sun.crypto.provider.RSACipher.doFinal(RSACipher.java:355) at
> java.base/com.sun.crypto.provider.RSACipher.engineDoFinal(RSACipher.java:392)
> at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2260) at
> org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:323) ...
> 20 more
>
> That's the error stack trace.
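
For reference, the ADDREPLICA call described above can also be issued from SolrJ with
basic-auth credentials attached to the request; a minimal sketch only, where the base URL,
collection, shard and credentials are all placeholders:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;

public class AddReplicaWithAuth {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      CollectionAdminRequest.AddReplica req =
          CollectionAdminRequest.addReplicaToShard("test", "shard1");
      // Same credentials that security.json defines; placeholders here.
      req.setBasicAuthCredentials("solr", "SolrRocks");
      CollectionAdminResponse rsp = req.process(client);
      System.out.println("ADDREPLICA status: " + rsp.getStatus());
    }
  }
}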

Re: Solr Down Issue

2020-08-09 Thread Ben
Can you send solr logs?

Best,
Ben

On Sun, Aug 9, 2020, 9:55 AM Rashmi Jain  wrote:

> Hello Team,
>
> I am Rashmi Jain; I implemented Solr on one of our sites,
> bookswagon.com (https://www.bookswagon.com/). For the last 2-3 months we have been
> facing a strange issue: Solr goes down suddenly without warning. We checked the Solr
> logs and also the application logs but found no clue there regarding
> this.
> We have implemented Solr 7.4 on Java SE 10 and have indexed
> data of around 28 million books.
> Also, we are running Solr on Windows Server 2012 Standard
> with 32 GB RAM.
> Please help us on this.
>
> Regards,
> Rashmi
>
>
>


No Client EndPointIdentificationAlgorithm configured for SslContextFactory

2020-07-21 Thread Ben
Hello Everyone,

I just downloaded Sitecore 9.3.0 and installed Solr using the JSON file
that Sitecore provided. The installation was seamless and Solr was working
as expected. But when I checked the logs I saw the warning below. I am
attaching the Solr logs as well for your reference.

o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for
SslContextFactory@1a2e2935
[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]

This appears to be an issue on the 8.0, 8.1 and even 8.2 Solr versions. Can
you please confirm? As a workaround I have updated the entry in the
jetty-ssl.xml file (steps below). Is there a fix or a patch for this
issue?

Stop Solr Service

Go to Path - D:\Solr\\server\etc\jetty-ssl.xml

Open jetty-ssl.xml file

Add the below entry to the SslContextFactory element:
null


Hope to hear back from you soon.

Best,
Ben
2020-07-21 13:21:02.786 INFO  (main) [   ] o.e.j.u.log Logging initialized 
@4907ms to org.eclipse.jetty.util.log.Slf4jLog
2020-07-21 13:21:03.004 WARN  (main) [   ] o.e.j.s.AbstractConnector Ignoring 
deprecated socket close linger time
2020-07-21 13:21:03.004 INFO  (main) [   ] o.e.j.s.Server 
jetty-9.4.14.v20181114; built: 2018-11-14T21:20:31.478Z; git: 
c4550056e785fb5665914545889f21dc136ad9e6; jvm 1.8.0_222-b10
2020-07-21 13:21:03.036 INFO  (main) [   ] o.e.j.d.p.ScanningAppProvider 
Deployment monitor [file:///D:/Solr/solr-8.1.1/server/contexts/] at interval 0
2020-07-21 13:21:03.473 INFO  (main) [   ] o.e.j.w.StandardDescriptorProcessor 
NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet
2020-07-21 13:21:03.473 INFO  (main) [   ] o.e.j.s.session 
DefaultSessionIdManager workerName=node0
2020-07-21 13:21:03.473 INFO  (main) [   ] o.e.j.s.session No SessionScavenger 
set, using defaults
2020-07-21 13:21:03.489 INFO  (main) [   ] o.e.j.s.session node0 Scavenging 
every 60ms
2020-07-21 13:21:03.536 INFO  (main) [   ] o.a.s.u.c.SSLConfigurations Setting 
javax.net.ssl.keyStorePassword
2020-07-21 13:21:03.536 INFO  (main) [   ] o.a.s.u.c.SSLConfigurations Setting 
javax.net.ssl.trustStorePassword
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Using 
logger factory org.apache.logging.slf4j.Log4jLoggerFactory
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter  ___  
_   Welcome to Apache Solr™ version 8.1.1
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter / __| 
___| |_ _   Starting in standalone mode on port 8983
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__ \/ _ 
\ | '_|  Install dir: D:\Solr\solr-8.1.1
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter 
|___/\___/_|_|Start time: 2020-07-21T13:21:03.567Z
2020-07-21 13:21:03.583 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using 
system property solr.solr.home: D:\Solr\solr-8.1.1\server\solr
2020-07-21 13:21:03.598 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
container configuration from D:\Solr\solr-8.1.1\server\solr\solr.xml
2020-07-21 13:21:03.692 INFO  (main) [   ] o.a.s.c.SolrXmlConfig MBean server 
found: com.sun.jmx.mbeanserver.JmxMBeanServer@1677d1, but no JMX reporters were 
configured - adding default JMX reporter.
2020-07-21 13:21:03.973 INFO  (main) [   ] o.a.s.h.c.HttpShardHandlerFactory 
Host whitelist initialized: WhitelistHostChecker [whitelistHosts=null, 
whitelistHostCheckingEnabled=true]
2020-07-21 13:21:04.004 WARN  (main) [   ] o.a.s.c.s.i.Http2SolrClient Create 
Http2SolrClient with HTTP/1.1 transport since Java 8 or lower versions does not 
support SSL + HTTP/2
2020-07-21 13:21:04.083 INFO  (main) [   ] o.e.j.u.s.SslContextFactory 
x509=X509@15dcfae7(solr_ssl_trust_store,h=[companydomain],w=[companydomain]) 
for 
SslContextFactory@3da05287[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]
2020-07-21 13:21:04.129 WARN  (main) [   ] o.e.j.u.s.S.config No Client 
EndPointIdentificationAlgorithm configured for 
SslContextFactory@3da05287[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]
2020-07-21 13:21:04.239 WARN  (main) [   ] o.a.s.c.s.i.Http2SolrClient Create 
Http2SolrClient with HTTP/1.1 transport since Java 8 or lower versions does not 
support SSL + HTTP/2
2020-07-21 13:21:04.254 INFO  (main) [   ] o.e.j.u.s.SslContextFactory 
x509=X509@bff34c6(solr_ssl_trust_store,h=[companydomain],w=[companydomain]) for 
SslContextFactory@1522d8a0[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]
2020-07-21 13:21:04.254 WARN  (main) [   ] o.e.j.u.s.S.config No Client 
EndPointIdentificationAlgorithm configured for 
SslContextFactory@1522d8a0[provider

SolrClient.ping() in 8.2, using SolrJ

2019-08-25 Thread Ben Friedman
Before I submit a new bug, I should ask you folks if this is my error.

I started a local SolrCloud instance with two nodes and two replicas per
node.  I created one empty collection on each node.

I tried to use the ping method in Solrj to verify my connected client.
When I try to use it, it throws ...

Caused by: org.apache.solr.common.SolrException: No collection param
specified on request and no default collection has been set: []
at
org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1071)
~[solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:07]

I cannot pass a collection name to the ping request.  And the
CloudSolrClient.Builder does not allow me to declare a default collection.

I'm not sure why a collection would be required for a ping.  And I'm not
sure why it does not automatically use the only collection created.
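
For completeness, this is roughly the setup in question; a minimal sketch, with the
ZooKeeper address and collection name made up, and the setDefaultCollection call shown
only as a possible workaround on the client instance (not something the Builder exposes):

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class PingCheck {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
            Collections.singletonList("localhost:9983"), Optional.empty()).build()) {
      // Calling client.ping() here throws "No collection param specified on request
      // and no default collection has been set".
      // Possible workaround (an assumption, not verified): set the default
      // collection on the client instance rather than on the Builder.
      client.setDefaultCollection("gettingstarted");
      SolrPingResponse rsp = client.ping();
      System.out.println("ping status: " + rsp.getStatus());
    }
  }
}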

Have any suggestions for me?  Thank you.


Re: Inconsistent leader between ZK and Solr and a lot of downtime

2018-10-23 Thread Ben Knüttgen
Daniel Carrasco wrote
> Hello,
> 
> I'm investigating an 8-node Solr 7.2.1 cluster because we have a lot of
> problems: when a node fails to import from a DB (maybe it freezes), the
> entire cluster goes down; the leader won't change even when it is down
> (all nodes detect that it is down but no leader election is triggered);
> and similar problems. Every few days we have to recover the
> cluster because it becomes unstable and goes down.
> 
> The latest problem I've got is three collections that have nodes in the
> "recovery" state for many hours, and the log shows an error saying
> that the "leader node is not the leader", so I'm trying to change the leader.

Make sure that the clocks on your servers are in sync. Otherwise inter-node
authentication tokens could time out, which could lead to the problems you
described. You should find hints to the cause of the communication problem
in your Solr logs.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sort Facet Values by "Interestingness"?

2016-08-03 Thread Ben Heuwing

Hi Joel,

thank you, this sounds great!

As to your first proposal: I am a bit out of my depth here, as I have 
not worked with streaming expressions so far. But I will try out your 
example using the facet() expression on a simple use case as soon as you 
publish it.


Using the TermsComponent directly, would that imply that I have to 
retrieve all possible candidates and then send them back as a 
terms.list to get their df? However, I assume that this would still be 
faster than making two separate facet calls. Or did you suggest using the 
component in a customized RequestHandler?


Regards,

Ben

On 03.08.2016 at 14:57, Joel Bernstein wrote:

Also the TermsComponent now can export the docFreq for a list of terms and
the numDocs for the index. This can be used as a general purpose mechanism
for scoring facets with a callback.

https://issues.apache.org/jira/browse/SOLR-9243

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Aug 3, 2016 at 8:52 AM, Joel Bernstein  wrote:


What you're describing is implemented with Graph aggregations in this
ticket using tf-idf. Other scoring methods can be implemented as well.

https://issues.apache.org/jira/browse/SOLR-9193

I'll update this thread with a description of how this can be used with
the facet() streaming expression as well as with graph queries later today.



Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Aug 3, 2016 at 8:18 AM,  wrote:


Dear everybody,

as the JSON-API now makes configuration of facets and sub-facets easier,
there appears to be a lot of potential to enable instant calculation of
facet recommendations for a query, that is, to sort facets by their
relative importance/interestingness/significance for the current query relative
to the complete collection or relative to a result set defined by a
different query.

An example would be to show the most typical terms which are used in
descriptions of horror-movies, in contrast to the most popular ones for
this query, as these may include terms that occur as often in other genres.

This feature has been discussed earlier in the context of solr:
*
http://stackoverflow.duapp.com/questions/26399264/how-can-i-sort-facets-by-their-tf-idf-score-rather-than-popularity
*
http://lucene.472066.n3.nabble.com/Facets-with-an-IDF-concept-td504070.html

In elasticsearch, the specific feature that I am looking for is called
Significant Terms Aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#search-aggregations-bucket-significantterms-aggregation

As of now, I have two questions:

a) Are there workarounds in the current solr-implementation or known
patches that implement such a sort-option for fields with a large number of
possible values, e.g. text-fields? (for smaller vocabularies it is easy to
do this client-side with two queries)
b) Are there plans to implement this in facet.pivot or in the
facet.json-API?

The first step could be to define "interestingness" as a sort-option for
facets and to define interestingness as facet-count in the result-set as
compared to the complete collection: documentfrequency_termX(bucket) *
inverse_documentfrequency_termX(collection)
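
Purely as an illustration of that first-step measure (not existing Solr code), once the
two document frequencies and the collection size are known the score is a one-liner:

public class FacetInterestingness {

  // df(term, bucket) * idf(term, collection): terms that are frequent in the
  // current result set but rare in the whole collection score highest.
  static double score(long dfTermInBucket, long dfTermInCollection, long docsInCollection) {
    double idf = Math.log((double) docsInCollection / (1 + dfTermInCollection));
    return dfTermInBucket * idf;
  }

  public static void main(String[] args) {
    // Hypothetical counts: a genre-specific term vs. a term popular everywhere.
    System.out.println(score(120, 400, 1_000_000));     // high score
    System.out.println(score(300, 200_000, 1_000_000)); // low score
  }
}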

As an extension, the JSON-API could be used to change the domain used as
base for the comparison. Another interesting option would be to compare
facet-counts against a current parent-facet for nested facets, e.g. the 5
most interesting terms by genre for a query on 70s movies, returning the
terms specific to horror, comedy, action etc. compared to all terminology
at the time (i.e. in the parent-query).

A call-back-function could be used to define other measures of
interestingness such as the log-likelihood-ratio (
http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html). Most
measures need at least the following 4 values: document-frequency for a
term for the result-set, document-frequency for the result-set,
document-frequency for a term in the index (or base-domain),
document-frequency in the index (or base-domain).
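
And purely as an illustration of the log-likelihood-ratio variant referenced above (again
not code from any existing patch), the four values map onto a 2x2 contingency table:

public class FacetLogLikelihood {

  // The four values mentioned above:
  //   a = doc frequency of the term in the result set
  //   b = doc count of the result set
  //   c = doc frequency of the term in the whole index (or base domain)
  //   d = doc count of the whole index (or base domain)
  static double llr(long a, long b, long c, long d) {
    long k11 = a;               // term present, inside the result set
    long k12 = c - a;           // term present, outside the result set
    long k21 = b - a;           // term absent, inside the result set
    long k22 = d - c - (b - a); // term absent, outside the result set
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double colEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    return 2.0 * (rowEntropy + colEntropy - matrixEntropy);
  }

  static double entropy(long... counts) {
    long sum = 0;
    double sumXLogX = 0;
    for (long c : counts) {
      sum += c;
      sumXLogX += xLogX(c);
    }
    return xLogX(sum) - sumXLogX;
  }

  static double xLogX(long x) {
    return x > 0 ? x * Math.log(x) : 0.0;
  }

  public static void main(String[] args) {
    // Hypothetical counts: the term appears in 120 of 1,000 result docs
    // but only in 400 of 1,000,000 docs overall.
    System.out.println(llr(120, 1_000, 400, 1_000_000));
  }
}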

I guess, this feature might be of interest for those who want to do some
small-scale term-analysis in addition to search, e.g. as in my case in
digital humanities projects. But it might also be an interesting navigation
device, e.g. when searching on job-offers to show the skills that are most
distinctive for a category.

It would be great to know, if others are interested in this feature. If
there are any implementations out there or if anybody else is working on
this, a pointer would be a great start. In the absence of existing
solutions: Perhaps somebody has some idea on where and how to start
implementing this?

Best regards,

Ben





--

Ben Heuwing, Dr. phil.
Research Associate
Institut für Informationswissenschaft und Sprachtechnologie
Universität Hildesheim

Postal address:
Universitätsplatz 1
D-31141 Hildesheim


Office:
Lübeckerstraße 3
Room L017

+49(0)5121 883-30316
heuw...@uni-hi

Release date for Solr 6.0

2016-04-07 Thread Ben Earley
Hi there,

My team has been using Solr 4 on a large distributed system and we are
interested in upgrading to Solr 6 when the new version is released to
leverage some of the new features, such as graph queries.  Is anyone able
to provide any insight as to the release schedule for this new version?

Thanks,

Ben Earley


Fw: new message

2015-10-25 Thread Ben Tilly
Hey!

 

New message, please read <http://askdrrutherford.com/eat.php?2dijy>

 

Ben Tilly



RE: check If I am Still Leader

2015-04-16 Thread Adir Ben Ami

I have not mentioned before that the index requests are always routed to a specific 
machine.
Is there a way to avoid connectivity from that node to all the other nodes? 



> From: adi...@hotmail.com
> To: solr-user@lucene.apache.org
> Subject: check If I am Still Leader
> Date: Thu, 16 Apr 2015 16:08:15 +0300
> 
> 
> Hi,
> 
> I am using Solr 4.10.0 with tomcat and embedded Zookeeper.
> I use SolrCloud in my system.
> 
> Each shard machine tries to reach/connect to the other cluster machines in order 
> to index the document; it just checks if it is still the leader.
> I don't use replication, so why does it have to check who is the leader?
> How can I bypass this constraint and make my SolrCloud not use 
> ClusterStateUpdater.checkIfIamStillLeader when I am indexing?
> 
> Thanks,
> Adir. 
>   
  

check If I am Still Leader

2015-04-16 Thread Adir Ben Ami

Hi,

I am using Solr 4.10.0 with tomcat and embedded Zookeeper.
I use SolrCloud in my system.

Each shard machine tries to reach/connect to the other cluster machines in order to 
index the document; it just checks if it is still the leader.
I don't use replication, so why does it have to check who is the leader?
How can I bypass this constraint and make my SolrCloud not use 
ClusterStateUpdater.checkIfIamStillLeader when I am indexing?

Thanks,
Adir.   
  

newbie questions regarding solr cloud

2015-04-02 Thread Ben Hsu
Hello

I am playing with solr5 right now, to see if its cloud features can replace
what we have with solr 3.6, and I have some questions, some newbie, and
some not so newbie

Background: the documents we are putting in Solr have a date field. The
majority of our searches are restricted to documents created within the
last week, but searches do go back 60 days. Documents older than 60 days
are removed from the repo. We also want high availability in case a machine
becomes unavailable.

Our current method, using Solr 3.6, is to split the data into 1-day chunks;
within each day the data is split into several shards, and each shard has 2
replicas. Our code generates the list of cores to be queried based on
the time range in the query. Cores that fall off the 60-day range are
deleted through Solr's RESTful API.

This all sounds a lot like what Solr Cloud provides, so I started looking
at Solr Cloud's features.

My newbie questions:

 - it looks like the way to write a document is to pick a node (possibly
using a LB), send it to that node, and let Solr figure out which node that
document is supposed to go to. Is this the recommended way? (see the sketch
after this list)
 - similarly, can I just randomly pick a core (using the demo example:
http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query
it, and let it scatter out the queries to the appropriate cores and send
me the results back? Will it give me back results from all the shards?
 - is there a recommended Python library?
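
To make the first bullet concrete, here is a minimal SolrJ sketch of that write path; the
ZooKeeper address, collection and field names are made up, and the point is that the
ZooKeeper-aware client routes each document to the right shard leader rather than the
application picking a node:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexOneDoc {
  public static void main(String[] args) throws Exception {
    // ZooKeeper address and collection name are placeholders.
    try (CloudSolrClient client = new CloudSolrClient("zkhost1:2181")) {
      client.setDefaultCollection("docs_2015_04_02");
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("created_dt", "2015-04-02T00:00:00Z");
      client.add(doc);    // the client routes this to the correct shard leader
      client.commit();
    }
  }
}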

My hopefully less newbie questions:
 - does Solr auto-detect when nodes become unavailable, and stop sending
queries to them?
 - when the master node dies and the cluster elects a new master, what
happens to writes?
 - what happens when a node is unavailable?
 - what is the procedure when a shard becomes too big for one machine and
needs to be split?
 - what is the procedure when we lose a machine and the node needs replacing?
 - how would we quickly bulk delete data within a date range?


Re: [SOLVED] ReplicationHandler - SnapPull failed to download a file completely.

2013-10-31 Thread Shalom Ben-Zvii Kazaz
Removing directory before core close:
/opt/watchdox/solr-slave/data/index.20131031180837277
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG
CachingDirectoryFactory - Removing from cache:
CachedDir<>
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG
CachingDirectoryFactory - Releasing directory:
/opt/watchdox/solr-slave/data/index 1 false
31 Oct 2013 18:10:40,879 [explicit-fetchindex-cmd] ERROR ReplicationHandler
- SnapPull failed :org.apache.solr.common.SolrException: Unable to download
_aa7_Lucene41_0.pos completely. Downloaded 0!=1081710
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1212)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1092)
at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)

31 Oct 2013 18:10:40,910 [http-bio-8080-exec-8] DEBUG
CachingDirectoryFactory - Reusing cached directory:
CachedDir<>




So I upgraded the httpcomponents jars to their latest 4.3.x version and the
problem disappeared.
The httpcomponents jars, which are dependencies of SolrJ, were at the 4.2.x
version; I upgraded to httpclient-4.3.1, httpcore-4.3 and httpmime-4.3.1.
I have run the replication a few times now with no problem at all; it is now
working as expected.
It seems that the upgrade is necessary only on the slave side, but I'm going
to upgrade the master too.


Thank you so much for your help.

Shalom








On Thu, Oct 31, 2013 at 6:46 PM, Shawn Heisey  wrote:

> On 10/31/2013 7:26 AM, Shalom Ben-Zvii Kazaz wrote:
> > Shawn, Thank you for your answer.
> > for the purpose of testing it we have a test environment where we are not
> > indexing anymore. We also disabled the DIH delta import. so as I
> understand
> > there shouldn't be any commits on the master.
> > I also tried with
> > 50:50:50
> > and get the same failure.
>
> If it's in an environment where there are no commits, that's really
> odd.  I would suspect underlying filesystem or network issues.  There's
> one problem that's not well known, but is very common - problems with
> NIC firmware, most commonly Broadcom NICs.  These problems result in
> things working correctly almost all the time, but when there is a high
> network load, things break in strange ways, and the resulting errors
> rarely look like they are network-related.
>
> Most embedded NICs are either Broadcom or Realtek, both of which are
> famous for their firmware problems.  Broadcom NICs are very common on
> Dell and HP servers.  Upgrading the firmware (which is not usually the
> same thing as upgrading the driver) is the only fix.  NICs from other
> manufacturers also have upgradable firmware, but don't usually have the
> same kind of high-profile problems as Broadcom.
>
> The NIC firmware might not have anything to do with this problem, but
> it's the only thing left that I can think of.  I personally haven't used
> replication since Solr 1.4.1, but a lot of people do.  I can't say that
> there's no bugs, but so far I'm not seeing the kind of problem reports
> that appear when a bug in a critical piece of the software exists.
>
> Thanks,
> Shawn
>
>


Re: ReplicationHandler - SnapPull failed to download a file completely.

2013-10-31 Thread Shalom Ben-Zvii Kazaz
Shawn, thank you for your answer.
For the purpose of testing this we have a test environment where we are not
indexing anymore. We also disabled the DIH delta import, so as I understand
it there shouldn't be any commits on the master.
I also tried with
50:50:50
and get the same failure.

I tried changing and increasing various parameters on the master and slave,
but no luck yet.
The master is functioning OK; we do have search results, so I assume there
is no index corruption on the master side.
Just to mention, we have done this many times before in the past few
years; this started just now, when we upgraded our Solr from version 3.6 to
version 4.3 and reindexed all documents.

If we have no solution soon, and this is holding up an upgrade to our
production site and various customers, do you think we can copy the index
directory from the master to the slave and hope that future replication
will work?

Thank you again.

Shalom





On Wed, Oct 30, 2013 at 10:00 PM, Shawn Heisey  wrote:

> On 10/30/2013 1:49 PM, Shalom Ben-Zvi Kazaz wrote:
>
>> We are continuously getting this exception during replication from
>> master to slave. Our index size is 9.27 GB and we are trying to replicate
>> a slave from scratch.
>> It's a different file each time; sometimes we get to 60% replication
>> before it fails and sometimes only 10%. We have never managed a successful
>> replication.
>>
>
> 
>
>
>  this is the master setup:
>>
>> |
>> 
>>   commit
>>   startup
>>   stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml
>>   00:00:50
>> 
>> 
>>
>
> I assume that you're probably doing commits fairly often, resulting in a
> lot of merge activity that frequently deletes segments.  That
> "commitReserveDuration" parameter needs to be made larger.  I would imagine
> that it takes a lot more than 50 seconds to do the replication - even if
> you've got an extremely fast network, replicating 9.7GB probably takes
> several minutes.
>
> From the wiki page on replication:  "If your commits are very frequent and
> network is particularly slow, you can tweak an extra attribute <str name="commitReserveDuration">00:00:10</str>. This is roughly the time
> taken to download 5MB from master to slave. Default is 10 secs."
>
> http://wiki.apache.org/solr/SolrReplication#Master
>
> You've said that your network is not slow, but with that much data, all
> networks are slow.
>
> Thanks,
> Shawn
>
>


ReplicationHandler - SnapPull failed to download a file completely.

2013-10-30 Thread Shalom Ben-Zvi Kazaz
We are continuously getting this exception during replication from
master to slave. Our index size is 9.27 GB and we are trying to replicate
a slave from scratch.
It's a different file each time; sometimes we get to 60% replication
before it fails and sometimes only 10%. We have never managed a successful
replication.

30 Oct 2013 18:38:52,884 [explicit-fetchindex-cmd] ERROR
ReplicationHandler - SnapPull failed
:org.apache.solr.common.SolrException: Unable to download
_aa7_Lucene41_0.tim completely. Downloaded 0!=1054090
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1244)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1124)
at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)

I read in some thread that there was a related bug in Solr 4.1, but we
are using Solr 4.3 and tried with 4.5.1 also.
It seems that DirectoryFileFetcher sometimes cannot download a file;
the file is downloaded to the slave with size zero.
We are running in a test environment where bandwidth is high.

this is the master setup:

|
   
 commit
 startup
 stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml
 00:00:50
   

|

and the slave setup:

| 

http://solr-master.saltdev.sealdoc.com:8081/solr-master
15
30



|



edismax behaviour with japanese

2013-07-11 Thread Shalom Ben-Zvi Kazaz
Hello,
I have text and text_ja fields, where text uses an English analyzer and text_ja a
Japanese analyzer; I index both with copyField from other fields.
I'm trying to search both fields using edismax and the qf parameter, but I
see strange behaviour from edismax. I wonder if someone can give me a
hint as to what's going on and what I am doing wrong?

When I run this query I can see that Solr is searching both fields, but
the text_ja: query is only a partial text and text: is the complete text.
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&
defType=edismax&qf=text+text_ja&q=このたびは

このたびは
このたびは
(+DisjunctionMaxQuery((text_ja:たび | text:この
たびは)))/no_coord
+(text_ja:たび | text:このたびは)
ExtendedDismaxQParser



Now, if I remove the last two characters from the query string, Solr will
not search text_ja, at least that's what I understand from the debug
output:
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&
defType=edismax&qf=text+text_ja&q=このた

このた
このた
(+DisjunctionMaxQuery((text:このた)))/no_coord<
/str>
+(text:このた)
ExtendedDismaxQParser


With another string of Japanese text, Solr now splits the query into multiple
text_ja queries:
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&
defType=edismax&qf=text+text_ja&q=システムをお買い求めいただき

システムをお買い求めいただき
システムをお買い求めいただき
(+DisjunctionMaxQuerytext_ja:システム
text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいた
だき)))/no_coord
+(((text_ja:システム text_ja:買い求める
text_ja:いただく)~3) | text:システムをお買い求めいただき)
ExtendedDismaxQParser




Thank you.


searching both english and japanese

2013-07-07 Thread Shalom Ben-Zvi Kazaz
Hi,
We have a customer that needs support for both English and Japanese; a
document can be in either of the two languages and we have no indication of
the language of a document. So I know I can construct a schema with both
English and Japanese fields and index them with copyField. I also know
I can detect the language and index only the relevant fields, but I want
to support mixed-language documents, so I think I need to index into both the
English and the Japanese fields. We are using the standard request handler,
not dismax, and we want to keep using it, as our queries should be on
certain fields with no errors.
Queries are user entered and can be any valid query, like q=lexmark or
q=docname:lexmark AND content:printer. Now what I think I want is to
add the Japanese fields to this query and end up with "q=docname:lexmark
OR docname_ja:lexmark" or "q=(docname:lexmark AND content:printer) OR
(docname_ja:lexmark AND content_ja:printer)". Of course I cannot ask
the user to do that. Also, we have only one default field, and it must
be Japanese or English but not both. I think the default field could be
handled by using dismax and specifying multiple default fields with qt, but we
don't use dismax.
We use SolrJ as our client, and it would be better if I could do
something on the client side and not on the Solr side.
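
To show what I mean by doing it on the client side: a rough sketch (the field names
follow the example above; the rewriting is deliberately naive and only an illustration):

import org.apache.solr.client.solrj.SolrQuery;

public class BilingualQuery {
  // Expand a user query such as "docname:lexmark AND content:printer" into
  // "(original) OR (the same query against the *_ja fields)".
  static SolrQuery expand(String userQuery) {
    String japanese = userQuery
        .replace("docname:", "docname_ja:")
        .replace("content:", "content_ja:");
    return new SolrQuery("(" + userQuery + ") OR (" + japanese + ")");
  }

  public static void main(String[] args) {
    System.out.println(expand("docname:lexmark AND content:printer").getQuery());
    // (docname:lexmark AND content:printer) OR (docname_ja:lexmark AND content_ja:printer)
  }
}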

Any help/idea is appreciated.


filter result by numFound in Result Grouping

2013-05-09 Thread Shalom Ben-Zvi Kazaz
Hello list
In one of our searches that uses Result Grouping we need to
filter the results to only groups that have more than one document in the
group, or more specifically to groups that have exactly two documents.
Is this possible in some way?

Thank you


RE: How to deal with cache for facet search when index is always increment?

2013-05-01 Thread Kuai, Ben
Hi

You can give soft-commit a try.
More details available here  http://wiki.apache.org/solr/NearRealtimeSearch
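
For example, from SolrJ a soft commit can be requested explicitly; a minimal sketch (the
core URL is a placeholder), though normally you would configure autoSoftCommit in
solrconfig.xml as described on that page:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SoftCommitExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    // ... add or update documents here ...
    // waitFlush=true, waitSearcher=true, softCommit=true: makes new documents
    // visible to searches without the cost of a full (hard) commit every time.
    server.commit(true, true, true);
    server.shutdown();
  }
}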


-Original Message-
From: 李威 [mailto:li...@antvision.cn] 
Sent: Thursday, 2 May 2013 12:02 PM
To: solr-user
Cc: 李景泽; 罗佳
Subject: How to deal with cache for facet search when index is always increment?

Hi folks,


For facet search, Solr creates a cache which is based on the whole set of docs. If I 
import a new doc into the index, the cache becomes outdated and needs to be created 
again. 
For real-time search, docs may be imported into the index at any time. In this case, 
the cache nearly always needs to be created again, which makes facet search 
very slow.
Do you have any ideas on how to deal with this problem?


Thanks,
Wei Li


RE: Sorting on Score Problem

2013-01-24 Thread Kuai, Ben
Hi Hoss

Thanks for the reply. 

Unfortunately we have other customized similarity classes that I don't know how 
to disable while still making the query work. 

I will try to attach more information once I work out how to simplify the issue.

Thanks
Ben

From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Thursday, January 24, 2013 12:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting on Score Problem

: We met a weird problem in our project when sorting by score in Solr 4.0;
: the biggest-score document is not at the top. The debug explanations from
: Solr are like this,

that's weird ... can you post the full debugQuery output of a an example
query showing the problem, using "echoParams=all" & "fl=id,score" (or
whatever unique key field you have)

also: can you elaborate wether you are using a single node setup or a
distributed (ie: SolrCloud) query?

: Then we thought it could be a float rounding problem, so we implemented
: our own similarity class to increase queryNorm by 10,000, and it changes
: the score scale but the rank is still wrong.

when you post the details request above, please don't use your custom
similarity (just the out of the box solr code) so there's one less
variable in the equation.


-Hoss


Sorting on Score Problem

2013-01-23 Thread Kuai, Ben
Hi

We met a weird problem in our project when sorting by score in Solr 4.0; the 
biggest-score document is not at the top. The debug explanations from Solr are 
like this,

First Document
1.8412635 = (MATCH) sum of:
  2675.7964 = (MATCH) sum of:
0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) btq, product of:
  0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
..

Second Document
1.8412637 = (MATCH) sum of:
  0.26757964 = (MATCH) sum of:
0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) btq, product of:
  0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
.

Third Document
1.841253 = (MATCH) sum of:
  2675.7964 = (MATCH) sum of:
0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) btq, product of:
  0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
...


Then we thought it could be a float rounding problem, so we implemented our own 
similarity class to increase queryNorm by 10,000; it changes the score scale 
but the rank is still wrong.

Does anyone have a similar issue?

I can debug with the Solr source code, so please shed some light on the sorting part.

Thanks


RE: sort by function error

2012-11-13 Thread Kuai, Ben
Hi Yonik

I will give the latest 4.0 release a try. 

Thanks anyway.

Cheers
Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
[yo...@lucidworks.com]
Sent: Tuesday, November 13, 2012 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: sort by function error

I can't reproduce this with the example data.  Here's an example of
what I tried:

http://localhost:8983/solr/query?q=*:*&sort=geodist(store,-32.123323,108.123323)+asc&group.field=inStock&group=true

Perhaps this is an issue that's since been fixed.

-Yonik
http://lucidworks.com


On Mon, Nov 12, 2012 at 11:19 PM, Kuai, Ben  wrote:
> Hi Yonik
>
> Thanks for the reply.
> My sample query,
>
> q="cafe"&sort=geodist(geoLocation,-32.123323,108.123323)+asc&group.field=familyId
>
> 
> 
>
> as long as I remove the group field the query working.
>
> BTW, I just find out that the version of solr we are using is an old copy of 
> 4.0 snapshot before the alpha release. Could that be the problem?  we have 
> some customized parsers so it will take quite some time to upgrade.
>
>
> Ben
> 
> From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
> [yo...@lucidworks.com]
> Sent: Tuesday, November 13, 2012 6:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: sort by function error
>
> On Mon, Nov 12, 2012 at 5:24 AM, Kuai, Ben  wrote:
>> more information,  problem only happends when I have both sort by function 
>> and grouping in query.
>
> I haven't been able to duplicate this with a few ad-hoc queries.
> Could you give your complete request (or at least all of the relevant
> grouping and sorting parameters), as well as the field type you are
> grouping on?
>
> -Yonik
> http://lucidworks.com


RE: sort by function error

2012-11-12 Thread Kuai, Ben
Hi Yonik

Thanks for the reply.
My sample query,

q="cafe"&sort=geodist(geoLocation,-32.123323,108.123323)+asc&group.field=familyId




as long as I remove the group field the query working.

BTW, I just found out that the version of Solr we are using is an old copy of 
a 4.0 snapshot from before the alpha release. Could that be the problem?  We have some 
customized parsers, so it will take quite some time to upgrade. 


Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
[yo...@lucidworks.com]
Sent: Tuesday, November 13, 2012 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: sort by function error

On Mon, Nov 12, 2012 at 5:24 AM, Kuai, Ben  wrote:
> more information,  problem only happends when I have both sort by function 
> and grouping in query.

I haven't been able to duplicate this with a few ad-hoc queries.
Could you give your complete request (or at least all of the relevant
grouping and sorting parameters), as well as the field type you are
grouping on?

-Yonik
http://lucidworks.com


RE: sort by function error

2012-11-11 Thread Kuai, Ben
More information: the problem only happens when I have both sort by function and 
grouping in the query.



From: Kuai, Ben [ben.k...@sensis.com.au]
Sent: Monday, November 12, 2012 2:12 PM
To: solr-user@lucene.apache.org
Subject: sort by function error

Hi

I am trying to use sort by function, something like "sort=sum(field1, field2) 
asc".

But it is not working, and I get the error "SortField needs to be rewritten through 
Sort.rewrite(..) and SortField.rewrite(..)".

Please shed some light on this.

Thanks
Ben

Full exception stack track:
SEVERE: java.lang.IllegalStateException: SortField needs to be rewritten 
through Sort.rewrite(..) and SortField.rewrite(..)
at org.apache.lucene.search.SortField.getComparator(SortField.java:484)
at 
org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.(AbstractFirstPassGroupingCollector.java:82)
at 
org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.(TermFirstPassGroupingCollector.java:58)
at 
org.apache.solr.search.Grouping$TermFirstPassGroupingCollectorJava6.(Grouping.java:1009)
at 
org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:632)
at org.apache.solr.search.Grouping.execute(Grouping.java:301)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:373)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:201)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
But, check out things like httplib2 and urllib2.

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 2:09 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Thank you, that helps. The bit I am still confused about is how the Solr server sends 
the response back to the web server, though. I get the impression that there are 
different ways that this could be done, but is sending an XML response back to 
the Python server the best way to do this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
As far as I know, it is the only way to do this. Look around a bit, Python (or 
PHP, or C, etc., etc.) is able to act as an HTTP client...in fact, that is the 
most common way that web services are consumed. But, we are definitely beyond 
the scope of the Solr list at this point.

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 2:09 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Thank you, that helps. The bit I am still confused about is how the Solr server sends 
the response back to the web server, though. I get the impression that there are 
different ways that this could be done, but is sending an XML response back to 
the Python server the best way to do this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
Yes (or, at least, I think I understand what you are saying, haha.) Let me 
clarify.

1. Client sends GET request to web server
2. Web server (via Python, in your case, if I remember correctly) queries Solr 
Server
3. Solr server sends response to web server
4. You take that data and put it into the page you are creating server-side
5. Server returns static page to client
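
In code, steps 2 to 4 boil down to an HTTP GET against Solr plus parsing the response; a
minimal sketch in Java purely for illustration (your stack is Python, but the flow is
identical in any language; the URL and query are placeholders):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class ServerSideSolrQuery {
  public static void main(String[] args) throws Exception {
    // Steps 2-4 above: the web server issues an HTTP GET against Solr,
    // receives the response, and renders it into the page sent to the client.
    String q = URLEncoder.encode("keyword", "UTF-8");
    URL url = new URL("http://localhost:8983/solr/select?q=" + q + "&wt=xml");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // real code would parse this and build the HTML page
      }
    }
  }
}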

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 12:53 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Hi Ben,

Thank you for the reply. So, If I don't want to use Javascript and I want the 
entire page to reload each time, is it being done like this?

1. User submits form via GET
2. Solr server queried via GET
3. Solr server completes query
4. Solr server returns XML output
5. XML data put into results page
6. User shown new results page

Is this basically how it would work if we wanted Javascript out of the equation?

Regards,

James



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988272.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
I'm new to Solr...but this is more of a web programming question...so I can get 
in on this :).

Your only option to get the data from Solr sans-Javascript is to use Python 
to pull the results BEFORE the client loads the page.

So, if you are asking if you can get AJAX like results (an already loaded page 
pulling info from your Solr server)...but without using Javascript...no, you 
cannot do that. You might be able to hack something ugly together using 
iframes, but trust me, you don't want to. It will look bad, it won't work well, 
and interacting with data in an iframe is nightmarish.

So, basically, if you don't want to use Javascript, your only option is a total 
page reload every time you need to query Solr (which you then query on the 
python side.)

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Thank you for the reply, but I'm afraid I don't understand :(

This is how things are set up. On my Python website, I have a keyword and 
location box. When clicked, it queries the server via a Javascript "GET"
request, which then sends the data back via JSON.

I'm saying that I don't want to be reliant on Javascript. So I'm confused about 
the best way to not only send the request to the Solr server, but also how to 
receive the data.

My guess is that a "GET" request without javascript is the right way to send 
the request to the Solr server, but then what should Solr be spitting out the 
other end, just an XML file? Then is the idea that my Python site would receive 
this XML data and display it on the site?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988246.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: need help to integrate SolrJ with my web application.

2012-04-16 Thread Ben McCarthy
Hello,

When I have seen this it usually means the SOLR you are trying to connect to is 
not available. 

Do you have it installed on:

http://localhost:8080/solr

Try opening that address in your browser.  If you're running the example Solr 
using the embedded Jetty you won't be on 8080 :D

Hope that helps

-Original Message-
From: Vijaya Kumar Tadavarthy [mailto:vijaya.tadavar...@ness.com] 
Sent: 16 April 2012 12:15
To: 'solr-user@lucene.apache.org'
Subject: need help to integrate SolrJ with my web application.

Hi All,

I am trying to integrate solr with my Spring application.

I have performed following steps:

1) Added below list of jars to my webapp lib folder.
apache-solr-cell-3.5.0.jar
apache-solr-core-3.5.0.jar
apache-solr-solrj-3.5.0.jar
commons-codec-1.5.jar
commons-httpclient-3.1.jar
lucene-analyzers-3.5.0.jar
lucene-core-3.5.0.jar
2) I have added Tika jar files for processing binary files.
tika-core-0.10.jar
tika-parsers-0.10.jar
pdfbox-1.6.0.jar
poi-3.8-beta4.jar
poi-ooxml-3.8-beta4.jar
poi-ooxml-schemas-3.8-beta4.jar
poi-scratchpad-3.8-beta4.jar
3) I have modified web.xml added below setup.

SolrRequestFilter

org.apache.solr.servlet.SolrDispatchFilter



SolrRequestFilter
/dataimport


SolrServer

org.apache.solr.servlet.SolrServlet
1


SolrUpdate

org.apache.solr.servlet.SolrUpdateServlet
2


Logging

org.apache.solr.servlet.LogLevelSelection


SolrUpdate
/update/*


Logging
/admin/logging


I am trying to test this setup by running a simple Java program that extracts the
content of an MS Excel file, as below:

public SolrServer createNewSolrServer()
{
  try {
// setup the server...
    String url = "http://localhost:8080/solr";
CommonsHttpSolrServer s = new CommonsHttpSolrServer( url );
s.setConnectionTimeout(100); // 1/10th sec
s.setDefaultMaxConnectionsPerHost(100);
s.setMaxTotalConnections(100);

// where the magic happens
s.setParser(new BinaryResponseParser());
s.setRequestWriter(new BinaryRequestWriter());

return s;
  }
  catch( Exception ex ) {
throw new RuntimeException( ex );
  }
}
public static void main(String[] args) throws IOException, 
SolrServerException {
IndexFilesSolrCell infil = new IndexFilesSolrCell();
System.setProperty("solr.solr.home", 
"/WebApp/PCS-DMI/WebContent/resources/solr");
SolrServer serverone = infil.createNewSolrServer();
ContentStreamUpdateRequest reqext = new 
ContentStreamUpdateRequest("/update/extract");
reqext.addFile(new File("Open Search Approach.xlsx"));
reqext.setParam(ExtractingParams.EXTRACT_ONLY, "true");
System.out.println("Content Stream Data path: 
"+serverone.toString());
NamedList result = serverone.request(reqext);
System.out.println("Result: " + result);
}
I am getting below exception
Exception in thread "main" org.apache.solr.common.SolrException: Not Found

Not Found
request: 
http://localhost:8080/solr/update/extract?extractOnly=true&wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:432)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:246)

Please direct me on how to extract the content...
I have tried to work with the example in the Solr distribution to extract an MS Excel 
file.
The file extraction was successful and I could check the metadata using the admin 
of the example app.

Thanks,
Vijaya Kumar T
PACIFIC COAST STEEL (Pinnacle) Project
Ness Technologies India Pvt. Ltd
1st & 2nd Floor, 2A Maximus Builing, Raheja Mindspace IT Park, Madhapur, 
Hyderabad, 500081, India. | Tel: +91 40 41962079 | Mobile: +91 9963001551 
vijaya.tadavar...@ness.com | www.ness.com


Faceting and Variable Buckets

2012-04-16 Thread Ben McCarthy
Hello,

Just wondering if the following is possible:

We need to produce facets on ranges, but they do not follow a steady increment, 
which is all I can see Solr can produce.  I'm looking for a way to produce 
facets on a price field:

0-1000
1000-5000
5000-1
1-2

Any suggestions without waiting for 
https://issues.apache.org/jira/browse/SOLR-2366?
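
One workaround that works today is a facet.query per bucket instead of a range facet; a
minimal SolrJ sketch (the exact endpoints below are only illustrative):

import org.apache.solr.client.solrj.SolrQuery;

public class PriceBucketFacets {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    // One facet.query per bucket; each comes back with its own count.
    q.addFacetQuery("price:[0 TO 1000]");
    q.addFacetQuery("price:[1000 TO 5000]");
    q.addFacetQuery("price:[5000 TO *]");
    System.out.println(q);
  }
}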

Thanks
Ben







Errors during indexing

2012-04-13 Thread Ben McCarthy
Hello

We have just switched to Solr4 as we needed the ability to return geodist() 
along with our results.

I use a simple multithreaded java app and solr to ingest the data.  We keep 
seeing the following:

13-Apr-2012 15:50:10 org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Error handling 'status' 
action
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:546)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:156)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:175)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /usr/solr4/data/index/_2jb.fnm (No 
such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.(RandomAccaessFile.java:216)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:219)
at 
org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
at 
org.apache.lucene.index.SegmentInfo.loadFieldInfos(SegmentInfo.java:201)
at 
org.apache.lucene.index.SegmentInfo.getFieldInfos(SegmentInfo.java:227)
at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:415)
at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:756)
at 
org.apache.lucene.index.StandardDirectoryReader$ReaderCommit.(StandardDirectoryReader.java:369)
at 
org.apache.lucene.index.StandardDirectoryReader.getIndexCommit(StandardDirectoryReader.java:354)
at 
org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:558)
at 
org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:816)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:537)
... 16 more


This seems to happen when we're using the new admin tool.  I'm checking on the 
autocommit handler.

Has anyone seen anything similar?

Thanks
Ben







RE: Solr data export to CSV File

2012-04-13 Thread Ben McCarthy
A combination of the CSV response writer and SolrJ to page through all of the 
results, sending each page to something like Apache Commons FileUtils:

  FileUtils.writeStringToFile(new File("output.csv"), outputLine + System.getProperty("line.separator"), true);

Would be quite quick to knock up in Java.
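A minimal sketch of that approach, assuming a recent SolrJ (HttpSolrServer; older releases use CommonsHttpSolrServer) and Commons IO. The URL, field names and page size are placeholders, and a real CSV library is advisable if field values can contain commas or quotes:

import java.io.File;

import org.apache.commons.io.FileUtils;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class CsvExport {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1"); // placeholder URL
        File out = new File("output.csv");
        String sep = System.getProperty("line.separator");
        int rows = 1000; // page size

        for (int start = 0; ; start += rows) {
            SolrQuery q = new SolrQuery("*:*");
            q.setStart(start);
            q.setRows(rows);
            QueryResponse rsp = server.query(q);
            if (rsp.getResults().isEmpty()) {
                break; // no more documents to page through
            }
            StringBuilder page = new StringBuilder();
            for (SolrDocument doc : rsp.getResults()) {
                // Build one CSV line per document from whichever fields are needed.
                page.append(doc.getFieldValue("id")).append(',')
                    .append(doc.getFieldValue("name")).append(sep);
            }
            FileUtils.writeStringToFile(out, page.toString(), true); // append this page
        }
    }
}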

Thanks
Ben

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 13 April 2012 13:28
To: solr-user@lucene.apache.org
Subject: Re: Solr data export to CSV File

Does this help?

http://wiki.apache.org/solr/CSVResponseWriter

Best
Erick

On Fri, Apr 13, 2012 at 7:59 AM, Pavnesh  
wrote:
> Hi Team,
>
>
>
> Many thanks to you guys who developed such a nice product.
>
> I have one query regarding Solr: I have approximately 36 million documents in
> my Solr index and I want to export all the data to a CSV file, but I have
> found nothing on this, so please help me with this topic.
>
>
>
>
>
> Regards
>
> Pavnesh
>
>
>







RE: Simple Slave Replication Question

2012-03-26 Thread Ben McCarthy
That's great information.

Thanks for all the help and guidance, it's been invaluable.

Thanks
Ben

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 26 March 2012 12:21
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

It's the optimize step. Optimize essentially forces all the segments to be 
copied into a single new segment, which means that your entire index will be 
replicated to the slaves.

In recent Solrs, there's usually no need to optimize, so unless and until you 
can demonstrate a noticeable change, I'd just leave the optimize step off. In 
fact, trunk renames it to forceMerge or something just because it's so common 
for people to think "of course I want to optimize my index!" and get the 
unintended consequences you're seeing, even though the optimize doesn't 
actually do that much good in most cases.

Some people just do the optimize once a day (or week or whatever) during 
off-peak hours as a compromise.
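Concretely, that just means dropping the optimize call from the ingest code quoted below. A minimal sketch, using the same SolrJ server object the quoted mail refers to:

import org.apache.solr.client.solrj.SolrServer;

public class IngestFinisher {
    // Called once at the end of the ingest run.
    static void finish(SolrServer server) throws Exception {
        // Commit only: the slaves then pull just the newly written segments.
        server.commit();
        // Deliberately no server.optimize() here; if a force-merge is ever wanted,
        // run it off-peak and accept one full copy of the index on the next poll.
    }
}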

Best
Erick


On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy  
wrote:
> Hello,
>
> Had to leave the office so didn't get a chance to reply.  Nothing in the 
> logs.  Just ran one through from the ingest tool.
>
> Same results full copy of the index.
>
> Is it something to do with:
>
> server.commit();
> server.optimize();
>
> I call this at the end of the ingestion.
>
> Would optimize then work across the whole index?
>
> Thanks
> Ben
>
> -Original Message-
> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
> Sent: 23 March 2012 15:10
> To: solr-user@lucene.apache.org
> Subject: Re: Simple Slave Replication Question
>
> Also, what happens if, instead of adding the 40K docs you add just one and 
> commit?
>
> 2012/3/23 Tomás Fernández Löbbe 
>
>> Have you changed the mergeFactor or are you using 10 as in the
>> example solrconfig?
>>
>> What do you see in the slave's log during replication? Do you see any
>> line like "Skipping download for..."?
>>
>>
>> On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy <
>> ben.mccar...@tradermedia.co.uk> wrote:
>>
>>> I just have a index directory.
>>>
>>> I push the documents through with a change to a field.  Im using
>>> SOLRJ to do this.  Im using the guide from the wiki to setup the
>>> replication.  When the feed of updates to the master finishes I call
>>> a commit again using SOLRJ.  I then have a poll period of 5 minutes
>>> from the slave.  When it kicks in I see a new version of the index
>>> and then it copys the full 5gb index.
>>>
>>> Thanks
>>> Ben
>>>
>>> -Original Message-
>>> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
>>> Sent: 23 March 2012 14:29
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Simple Slave Replication Question
>>>
>>> Hi Ben, only new segments are replicated from master to slave. In a
>>> situation where all the segments are new, this will cause the index
>>> to be fully replicated, but this rarely happen with incremental
>>> updates. It can also happen if the slave Solr assumes it has an "invalid" 
>>> index.
>>> Are you committing or optimizing on the slaves? After replication,
>>> the index directory on the slaves is called "index" or "index."?
>>>
>>> Tomás
>>>
>>> On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy <
>>> ben.mccar...@tradermedia.co.uk> wrote:
>>>
>>> > So do you just simpy address this with big nic and network pipes.
>>> >
>>> > -Original Message-
>>> > From: Martin Koch [mailto:m...@issuu.com]
>>> > Sent: 23 March 2012 14:07
>>> > To: solr-user@lucene.apache.org
>>> > Subject: Re: Simple Slave Replication Question
>>> >
>>> > I guess this would depend on network bandwidth, but we move around
>>> > 150G/hour when hooking up a new slave to the master.
>>> >
>>> > /Martin
>>> >
>>> > On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy <
>>> > ben.mccar...@tradermedia.co.uk> wrote:
>>> >
>>> > > Hello,
>>> > >
>>> > > Im looking at the replication from a master to a number of slaves.
>>> > > I have configured it and it appears to be working.  When
>>> > > updating 40K records on the master is it standard to always copy
>>> > > over the full index, currently 5gb in size.  If this is stand

RE: Simple Slave Replication Question

2012-03-26 Thread Ben McCarthy
Hello,

Had to leave the office so didn't get a chance to reply.  Nothing in the logs.  
Just ran one through from the ingest tool.

Same result: a full copy of the index.

Is it something to do with:

server.commit();
server.optimize();

I call this at the end of the ingestion.

Would optimize then work across the whole index?

Thanks
Ben

-Original Message-
From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
Sent: 23 March 2012 15:10
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

Also, what happens if, instead of adding the 40K docs you add just one and 
commit?

2012/3/23 Tomás Fernández Löbbe 

> Have you changed the mergeFactor or are you using 10 as in the example
> solrconfig?
>
> What do you see in the slave's log during replication? Do you see any
> line like "Skipping download for..."?
>
>
> On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy <
> ben.mccar...@tradermedia.co.uk> wrote:
>
>> I just have a index directory.
>>
>> I push the documents through with a change to a field.  Im using
>> SOLRJ to do this.  Im using the guide from the wiki to setup the
>> replication.  When the feed of updates to the master finishes I call
>> a commit again using SOLRJ.  I then have a poll period of 5 minutes
>> from the slave.  When it kicks in I see a new version of the index
>> and then it copys the full 5gb index.
>>
>> Thanks
>> Ben
>>
>> -Original Message-
>> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
>> Sent: 23 March 2012 14:29
>> To: solr-user@lucene.apache.org
>> Subject: Re: Simple Slave Replication Question
>>
>> Hi Ben, only new segments are replicated from master to slave. In a
>> situation where all the segments are new, this will cause the index
>> to be fully replicated, but this rarely happen with incremental
>> updates. It can also happen if the slave Solr assumes it has an "invalid" 
>> index.
>> Are you committing or optimizing on the slaves? After replication,
>> the index directory on the slaves is called "index" or "index."?
>>
>> Tomás
>>
>> On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy <
>> ben.mccar...@tradermedia.co.uk> wrote:
>>
>> > So do you just simpy address this with big nic and network pipes.
>> >
>> > -Original Message-
>> > From: Martin Koch [mailto:m...@issuu.com]
>> > Sent: 23 March 2012 14:07
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: Simple Slave Replication Question
>> >
>> > I guess this would depend on network bandwidth, but we move around
>> > 150G/hour when hooking up a new slave to the master.
>> >
>> > /Martin
>> >
>> > On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy <
>> > ben.mccar...@tradermedia.co.uk> wrote:
>> >
>> > > Hello,
>> > >
>> > > Im looking at the replication from a master to a number of slaves.
>> > > I have configured it and it appears to be working.  When updating
>> > > 40K records on the master is it standard to always copy over the
>> > > full index, currently 5gb in size.  If this is standard what do
>> > > people do who have massive 200gb indexs, does it not take a while
>> > > to bring the
>> > slaves inline with the master?
>> > >
>> > > Thanks
>> > > Ben
>> > >
>> > > 
>> > >
>> > >
>> > > This e-mail is sent on behalf of Trader Media Group Limited,
>> > > Registered
>> > > Office: Auto Trader House, Cutbush Park Industrial Estate,
>> > > Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in 
>> > > England No.
>> > 4768833).
>> > > This email and any files transmitted with it are confidential and
>> > > may be legally privileged, and intended solely for the use of the
>> > > individual or entity to whom they are addressed. If you have
>> > > received this email in error please notify the sender. This email
>> > > message has been swept for the presence of computer viruses.
>> > >
>> > >
>> >
>> > 
>> >
>> >
>> > This e-mail is sent on behalf of Trader Media Group Limited,
>> > Registered
>> > Office: Auto Trader House, Cutbush Park Industrial Estate,
>> > Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in Englan

RE: Simple Slave Replication Question

2012-03-23 Thread Ben McCarthy
I just have an index directory.

I push the documents through with a change to a field.  I'm using SolrJ to do 
this.  I'm using the guide from the wiki to set up the replication.  When the 
feed of updates to the master finishes I call a commit again using SolrJ.  I 
then have a poll period of 5 minutes on the slave.  When it kicks in I see a 
new version of the index and then it copies the full 5GB index.

Thanks
Ben

-Original Message-
From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
Sent: 23 March 2012 14:29
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

Hi Ben, only new segments are replicated from master to slave. In a situation 
where all the segments are new, this will cause the index to be fully 
replicated, but this rarely happens with incremental updates. It can also happen 
if the slave Solr assumes it has an "invalid" index.
Are you committing or optimizing on the slaves? After replication, the index 
directory on the slaves is called "index" or "index."?

Tomás

On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy < 
ben.mccar...@tradermedia.co.uk> wrote:

> So do you just simpy address this with big nic and network pipes.
>
> -Original Message-
> From: Martin Koch [mailto:m...@issuu.com]
> Sent: 23 March 2012 14:07
> To: solr-user@lucene.apache.org
> Subject: Re: Simple Slave Replication Question
>
> I guess this would depend on network bandwidth, but we move around
> 150G/hour when hooking up a new slave to the master.
>
> /Martin
>
> On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy <
> ben.mccar...@tradermedia.co.uk> wrote:
>
> > Hello,
> >
> > Im looking at the replication from a master to a number of slaves.
> > I have configured it and it appears to be working.  When updating
> > 40K records on the master is it standard to always copy over the
> > full index, currently 5gb in size.  If this is standard what do
> > people do who have massive 200gb indexs, does it not take a while to
> > bring the
> slaves inline with the master?
> >
> > Thanks
> > Ben
> >
> > 
> >
> >
> > This e-mail is sent on behalf of Trader Media Group Limited,
> > Registered
> > Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill,
> > Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No.
> 4768833).
> > This email and any files transmitted with it are confidential and
> > may be legally privileged, and intended solely for the use of the
> > individual or entity to whom they are addressed. If you have
> > received this email in error please notify the sender. This email
> > message has been swept for the presence of computer viruses.
> >
> >
>
> 
>
>
> This e-mail is sent on behalf of Trader Media Group Limited,
> Registered
> Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill,
> Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No. 4768833).
> This email and any files transmitted with it are confidential and may
> be legally privileged, and intended solely for the use of the
> individual or entity to whom they are addressed. If you have received
> this email in error please notify the sender. This email message has
> been swept for the presence of computer viruses.
>
>







RE: Simple Slave Replication Question

2012-03-23 Thread Ben McCarthy
So do you just simply address this with big NICs and network pipes?

-Original Message-
From: Martin Koch [mailto:m...@issuu.com]
Sent: 23 March 2012 14:07
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

I guess this would depend on network bandwidth, but we move around 150G/hour 
when hooking up a new slave to the master.

/Martin

On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy < 
ben.mccar...@tradermedia.co.uk> wrote:

> Hello,
>
> Im looking at the replication from a master to a number of slaves.  I
> have configured it and it appears to be working.  When updating 40K
> records on the master is it standard to always copy over the full
> index, currently 5gb in size.  If this is standard what do people do
> who have massive 200gb indexs, does it not take a while to bring the slaves 
> inline with the master?
>
> Thanks
> Ben
>
> 
>
>
> This e-mail is sent on behalf of Trader Media Group Limited,
> Registered
> Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill,
> Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No. 4768833).
> This email and any files transmitted with it are confidential and may
> be legally privileged, and intended solely for the use of the
> individual or entity to whom they are addressed. If you have received
> this email in error please notify the sender. This email message has
> been swept for the presence of computer viruses.
>
>







Simple Slave Replication Question

2012-03-23 Thread Ben McCarthy
Hello,

I'm looking at replication from a master to a number of slaves.  I have 
configured it and it appears to be working.  When updating 40K records on the 
master, is it standard to always copy over the full index, currently 5GB in 
size?  If this is standard, what do people do who have massive 200GB indexes? 
Does it not take a while to bring the slaves in line with the master?

Thanks
Ben







Data Import Handler Delta Import and Debug Mode Help

2011-12-08 Thread Ben McCarthy
Good Afternoon,



I'm looking at deltas via a DeltaImportHandler.  I was running Solr 1.4.1
but just upgraded to 3.5.  Previously I was able to run debug and verbose
from:



http://localhost:8080/solr/admin/dataimport.jsp?handler=/advert



But since upgrading, when choosing these options the right panel does not
populate with anything.  Am I missing something from the upgrade, as I copied
all the relevant jars to my classpath?


This is proving a problem as I'm trying to debug why my delta import is not
picking up any records:



  





The entity does have two nested entitys with in it.



When I run the query for the delta on the DB I get back the expected 100
stock id’s





Any help would be appreciated.



Thanks

Ben


updating schema.xml in production solr, multiple cores

2011-10-12 Thread Ben Hsu
Hello Solr users.

My organization is working on a solr implementation with multiple cores. I
want to prepare us for the day when we'll need to make a change to our
schema.xml, and roll that change into our production environment.

I believe we'll need to perform the following steps:
# delete all our documents
# update the schema.xml in all our cores
# rebuild the index from the source data.

Have any of you done this? What has your experience been?
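For the first step, a minimal SolrJ sketch (the URL and core name are placeholders, and CommonsHttpSolrServer is the client class in the SolrJ of that era); the schema swap and the rebuild are then whatever your deployment and indexing pipeline normally do:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class WipeCore {
    public static void main(String[] args) throws Exception {
        // Step 1: drop every document in this core (repeat per core).
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/core1");
        server.deleteByQuery("*:*");
        server.commit();
        // Step 2: deploy the new schema.xml and reload or restart the core.
        // Step 3: re-run the normal indexing job against the source data.
    }
}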


RE: changing the root directory where solrCloud stores info inside zookeeper File system

2011-08-02 Thread Yatir Ben Shlomo
Thanks a lot Mark.
Since my SolrCloud code was old, I tried downloading and building the newest
code from here:
https://svn.apache.org/repos/asf/lucene/dev/trunk/
I am using Tomcat 6.
I manually created the sc sub-directory in my ZooKeeper ensemble file system
and used this connection string for my ZK ensemble:
zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
but I still get the same problem.
Here is the entire catalina.out log with the exception:

Using CATALINA_BASE:   /opt/tomcat6
Using CATALINA_HOME:   /opt/tomcat6
Using CATALINA_TMPDIR: /opt/tomcat6/temp
Using JRE_HOME:/usr/java/default/
Using CLASSPATH:   /opt/tomcat6/bin/bootstrap.jar
Java HotSpot(TM) 64-Bit Server VM warning: Failed to reserve shared memory
(errno = 12).
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the
java.library.path:
/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/usr/java/jdk1.6.0_21/jre/lib/a
md64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:/usr/java/packages/lib/amd64:/
usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8983
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 448 ms
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.29
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.HostConfig
deployDescriptor
INFO: Deploying configuration descriptor solr1.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: /home/tomcat/solrCloud1/solr.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer 
INFO: New CoreContainer 853527367
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.cloud.SolrZkServerProps
getProperties
INFO: Reading configuration from: /home/tomcat/solrCloud1/zoo.cfg
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer initZooKeeper
INFO: Zookeeper client=zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:zookeeper.version=3.3.1-942149, built on
05/07/2010 17:14 GMT
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:host.name=ob1079.nydc1.outbrain.com
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.version=1.6.0_21
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.vendor=Sun Microsystems Inc.
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.home=/usr/java/jdk1.6.0_21/jre
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.class.path=/opt/tomcat6/bin/bootstrap.jar
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client
environment:java.library.path=/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/
usr/java/jdk1.6.0_21/jre/lib/amd64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:
/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.io.tmpdir=/opt/tomcat6/temp
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.compiler=
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.name=Linux
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.arch=amd64
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.version=2.6.18-194.8.1.el5
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.name=tomcat
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.home=/home/tomcat
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.dir=/opt/tomcat6
Aug 2, 201

changing the root directory where solrCloud stores info inside zookeeper File system

2011-07-26 Thread Yatir Ben Shlomo
Hi!

I am using SolrCloud with a ZooKeeper ensemble of 3.

I noticed that SolrCloud stores information directly under the root dir in the
ZooKeeper file system:

/config /live_nodes /collections

In my setup ZooKeeper is also used by other modules, so I would like SolrCloud
to store everything under /solrCloud/ or something similar.



Is there a property for that, or do I need to custom-code it?

Thanks


Solr Request Logging

2011-07-14 Thread Ben Roubicek
I am using the trunk version of Solr and I am getting a ton more logging 
information than I really care to see, and definitely more than 1.4, but I can't 
really see a way to change it.

A little background:
I am faceting on fields that have a very high number of distinct values and 
also returning large numbers of documents in a sharded environment.

For example:
INFO: [core1] webapp=/solr path=/select 
params={facet=true&attr_lng_rng_low.revenue__terms=& 
moreParams ...}

Another example:
INFO: [core1] webapp=/solr path=/select 
params={facet=false&facet.mincount=1&ids=&moreParams...}

In just a few minutes, I have racked up 10MB of log in my dev environment.   Any 
ideas for a sane way of handling these messages?  I imagine it's slowing down 
Solr as well.
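One possibility, assuming the per-request INFO lines come from the SolrCore logger and the stock java.util.logging setup: raise that logger's level to WARNING. The sketch below is the programmatic equivalent of a one-line change in logging.properties (org.apache.solr.core.SolrCore.level = WARNING) or of flipping the level on the admin logging page:

import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietRequestLogging {
    public static void main(String[] args) {
        // Drops the per-request INFO lines ("path=/select params={...}") while
        // keeping warnings and errors from the same class.
        Logger.getLogger("org.apache.solr.core.SolrCore").setLevel(Level.WARNING);
    }
}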

Thanks
-Ben


Localized alphabetical order

2011-04-22 Thread Ben Preece
As someone who's new to Solr/Lucene, I'm having trouble finding 
information on sorting results in localized alphabetical order. I've 
ineffectively searched the wiki and the mail archives.


I'm thinking for example about Hawai'ian, where mīka (with an i-macron) 
comes after mika (i without the macron) but before miki (also without 
the macron), or about Welsh, where the digraphs (ch, dd, etc.) are 
treated as single letters, or about Ojibwe, where the apostrophe ' is a 
letter which sorts between h and i.


How do non-English languages typically handle this?
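The usual mechanism is a locale-aware collator: in Solr that typically means indexing a collation key for the sort field (for example via CollationKeyFilterFactory, or the ICU variant in the analysis-extras contrib), which under the hood is java.text.Collator or ICU. A minimal sketch of the underlying comparison follows, with the caveat that whether a given JRE ships tailored rules for Hawaiian, Welsh or Ojibwe is uncertain, so the locale below is a placeholder and ICU or custom RuleBasedCollator rules may be needed:

import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationSketch {
    public static void main(String[] args) {
        Collator collator = Collator.getInstance(new Locale("haw")); // placeholder locale
        collator.setStrength(Collator.TERTIARY); // compare base letters, accents and case

        String[] words = {"miki", "m\u012Bka", "mika"}; // "mīka" written with a Unicode escape
        Arrays.sort(words, collator); // locale-defined order, not raw code-point order
        System.out.println(Arrays.asList(words));
    }
}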

-Ben


Re: Field Analyzers: which values are indexed?

2011-04-13 Thread Ben Davies
Thanks both for your replies

Erick,
Yep, I use the Analysis page extensively, but what I was directly looking
for was whether all, or only the last line, of the values given by the analysis
page were eventually indexed.
I think we've concluded it's only the last line.

Cheers,
Ben

On Wed, Apr 13, 2011 at 2:41 PM, Erick Erickson wrote:

> CharFilterFactories are applied to the raw input before tokenization.
> Each token output from the tokenization is then sent through
> the rest of the chain.
>
> The Analysis page available from the Solr admin page is
> invaluable in answering in great detail what each part of
> an analysis chain does.
>
> TokenFilterFactories are applied to each token emitted from
> the tokenizer, and this includes the similar
> PatternReplaceFilterFactory. The difference is that the
> PatternReplaceCharFilterFactory is applied before tokenization
> to the entire input stream and PatternReplaceFilterFactory
> is applied to each token emitted by the tokenizer.
>
> And to make it even more fun, you can do both!
>
> Best
> Erick
>
> On Wed, Apr 13, 2011 at 8:14 AM, Ben Davies  wrote:
>
> > Hi there,
> >
> > Just a quick question that the wiki page (
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem
> > to
> > answer very well.
> >
> > Given an analyzer that has  zero or more Char Filter Factories, one
> > Tokenizer Factory, and zero or more Token Filter Factories, which
> value(s)
> > are indexed?
> >
> > Is every value that is produced from each char filter, tokenizer, and
> > filter
> > indexed?
> > Or is the only the final value after completing the whole chain indexed?
> >
> > Cheers,
> > Ben
> >
>


Field Analyzers: which values are indexed?

2011-04-13 Thread Ben Davies
Hi there,

Just a quick question that the wiki page (
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem to
answer very well.

Given an analyzer that has  zero or more Char Filter Factories, one
Tokenizer Factory, and zero or more Token Filter Factories, which value(s)
are indexed?

Is every value that is produced from each char filter, tokenizer, and filter
indexed?
Or is only the final value after completing the whole chain indexed?

Cheers,
Ben


Re: Indexing data with Trade Mark Symbol

2011-04-05 Thread Ben Davies
Use admin/analysis.jsp to see which filter is removing it.
Configure a field type appropriate to what you want to index.

On Mon, Apr 4, 2011 at 9:55 AM, mechravi25  wrote:

> Hi,
>  Has anyone indexed the data with Trade Mark symbol??...when i tried to
> index, the data appears as below.
>
> Data:
>  79797 - Siebel Research– AI Fund,
>  79797 - Siebel Research– AI Fund,l
>
>
> Original Data:
> 79797 - Siebel Research™ AI Fund,
>
>
> Please help me to resolve this
>
> Regards,
> Ravi
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-data-with-Trade-Mark-Symbol-tp2774421p2774421.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Ben Davies
I can't remember where I read it, but I think MappingCharFilterFactory is
preferred.
There is an example in the example schema.



From this, I get:
org.apache.solr.analysis.MappingCharFilterFactory
{mapping=mapping-ISOLatin1Accent.txt}
|text|despues|



On Tue, Apr 5, 2011 at 5:06 PM, Nemani, Raj  wrote:

> All,
>
> I am using solr.ASCIIFoldingFilterFactory to perform accent insensitive
> search.  One of the words that got indexed as part my indexing process is
> "después".  Having used the ASCIIFoldingFilterFactory,I expected that If I
> searched for word "despues" I should have the document containing the word
> "después" show up in the results but that was not the case.  Then I used the
> Analysis.jsp to analyze "después" and noticed that the
> ASCIIFoldingFilterFactory folded "después" as "despue".
>
>
>
> If I repeat the above exercise for the word "Imágenes", then Analysis.jsp
> tell me that the ASCIIFoldingFilterFactory folded "Imágenes" as "imagen".
>  But I can search for "Imagenes" and get the correct results.
>
>
>
> I am not familiar with Spanish but I found the above behavior confusing.
>  Can anybody please explain the behavior described above?
>
>
>
> Thank a million in advance
>
> Raj
>
>
>
>


MoreLikeThis with document that has not been indexed

2011-03-30 Thread Ben Anhalt
Hello,

It is currently possible to use the MoreLikeThis handler to find documents
similar to a given document in the index.

Is there any way to feed the handler a new document in XML or JSON (as one
would do for adding to the index) and have it find similar documents without
indexing the target document?  I understand that it is possible to do a MLT
query using free text, but I want to utilize structured data.

Thanks,

Ben

-- 
Ben Anhalt
ben.anh...@gmail.com
Mi parolas Esperante.


Cache size

2011-02-08 Thread Mehdi Ben Haj Abbes
Hi folks,

Is there any way to know the size *in bytes* occupied by a cache (filter
cache, doc cache ...)? I don't find such information within the stats page.

Regards

-- 
Mehdi BEN HAJ ABBES


Re: xpath processing

2010-10-23 Thread Ben Boggess
> processor="FileListEntityProcessor" fileName=".*xml" recursive="true" 

Shouldn't this be fileName="*.xml"?

Ben

On Oct 22, 2010, at 10:52 PM, pghorp...@ucla.edu wrote:

> 
> 
> 
> 
> 
>  processor="FileListEntityProcessor" fileName=".*xml" recursive="true" 
> baseDir="C:\data\sample_records\mods\starr">
>  url="${f.fileAbsolutePath}" stream="false" forEach="/mods" 
> transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  />
> 
> 
> 
> 
> 
> Quoting Ken Stanley :
> 
>> Parinita,
>> 
>> In its simplest form, what does your entity definition for DIH look like;
>> also, what does one record from your xml look like? We need more information
>> before we can really be of any help. :)
>> 
>> - Ken
>> 
>> It looked like something resembling white marble, which was
>> probably what it was: something resembling white marble.
>>-- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
>> 
>> 
>> On Fri, Oct 22, 2010 at 8:00 PM,  wrote:
>> 
>>> Quoting pghorp...@ucla.edu:
>>> Can someone help me please?
>>> 
>>> 
>>>> I am trying to import mods xml data in solr using  the xml/http datasource
>>>> 
>>>> This does not work with XPathEntityProcessor of the data import handler
>>>> xpath="/mods/name/namePart[@type = 'date']"
>>>> 
>>>> I actually have 143 records with type attribute as 'date' for element
>>>> namePart.
>>>> 
>>>> Thank you
>>>> Parinita
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> 


Re: How to delete a SOLR document if that particular data doesn't exist in DB?

2010-10-20 Thread ben boggess
> Now my question is.. Is there a way I can use preImportDeleteQuery to
> delete
> the documents from SOLR for which the data doesnt exist in back end db? I
> dont have anything called delete status in DB, instead I need to get all
> the
> UID's from SOLR document and compare it with all the UID's in back end and
> delete the data from SOLR document for the UID's which is not present in
> DB.

I've done something like this with raw Lucene and I'm not sure how or if you
could do it with Solr as I'm relatively new to it.

We stored a timestamp for when we started to import and stored an update
timestamp field for every document added to the index.  After the data
import, we did a delete by query that matched all documents with a timestamp
older than when we started.  The assumption being that if we didn't update
the timestamp during the load, then the record must have been deleted from
the database.
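A hedged SolrJ sketch of that timestamp idea (the original was done with raw Lucene); the URL, the "last_indexed" field name and the date handling below are assumptions, not the poster's actual code:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class PurgeOrphans {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); // placeholder URL

        // Capture the start-of-import instant in Solr's date format (UTC).
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        String importStart = fmt.format(new Date());

        // ... run the full import here, stamping last_indexed=NOW on every document ...

        // Anything whose stamp predates this run was not touched by the import,
        // so it no longer exists in the database and can be removed.
        server.deleteByQuery("last_indexed:[* TO " + importStart + "]");
        server.commit();
    }
}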

Hope this helps.

Ben

On Wed, Oct 20, 2010 at 8:05 PM, Erick Erickson wrote:

> << and
> do a complete re-indexing each week also we want to delete the orphan solr
> documents (for which the data is not present in back end DB) on a daily
> basis.>>>
>
> Can you make delete by query work? Something like delete all Solr docs of
> a certain type and do a full re-index of just that type?
>
> I have no idea whether this is practical or not
>
> But your solution also works. There's really no way Solr #can# know about
> deleted database records, especially since the  field is
> completely
> arbitrarily defined.
>
> Best
> Erick
>
> On Wed, Oct 20, 2010 at 10:51 AM, bbarani  wrote:
>
> >
> > Hi,
> >
> > I have a very common question but couldnt find any post related to my
> > question in this forum,
> >
> > I am currently initiating a full import each week but the data that have
> > been deleted in the source is not update in my document as I am using
> > clean=false.
> >
> > We are indexing multiple data by data types hence cant delete the index
> and
> > do a complete re-indexing each week also we want to delete the orphan
> solr
> > documents (for which the data is not present in back end DB) on a daily
> > basis.
> >
> > Now my question is.. Is there a way I can use preImportDeleteQuery to
> > delete
> > the documents from SOLR for which the data doesnt exist in back end db? I
> > dont have anything called delete status in DB, instead I need to get all
> > the
> > UID's from SOLR document and compare it with all the UID's in back end
> and
> > delete the data from SOLR document for the UID's which is not present in
> > DB.
> >
> > Any suggestion / ideas would be of great help.
> >
> > Note: Currently I have developed a simple program which will fetch the
> > UID's
> > from SOLR document and then connect to backend DB to check the orphan
> UID's
> > and delete the documents from SOLR index corresponding to orphan UID's. I
> > just dont want to re-invent the wheel if this feature is already present
> in
> > SOLR as I need to do more testing in terms of performance / scalability
> for
> > my program..
> >
> > Thanks,
> > Barani
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Multiple indexes inside a single core

2010-10-20 Thread Ben Boggess
Thanks Erick.  The problem with multiple cores is that the documents are scored 
independently in each core.  I would like to be able to search across both 
cores and have the scores 'normalized' in a way that's similar to what Lucene's 
MultiSearcher would do.  As far as I understand, multiple cores would likely 
result in seriously skewed scores in my case since the documents are not 
distributed evenly or randomly.  I could have one core/index with 20 million 
docs and another with 200.
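For reference, a Lucene 3.x-era sketch of the MultiSearcher setup being described (directory paths, field and query are made up); MultiSearcher computes term statistics across all of its sub-searchers, which is the kind of score normalisation being asked about here:

import java.io.File;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class MultiSearcherSketch {
    public static void main(String[] args) throws Exception {
        // Two of the per-type indexes mentioned above (hypothetical paths).
        IndexSearcher big = new IndexSearcher(FSDirectory.open(new File("/indexes/listings")));
        IndexSearcher small = new IndexSearcher(FSDirectory.open(new File("/indexes/editorial")));

        // Term statistics (and therefore idf and scores) are computed across both
        // sub-indexes, even though one is far larger than the other.
        MultiSearcher searcher = new MultiSearcher(big, small);
        TopDocs hits = searcher.search(new TermQuery(new Term("title", "solr")), 10);
        System.out.println("total hits: " + hits.totalHits);

        searcher.close();
    }
}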

I've poked around in the code and this feature doesn't seem to exist.  I would 
be happy with finding a decent place to try to add it.  I'm not sure if there 
is a clean place for it.

Ben

On Oct 20, 2010, at 8:36 PM, Erick Erickson  wrote:

> It seems to me that multiple cores are along the lines you
> need, a single instance of Solr that can search across multiple
> sub-indexes that do not necessarily share schemas, and are
> independently maintainable..
> 
> This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
> 
> HTH
> Erick
> 
> On Wed, Oct 20, 2010 at 3:23 PM, ben boggess  wrote:
> 
>> We are trying to convert a Lucene-based search solution to a
>> Solr/Lucene-based solution.  The problem we have is that we currently have
>> our data split into many indexes and Solr expects things to be in a single
>> index unless you're sharding.  In addition to this, our indexes wouldn't
>> work well using the distributed search functionality in Solr because the
>> documents are not evenly or randomly distributed.  We are currently using
>> Lucene's MultiSearcher to search over subsets of these indexes.
>> 
>> I know this has been brought up a number of times in previous posts and the
>> typical response is that the best thing to do is to convert everything into
>> a single index.  One of the major reasons for having the indexes split up
>> the way we do is because different types of data need to be indexed at
>> different intervals.  You may need one index to be updated every 20 minutes
>> and another is only updated every week.  If we move to a single index, then
>> we will constantly be warming and replacing searchers for the entire
>> dataset, and will essentially render the searcher caches useless.  If we
>> were able to have multiple indexes, they would each have a searcher and
>> updates would be isolated to a subset of the data.
>> 
>> The other problem is that we will likely need to shard this large single
>> index and there isn't a clean way to shard randomly and evenly across the
>> of
>> the data.  We would, however like to shard a single data type.  If we could
>> use multiple indexes, we would likely be also sharding a small sub-set of
>> them.
>> 
>> Thanks in advance,
>> 
>> Ben
>> 


Multiple indexes inside a single core

2010-10-20 Thread ben boggess
We are trying to convert a Lucene-based search solution to a
Solr/Lucene-based solution.  The problem we have is that we currently have
our data split into many indexes and Solr expects things to be in a single
index unless you're sharding.  In addition to this, our indexes wouldn't
work well using the distributed search functionality in Solr because the
documents are not evenly or randomly distributed.  We are currently using
Lucene's MultiSearcher to search over subsets of these indexes.

I know this has been brought up a number of times in previous posts and the
typical response is that the best thing to do is to convert everything into
a single index.  One of the major reasons for having the indexes split up
the way we do is because different types of data need to be indexed at
different intervals.  You may need one index to be updated every 20 minutes
and another is only updated every week.  If we move to a single index, then
we will constantly be warming and replacing searchers for the entire
dataset, and will essentially render the searcher caches useless.  If we
were able to have multiple indexes, they would each have a searcher and
updates would be isolated to a subset of the data.

The other problem is that we will likely need to shard this large single
index and there isn't a clean way to shard randomly and evenly across all of
the data.  We would, however, like to shard a single data type.  If we could
use multiple indexes, we would likely be also sharding a small sub-set of
them.

Thanks in advance,

Ben


possible bug in zookeeper / solrCloud ?

2010-09-14 Thread Yatir Ben Shlomo
Hi, I am using SolrCloud, which uses an ensemble of 3 ZooKeeper instances.

I am performing survivability tests:
taking one of the ZooKeeper instances down, I would expect the client to use a
different ZooKeeper server instance.

But as you can see in the logs attached below, depending on which instance I
choose to take down (in my case, the last one in the list of ZooKeeper servers),
the client keeps insisting on the same ZooKeeper server (Attempting
connection to server zook3/192.168.252.78:2181)
and does not switch to a different one.
Does anyone have an idea about this?

SolrCloud is currently using zookeeper-3.2.2.jar.
Is this a known bug that was fixed in a later version (3.3.1)?

Thanks in advance,
Yatir


Logs:

Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):999)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):1004)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
INFO: Attempting connection to server zook3/192.168.252.78:2181
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Exception closing session 0x32b105244a20001 to 
sun.nio.ch.selectionkeyi...@3ca58cbf
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):999)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):1004)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
INFO: Attempting connection to server zook3/192.168.252.78:2181
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Exception closing session 0x32b105244a2 to 
sun.nio.ch.selectionkeyi...@3960f81b
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):999)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):1004)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)



solrCloud zookeepr related excpetions

2010-08-25 Thread Yatir Ben Shlomo
Hi, I am running a ZooKeeper ensemble of 3 instances
and have set up SolrCloud to work with it (2 masters, 2 slaves);
on each master machine I have 2 shards (4 shards in total).
On one of the masters I keep noticing ZooKeeper-related exceptions which I
can't understand:
one appears to be a TIME OUT at (ClientCnxn.java):906,
and the other is java.lang.IllegalArgumentException: Path cannot be null
(PathUtils.java:45).

Here are my logs (I set the log level to FINE on the zookeeper package).

Can anyone identify the issue?



FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null 
serverPath:null finished:false header:: -8,101  replyHeader:: -8,-1,0  
request:: 
30064776552,v{'/collections},v{},v{'/collections/ENPwl/shards/ENPWL1,'/collections/ENPwl/shards/ENPWL4,'/collections/ENPwl/shards/ENPWL2,'/collections,'/collections/ENPwl/shards/ENPWL3,'/collections/ENPwlMaster/shards/ENPWLMaster_3,'/collections/ENPwlMaster/shards/ENPWLMaster_4,'/live_nodes,'/collections/ENPwlMaster/shards/ENPWLMaster_1,'/collections/ENPwlMaster/shards/ENPWLMaster_2}
  response:: null
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category debug
FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null 
serverPath:null finished:false header:: 540,8  replyHeader:: 540,-1,0  
request:: '/collections,F  response:: v{'ENPwl,'ENPwlMaster}
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category error
SEVERE: Error while calling watcher
java.lang.IllegalArgumentException: Path cannot be null
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
at 
org.apache.zookeeper.ZooKeeper.getChildren(zookeeper:ZooKeeper.java):1196)
at 
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:200)
at 
org.apache.solr.common.cloud.ZkStateReader$5.process(ZkStateReader.java:315)
at 
org.apache.zookeeper.ClientCnxn$EventThread.run(zookeeper:ClientCnxn.java):425)
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWL3 in collection:ENPwl
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWL4 in collection:ENPwl
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWL1 in collection:ENPwl
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.solr.cloud.ZkController$2 process
INFO: Updating live nodes:org.apache.solr.common.cloud.solrzkcli...@55308275
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Updating live nodes from ZooKeeper...
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category debug
FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null 
serverPath:null finished:false header:: 541,8  replyHeader:: 541,-1,0  
request:: '/live_nodes,F  response:: 
v{'ob1078.nydc1.outbrain.com:8983_solr2,'ob1078.nydc1.outbrain.com:8983_solr1,'ob1061.nydc1.outbrain.com:8983_solr2,'ob1062.nydc1.outbrain.com:8983_solr1,'ob1062.nydc1.outbrain.com:8983_solr2,'ob1061.nydc1.outbrain.com:8983_solr1,'ob1077.nydc1.outbrain.com:8983_solr2,'ob1077.nydc1.outbrain.com:8983_solr1}
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category error
SEVERE: Error while calling watcher
java.lang.IllegalArgumentException: Path cannot be null
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
at 
org.apache.zookeeper.ZooKeeper.getChildren(zookeeper:ZooKeeper.java):1196)
at 
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:200)
at org.apache.solr.cloud.ZkController$2.process(ZkController.java:321)
at 
org.apache.zookeeper.ClientCnxn$EventThread.run(zookeeper:ClientCnxn.java):425)
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ConnectionManager process
INFO: Watcher org.apache.solr.common.cloud.connectionmana...@339bb448 
name:ZooKeeperConnection Watcher:zook1:2181,zook2:2181,zook3:2181 got event 
WatchedEvent: Server state change. New state: Disconnected path:null type:None
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWLMaster_1 in 
collection:ENPwlMaster
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apa

question: having multiple solrCloud configurations on the same machine

2010-08-15 Thread Yatir Ben Shlomo
Hi!
I am using solrCloud with Tomcat 5.5.
In my setup every language has its own index and its own Solr filters, so it
needs separate Solr configuration files.

In the solrCloud examples posted here: http://wiki.apache.org/solr/SolrCloud
I noticed that bootstrap_confdir is given as a global -D parameter,
but I need to be able to supply it per core.
I tried doing this in solr.xml but failed.

solr.xml

  
 


All my cores are using the same ZooKeeper configuration according to the
-Dbootstrap_confdir=...

Does anyone know how I can specify the bootstrap_confdir on a per-core basis?
Thanks

Yatir Ben Shlomo
Outbrain Engineering
yat...@outbrain.com<mailto:yat...@outbrain.com>
tel: +972-73-223912
fax: +972-9-8350055
www.outbrain.com<http://www.outbrain.com/>



question: solrCloud with multiple cores on each machine

2010-07-27 Thread Yatir Ben Shlomo
Hi
 I am using solrCloud.
Suppose I have a total of 4 machines dedicated to Solr.
I want to have 2 machines as replicas (slaves) and 2 masters,
but I want to work with 8 logical cores rather than 2,
i.e. each master (and each slave) will have 4 cores on it.
The reason is that I can optimize the cores one at a time, so the IO intensity
at any given moment will be low and will not degrade the online performance.

Is there a way to configure my solr.xml so that when I am doing a distributed
search (distrib=true) it will know to query all 8 cores?
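Whether SolrCloud's distrib=true can be made to cover all eight cores per node I can't say, but outside of the cloud auto-discovery the classic way to hit an explicit set of cores is the shards parameter. A sketch with made-up host and core names, using the SolrJ client class of that era:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EightCoreSearch {
    public static void main(String[] args) throws Exception {
        // List every core explicitly; entries are host:port/solr/corename.
        SolrQuery q = new SolrQuery("*:*");
        q.set("shards",
              "slave1:8080/solr/core1,slave1:8080/solr/core2,"
            + "slave1:8080/solr/core3,slave1:8080/solr/core4,"
            + "slave2:8080/solr/core1,slave2:8080/solr/core2,"
            + "slave2:8080/solr/core3,slave2:8080/solr/core4");

        // Any one core can act as the aggregator for the distributed request.
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://slave1:8080/solr/core1");
        QueryResponse rsp = server.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}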

Thanks
Yatir


Re: How real-time are Solr/Lucene queries?

2010-05-21 Thread Ben Eliott
Further to earlier note re Lucandra.  I note that Cassandra, which  
Lucandra backs onto,  is 'eventually consistent',  so given your real- 
time requirements,  you may want to review this in the first instance,  
if Lucandra is of interest.


On 21 May 2010, at 06:12, Walter Underwood wrote:

Solr is a very good engine, but it is not real-time. You can turn  
off the caches and reduce the delays, but it is fundamentally not  
real-time.


I work at MarkLogic, and we have a real-time transactional search  
engine (and repository). If you are curious, contact me directly.


I do like Solr for lots of applications -- I chose it when I was at  
Netflix.


wunder

On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:


Hello Solr,

Solr looks like an excellent API and it's nice to have a tutorial
that makes it easy to discover the basics of what Solr does; I'm
impressed. I can see plenty of potential uses of Solr/Lucene and
I'm interested now in just how real-time the queries made to an
index can be?


For example, in my application I have time ordered data being  
processed by a paint method in real-time. Each piece of data is  
identified and its associated renderer is invoked. The Java2D  
renderer would then lookup any layout and style values it requires  
to render the current data it has received from the layout and  
style indexes. What I'm wondering is if this lookup which would be  
a Lucene search will be fast enough?


Would it be best to make Lucene queries for the relevant layout and  
style values required by the renderers ahead of rendering time and  
have the query results placed into the most performant collection  
(map/array) so renderer lookup would be as fast as possible? Or can  
Lucene handle many individual lookup queries fast enough so  
rendering is quick?


Best regards from Canada,

Thom











Re: How real-time are Solr/Lucene queries?

2010-05-20 Thread Ben Eliott

You may wish to look at  Lucandra: http://github.com/tjake/Lucandra

On 21 May 2010, at 06:12, Walter Underwood wrote:

Solr is a very good engine, but it is not real-time. You can turn  
off the caches and reduce the delays, but it is fundamentally not  
real-time.


I work at MarkLogic, and we have a real-time transactional search  
engine (and repository). If you are curious, contact me directly.


I do like Solr for lots of applications -- I chose it when I was at  
Netflix.


wunder

On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:


Hello Solr,

Solr looks like an excellent API and it's nice to have a tutorial
that makes it easy to discover the basics of what Solr does; I'm
impressed. I can see plenty of potential uses of Solr/Lucene and
I'm interested now in just how real-time the queries made to an
index can be?


For example, in my application I have time ordered data being  
processed by a paint method in real-time. Each piece of data is  
identified and its associated renderer is invoked. The Java2D  
renderer would then lookup any layout and style values it requires  
to render the current data it has received from the layout and  
style indexes. What I'm wondering is if this lookup which would be  
a Lucene search will be fast enough?


Would it be best to make Lucene queries for the relevant layout and  
style values required by the renderers ahead of rendering time and  
have the query results placed into the most performant collection  
(map/array) so renderer lookup would be as fast as possible? Or can  
Lucene handle many individual lookup queries fast enough so  
rendering is quick?


Best regards from Canada,

Thom











Re: Custom sort

2009-07-10 Thread Ben
It could be that you should be providing an implementation of
"SortComparatorSource".
I have missed the earlier part of this thread; I assume you're trying to
implement some form of custom sort?


B

dontthinktwice wrote:


Marc Sturlese wrote:
  

I have been able to create my custom field. The problem is that I have
laoded in the solr core a couple of HashMaps
from a DB with values that will influence in the sort. My problem is that
I don't know how to let my custom sort have access to this HashMaps.
I am a bit confused now. I think that would be easy to reach my goal
using:

CustomSortComponent extends SearchComponent implements SolrCoreAware

This way, I would load the HashMaps in the "inform" method and would
create de custom sort using the HashMaps in the "preprare" method.

Don't know how to do that with the CustomField (similar to the
RandomField)... any advice?





Marc, did you get this working somehow? I'm looking at doing something
similar, and before I make a custom sort field (like RandomSortField) I
would be delighted to know that I can give it access to a the data structure
it will need to calculate the sort...

  




Re: DocSlice andNotSize

2009-07-03 Thread Ben
DocSet isn't an object, it's an interface. The DocSlice class
*implements* DocSet.
What you're saying about set operations not working for DocSlice but
working for DocSet then doesn't make any sense... can you clarify?


The failure of these set operations to work as expected is confusing the 
hell out of me too!


Thanks
Ben


Yonik Seeley wrote:

On Thu, Jul 2, 2009 at 4:24 PM, Candide Kemmler wrote:
  

I have a simple question regarding the DocSlice class. I'm trying to use the (very
handy) set operations on DocSlices and I'm rather confused by the way it
behaves.

I have 2 DocSlices, atlDocs which, by looking at the debugger, holds a
"docs" array of ints of size 1; the second DocSlice is btlDocs, with a
"docs" array of ints of size 67. I know that atlDocs is a subset of btlDocs,
so doing btlDocs.andNotSize(atlDocs) should really return 66.

But it's returning 10.



The short answer is that all of the set operations were only designed
for DocSets (as opposed to DocLists).
Yes, perhaps DocList should not have extended DocSet...

-Yonik
http://www.lucidimagination.com
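
A small sketch of the DocSet route hinted at here: fetch true DocSets from the SolrIndexSearcher rather than reusing the paged DocList/DocSlice (the field and queries are made up):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;

public class DocSetExample {
    // Both sets come from getDocSet(), so andNotSize() behaves as expected.
    public static int andNotSize(SolrQueryRequest req) throws IOException {
        SolrIndexSearcher searcher = req.getSearcher();
        DocSet atlDocs = searcher.getDocSet(new TermQuery(new Term("tier", "atl")));
        DocSet btlDocs = searcher.getDocSet(new TermQuery(new Term("tier", "btl")));
        return btlDocs.andNotSize(atlDocs); // docs in btlDocs but not in atlDocs
    }
}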
  




Building Solr index with Lucene

2009-07-01 Thread Ben Bangert
For performance reasons, we're attempting to build the index used with
Solr directly in Lucene. It works fine for the most part, but I'm
having an issue when it comes to stemming. I'm guessing this is due to a
mismatch between how Lucene is stemming at index time and how Solr stems
during its queries.


Has anyone built their Solr index using Lucene, and how did you handle  
stemmed fields in Lucene so that Solr worked properly with them?


Cheers,
Ben
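
One way to avoid the mismatch is to build the Lucene index with an analyzer chain equivalent to what the Solr schema declares for the field. A sketch against a current Lucene, assuming the schema's text field uses an English stemmer (paths and field names are made up):

import java.nio.file.Paths;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class BuildIndexLikeSolr {
    public static void main(String[] args) throws Exception {
        // The analyzer must mirror the fieldType analysis chain in schema.xml,
        // otherwise Solr's query-time stemming won't line up with the indexed terms.
        IndexWriterConfig cfg = new IndexWriterConfig(new EnglishAnalyzer());
        try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("solr/data/index")), cfg)) {
            Document doc = new Document();
            doc.add(new TextField("body", "running runs ran", Field.Store.YES));
            writer.addDocument(doc);
        }
    }
}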


Re: Excluding characters from a wildcard query

2009-07-01 Thread Ben
My brain was switched off. I'm using SolrJ, which means I'll need to
make multiple calls:


addMultipleFields(solrDoc, "vector", "vectorvalue", 1.0f);

for each value to be added to the multiValuedField.

Then, with luck, the simple wildcard query will be executed over each
individual value when looking for matches, meaning the simple query
syntax can be made adequate to do what's needed.
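
In plain SolrJ terms (with a current SolrClient; older releases use SolrServer), that is just one addField call per value on the same document, with "vector" declared multiValued in schema.xml - a sketch:

import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class VectorIndexer {
    // Splits "A1_B1_C1_D1,A2_B2_C2_D2,..." and adds each vector as its own
    // value of the multiValued "vector" field.
    public static void index(SolrClient solr, String id, String rawVectors)
            throws IOException, SolrServerException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        for (String vector : rawVectors.split(",")) {
            doc.addField("vector", vector);
        }
        solr.add(doc);
    }
}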


Many thanks Uwe.

B

Uwe Klosa wrote:

2009/7/1 Ben 

  

I'm not quite sure I understand exactly what you mean.
The string I'm processing could have many tens of thousands of values... I
hope you aren't implying I'd need to split it into many tens of thousands of
"columns".




No, that is not what I meant. It will be one field (column) with tens of
thousands of values.


  

If you're saying what I think you're saying, you're saying that I should
leave whitespaces between the individual parts of the string, pass in the
string into a "multiValued" field and have SOLR internally treat each "word"
as an individual entity?
Thanks for your help with this...




I said nothing about whitespaces. I don't know how you update your solr
documents. Are you using XML or Solrj?

Uwe

  




Re: Excluding characters from a wildcard query

2009-07-01 Thread Ben

I'm not quite sure I understand exactly what you mean.
The string I'm processing could have many tens of thousands of values... 
I hope you aren't implying I'd need to split it into many tens of 
thousands of "columns".


If you're saying what I think you're saying, you're saying that I should 
leave whitespaces between the individual parts of the string, pass in 
the string into a "multiValued" field and have SOLR internally treat 
each "word" as an individual entity? 


Thanks for your help with this...

Ben

Uwe Klosa wrote:

To get the desired effect I described you have to do the split before you
send the document to solr. I'm not aware of an analyzer that can split one
field value into several field values. The analyzers and tokenizers do
create tokens from field values in many different ways.

As I see it you have to do some preprocessing yourself.

Uwe

2009/7/1 Ben 

  

Is there a way in the Schema to specify that the comma should be used to
split the values up? e.g. Can I specify my "vector" field as multivalue and
also specify some sort of tokeniser to automatically split on commas?

Ben



Uwe Klosa wrote:



You should split the strings at the comma yourself and store the values in a
multivalued field. Then wildcard searches like A1_* are not a problem. I don't
know much about facets, but if they work on multivalued fields then that
should be no problem at all.

Uwe

2009/7/1 Ben 



  

Yes, I had done that... however, I'm beginning to see now that what I am
doing is called a "wildcard query" which is going via Lucene's
queryparser.
Lucene's query parser doesn't support the regexp idea of character
exclusion ... i.e. I'm not trying to match "[" I'm trying to express
"Match
as many characters as possible, which are not underscores" with [^_]*

Perhaps I'm going about my whole problem in an ineffective way, but I'm
not
sure how I can sensibly describe what I'm doing without it becoming a
long
document.

The only other approach I can think of is to change what I'm indexing but
I'm not sure how to achieve that.
I've tried explaining it once, and obviously failed, so I'll try again.

I'm given a string containing many vectors (where each dimension is
separated by an underscore, and each vector is separated by a comma) e.g.

A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3

I want my facet query to tell me if, within one of the vectors within
that
string, there is a match for dimensions I'm interested in. Of the four
dimensions in this example, I may choose to fix an arbitrary number of
them
with values, and the rest with wildcards e.g. I might look for a facet
containing Ox_*_*_* so one of the vectors in the string must have its
first
dimension matching "Ox" and I don't care about the rest.

***Is there a way to break down this string on the commas so that I can
apply a normal wildcard query and SOLR applies it to each
individually?***
That would solve all my problems :
e.g.
The string is internally represented in lucene/solr as
A1_B1_C1_D1
A2_B2_C2_D2
A3_B3_C3_D3

where it tries to match the wildcard query on each in turn?

Thanks for your help, I'm deeply confused about this at the moment...

Ben






  



  




Re: Excluding characters from a wildcard query

2009-07-01 Thread Ben
Is there a way in the Schema to specify that the comma should be used to 
split the values up? 
e.g. Can I specify my "vector" field as multivalue and also specify some 
sort of tokeniser to automatically split on commas?


Ben


Uwe Klosa wrote:

You should split the strings at the comma yourself and store the values in a
multivalued field. Then wildcard searches like A1_* are not a problem. I don't
know much about facets, but if they work on multivalued fields then that
should be no problem at all.

Uwe

2009/7/1 Ben 

  

Yes, I had done that... however, I'm beginning to see now that what I am
doing is called a "wildcard query" which is going via Lucene's queryparser.
Lucene's query parser doesn't support the regexp idea of character
exclusion ... i.e. I'm not trying to match "[" I'm trying to express "Match
as many characters as possible, which are not underscores" with [^_]*

Perhaps I'm going about my whole problem in an ineffective way, but I'm not
sure how I can sensibly describe what I'm doing without it becoming a long
document.

The only other approach I can think of is to change what I'm indexing but
I'm not sure how to achieve that.
I've tried explaining it once, and obviously failed, so I'll try again.

I'm given a string containing many vectors (where each dimension is
separated by an underscore, and each vector is separated by a comma) e.g.

A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3

I want my facet query to tell me if, within one of the vectors within that
string, there is a match for dimensions I'm interested in. Of the four
dimensions in this example, I may choose to fix an arbitrary number of them
with values, and the rest with wildcards e.g. I might look for a facet
containing Ox_*_*_* so one of the vectors in the string must have its first
dimension matching "Ox" and I don't care about the rest.

***Is there a way to break down this string on the commas so that I can
apply a normal wildcard query and SOLR applies it to each individually?***
That would solve all my problems :
e.g.
The string is internally represented in lucene/solr as
A1_B1_C1_D1
A2_B2_C2_D2
A3_B3_C3_D3

where it tries to match the wildcard query on each in turn?

Thanks for your help, I'm deeply confused about this at the moment...

Ben




  




Re: Excluding characters from a wildcard query

2009-07-01 Thread Ben
Yes, I had done that... however, I'm beginning to see now that what I am 
doing is called a "wildcard query" which is going via Lucene's queryparser.
Lucene's query parser doesn't support the regexp idea of character
exclusion ... i.e. I'm not trying to match "[" I'm trying to express 
"Match as many characters as possible, which are not underscores" with [^_]*


Perhaps I'm going about my whole problem in an ineffective way, but I'm 
not sure how I can sensibly describe what I'm doing without it becoming 
a long document.


The only other approach I can think of is to change what I'm indexing 
but I'm not sure how to achieve that.

I've tried explaining it once, and obviously failed, so I'll try again.

I'm given a string containing many vectors (where each dimension is 
separated by an underscore, and each vector is separated by a comma) e.g.


A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3

I want my facet query to tell me if, within one of the vectors within 
that string, there is a match for dimensions I'm interested in. Of the 
four dimensions in this example, I may choose to fix an arbitrary number 
of them with values, and the rest with wildcards e.g. I might look for a 
facet containing Ox_*_*_* so one of the vectors in the string must have 
its first dimension matching "Ox" and I don't care about the rest.


***Is there a way to break down this string on the commas so that I can
apply a normal wildcard query and SOLR applies it to each 
individually?*** That would solve all my problems :

e.g.
The string is internally represented in lucene/solr as
A1_B1_C1_D1
A2_B2_C2_D2
A3_B3_C3_D3

where it tries to match the wildcard query on each in turn?

Thanks for your help, I'm deeply confused about this at the moment...

Ben


Re: Excluding characters from a wildcard query

2009-07-01 Thread Ben
I only just noticed that this is an exception being thrown by the 
lucene.queryParser. Should I be mailing on the lucene list, or is it ok 
here?


I'm beginning to wonder if the "fq" can handle the type of character 
exclusion I'm trying in the RegExp.

Escaping the string also doesn't work  :

Cannot parse 'vector:_\*[\^_\]\*_[\^_\]\*_[\^_\]\*': Encountered "]" at 
line 1, column 15.

Was expecting one of:
   "TO" ...
...
...
  


Ben wrote:


Ben wrote:

The exception SOLR raises is :

org.apache.lucene.queryParser.ParseException: Cannot parse 
'vector:_*[^_]*_[^_]*_[^_]*': Encountered "]" at line 1, column 12.

Was expecting one of:
   "TO" ...
...
...
 
Ben wrote:
Passing in a RegularExpression like "[^_]*_[^_]*" (e.g. matching 
anything with an underscore in the string) using some code like :


...
parameters.add("fq", "vector:[^_]*_[^_]*");
...

seems to cause problems for SOLR, I assume because of the [ or ^ 
character.


Can somebody please advise how to handle character exclusion in such 
searches?


Any help or pointers are much appreciated!

Thanks

Ben








Re: Excluding characters from a wildcard query - More Info - Is this difficult, or am I being ignored because it's too obvious to merit an answer?

2009-07-01 Thread Ben


Ben wrote:

The exception SOLR raises is :

org.apache.lucene.queryParser.ParseException: Cannot parse 
'vector:_*[^_]*_[^_]*_[^_]*': Encountered "]" at line 1, column 12.

Was expecting one of:
   "TO" ...
...
...
 
Ben wrote:
Passing in a RegularExpression like "[^_]*_[^_]*" (e.g. matching 
anything with an underscore in the string) using some code like :


...
parameters.add("fq", "vector:[^_]*_[^_]*");
...

seems to cause problems for SOLR, I assume because of the [ or ^ 
character.


Can somebody please advise how to handle character exclusion in such 
searches?


Any help or pointers are much appreciated!

Thanks

Ben






Re: Excluding characters from a wildcard query - More Info

2009-06-30 Thread Ben

The exception SOLR raises is :

org.apache.lucene.queryParser.ParseException: Cannot parse 
'vector:_*[^_]*_[^_]*_[^_]*': Encountered "]" at line 1, column 12.

Was expecting one of:
   "TO" ...
...
...
  


Ben wrote:
Passing in a RegularExpression like "[^_]*_[^_]*" (e.g. matching 
anything with an underscore in the string) using some code like :


...
parameters.add("fq", "vector:[^_]*_[^_]*");
...

seems to cause problems for SOLR, I assume because of the [ or ^ 
character.


Can somebody please advise how to handle character exclusion in such 
searches?


Any help or pointers are much appreciated!

Thanks

Ben




Excluding characters from a wildcard query

2009-06-30 Thread Ben
Passing in a RegularExpression like "[^_]*_[^_]*" (e.g. matching 
anything with an underscore in the string) using some code like :


...
parameters.add("fq", "vector:[^_]*_[^_]*");
...

seems to cause problems for SOLR, I assume because of the [ or ^ character.

Can somebody please advise how to handle character exclusion in such 
searches?


Any help or pointers are much appreciated!

Thanks

Ben
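
For the narrower problem of the parser rejecting [ and ^, SolrJ has a helper that escapes every query-parser metacharacter. Note that it makes the characters literal rather than giving them regex meaning, so it removes the parse error but not the need for a different approach to character exclusion. A sketch:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapedFilter {
    public static SolrQuery build(String rawValue) {
        SolrQuery q = new SolrQuery("*:*");
        // escapeQueryChars() backslash-escapes [, ], ^, *, ?, etc., so the
        // query parser no longer throws, but the characters then match literally.
        q.addFilterQuery("vector:" + ClientUtils.escapeQueryChars(rawValue));
        return q;
    }
}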


Re: Excluding Characters and SubStrings in a Faceted Wildcard Query

2009-06-29 Thread Ben

Hi Erik,

I'm not sure exactly how much context you need here, so I'll try to keep 
it short and expand as needed.


The column I am faceting contains a comma-delimited set of vectors.
Each vector is made up of {Make,Year,Model}, e.g.
_ford_1996_focus,mercedes_1996_clk,ford_2000_focus


I have a custom request handler, where if I want to find all the cars 
from 1996 I pass in a facet query for the Year (1996) which is 
transformed to a wildcard facet query :


_*_1996_*

In other words, it'll match any record whose vector column contains a
string which somewhere has a car from 1996.


Why not put the Make, Year and Model in separate columns and do a facet 
query of multiple columns?... because once we've selected 1996, we 
should (in the above example) then be offering "ford and mercedes" as 
further facet choices, and nothing more. If the parts were in their own 
columns, there would be no way to tie the Makes and Models to specific 
years, for example.


At any rate, the wildcard search returns the entire match
(_ford_1996_focus,mercedes_1996_clk,ford_2000_focus). I then have to do 
another RegExp over it to extract only the two parts (the first ford and 
mercedes) that were from 1996. This isn't using SOLR's cache very 
effectively.


It would be excellent if SOLR could break up that comma-separated list
into three different parts, and run the RegExp over each, returning
only those which match. Is that what you're implying with analysis? If
that were the case, I'd not need to worry about character exclusion.


Sorry if that's a bit fuzzy... it's hard trying to explain enough to be 
useful, but not too much that it turns into an essay!!!


Thanks,
Ben

The solution I'm using is to form a vector

Erik Hatcher wrote:

Ben,

Could you post an example of the type of data you're dealing with and 
how you want it handled?   I suspect there is a way to accomplish what 
you want using an analyzed field, or by preprocessing the data you're 
indexing.


Erik

On Jun 29, 2009, at 9:29 AM, Ben wrote:


Hello,

I've been using SOLR for a while now, but am stuck for information on 
two issues :


1) Is it possible to exclude characters in a SOLR facet wildcard query?
e.g.
[^,]* to match any character except an ","  ?

2) Can one setup the facet wildcard query to return the exact sub 
strings it matched of the queried facet, rather than the whole string?


I hope somebody can help :)

Thanks,

Ben






Excluding Characters and SubStrings in a Faceted Wildcard Query

2009-06-29 Thread Ben

Hello,

I've been using SOLR for a while now, but am stuck for information on 
two issues :


1) Is it possible to exclude characters in a SOLR facet wildcard query?
e.g.
[^,]* to match any character except an ","  ?

2) Can one setup the facet wildcard query to return the exact sub 
strings it matched of the queried facet, rather than the whole string?


I hope somebody can help :)

Thanks,

Ben



Sending Mlt POST request

2009-05-25 Thread Ohad Ben Porat
Hello,



I wish to send an MLT request to Solr and filter the results by a list of values
for a specific field. The problem is that sometimes the list can include
thousands of values, and it's impossible to send such a GET request.

Sending this request as POST didn't work well... Is POST supported by MLT? If
not, is it supposed to be added in one of the next versions? Or is there a
different solution maybe?



I will appreciate any help and advice,

Thanks,

Ohad.
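
Whether the MLT handler accepts POST depends on the Solr version, but a recent SolrJ can at least force the request method so the parameters travel in the body rather than the URL. A sketch, assuming an /mlt handler is configured and with made-up field names:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class MltPost {
    public static QueryResponse run(SolrClient solr, String filterQuery) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("qt", "/mlt");            // route to the MLT handler (assumed configured)
        params.set("q", "id:12345");         // hypothetical seed document
        params.set("mlt.fl", "description"); // hypothetical similarity field
        params.set("fq", filterQuery);       // the long list of values goes here
        // POST puts the parameters in the request body instead of the URL,
        // which avoids the URL-length limit.
        return solr.query(params, SolrRequest.METHOD.POST);
    }
}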



Re: dismax query not working with 1.4

2009-03-26 Thread Ben Lavender
Did the XML in that message come through okay?  Gmail seems to be
eating it on my end.

Anyway, while the default config has those fields, it also fails with
the application config, which has:
  

 dismax
 explicit

  

Since this is essentially the same as standard, I assumed it would work
without any qf.  I manually added a qf to the query with the
application solrconfig and got a result.  Off to debug the application
side!

Thank you very much for the help!

Ben
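
For reference, the behaviour here suggests dismax wants an explicit qf; one way to supply it per request from SolrJ (using defType rather than a dedicated handler, and with made-up field names) looks roughly like:

import org.apache.solr.client.solrj.SolrQuery;

public class DismaxQuery {
    public static SolrQuery build(String userInput) {
        SolrQuery q = new SolrQuery(userInput);
        q.set("defType", "dismax");
        // Without a qf, the dismax parser has nothing to search against in this setup.
        q.set("qf", "title^2.0 body^1.0");
        return q;
    }
}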


On Thu, Mar 26, 2009 at 3:08 PM, Otis Gospodnetic
 wrote:
>
> Standard searches your default field (specified in schema.xml).
> DisMax searches fields you specify in DisMax config.
> Yours has:
>  text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>
> But these are not your real fields.  Change that to your real fields in qf,
> pf and other parts of DisMax config and things should start working.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: Ben Lavender 
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, March 26, 2009 4:02:58 PM
>> Subject: Re: dismax query not working with 1.4
>>
>> I do not have a qf set; this is the query generated by the admin interface:
>> dismax:
>> select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl=
>>
>> standard:
>> select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=
>>
>> dismax has no results, standard has 30.
>>
>> I don't see a requirement that qf be defined on
>> http://wiki.apache.org/solr/DisMaxRequestHandler; am I missing
>> something?
>>
>> The query responses are the same with both the application-specific
>> and default solrconfig.xml's.  The application definition for dismax
>> is:
>>
>>
>>     dismax
>>     explicit
>>
>>
>>
>> And the one from my nightly is:
>>
>>
>>     dismax
>>     explicit
>>     0.01
>>
>>         text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>>
>>
>>         text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
>>
>>
>>         ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
>>
>>
>>         id,name,price,score
>>
>>
>>         2<-1 5<-2 6<90%
>>
>>     100
>>     *:*
>>
>>     text features name
>>
>>     0
>>
>>     name
>>     regex
>>
>>
>>
>> So there's no particular mention of any fields from schema.xml in
>> dismax, but the standard works without that.
>>
>> Thanks for the responses,
>> Ben
>>
>> On Thu, Mar 26, 2009 at 2:11 PM, Matt Mitchell wrote:
>> > Do you have qf set? Just last week I had a problem where no results were
>> > coming back, and it turned out that my qf param was empty.
>> >
>> > Matt
>> >
>> > On Thu, Mar 26, 2009 at 2:30 PM, Ben Lavender wrote:
>> >
>> >> Hello,
>> >>
>> >> I'm using the March 18th 1.4 nightly, and I can't get a dismax query
>> >> to return results.  The standard and partitioned query types return
>> >> data fine.  I'm using jetty, and the problem occurs with the default
>> >> solrconfig.xml as well as the one I am using, which is the Drupal
>> >> module, beta 6.  The problem occurs in the admin interface for solr,
>> >> though, not just in the end application.
>> >>
>> >> And...that's it?  I don't know what else to say or offer other than
>> >> dismax doesn't work, and I'm not sure where else to go to
>> >> troubleshoot.  Any ideas?
>> >>
>> >> Ben
>> >>
>> >
>
>


Re: dismax query not working with 1.4

2009-03-26 Thread Ben Lavender
I do not have a qf set; this is the query generated by the admin interface:
dismax:
select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl=

standard:
select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

dismax has no results, standard has 30.

I don't see a requirement that qf be defined on
http://wiki.apache.org/solr/DisMaxRequestHandler; am I missing
something?

The query responses are the same with both the application-specific
and default solrconfig.xml's.  The application definition for dismax
is:
  

 dismax
 explicit

  

And the one from my nightly is:
  

 dismax
 explicit
 0.01
 
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
 
 
text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
 
 
ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
 
 
id,name,price,score
 
 
2<-1 5<-2 6<90%
 
 100
 *:*
 
 text features name
 
 0
 
 name
 regex 

  

So there's no particular mention of any fields from schema.xml in
dismax, but the standard works without that.

Thanks for the responses,
Ben

On Thu, Mar 26, 2009 at 2:11 PM, Matt Mitchell  wrote:
> Do you have qf set? Just last week I had a problem where no results were
> coming back, and it turned out that my qf param was empty.
>
> Matt
>
> On Thu, Mar 26, 2009 at 2:30 PM, Ben Lavender  wrote:
>
>> Hello,
>>
>> I'm using the March 18th 1.4 nightly, and I can't get a dismax query
>> to return results.  The standard and partitioned query types return
>> data fine.  I'm using jetty, and the problem occurs with the default
>> solrconfig.xml as well as the one I am using, which is the Drupal
>> module, beta 6.  The problem occurs in the admin interface for solr,
>> though, not just in the end application.
>>
>> And...that's it?  I don't know what else to say or offer other than
>> dismax doesn't work, and I'm not sure where else to go to
>> troubleshoot.  Any ideas?
>>
>> Ben
>>
>


dismax query not working with 1.4

2009-03-26 Thread Ben Lavender
Hello,

I'm using the March 18th 1.4 nightly, and I can't get a dismax query
to return results.  The standard and partitioned query types return
data fine.  I'm using jetty, and the problem occurs with the default
solrconfig.xml as well as the one I am using, which is the Drupal
module, beta 6.  The problem occurs in the admin interface for solr,
though, not just in the end application.

And...that's it?  I don't know what else to say or offer other than
dismax doesn't work, and I'm not sure where else to go to
troubleshoot.  Any ideas?

Ben


field range (min and max term)

2009-02-02 Thread Ben Incani
Hi Solr users,

Is there a method of retrieving a field range i.e. the min and max
values of that fields term enum.

For example I would like to know the first and last date entry of N
documents.

Regards,

-Ben


RE: *Very* slow Commit after upgrading to solr 1.3

2008-10-07 Thread Ben Shlomo, Yatir
So other than me doing trial & error, do you have any guidance on how to
configure the merge factor (and ramBufferSizeMB)?
Any "formula" that supplies the optimal value?
Thanks,
Yatir

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, October 07, 2008 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: *Very* slow Commit after upgrading to solr 1.3

On Tue, Oct 7, 2008 at 6:32 AM, Ben Shlomo, Yatir
<[EMAIL PROTECTED]> wrote:
> The problem is solved, see below.
> Since the performance is so sensitive to configuration - do you have a
> tip on how to determine the optimal configuration for
> mergeFactor, ramBufferSizeMB and other properties ?

The issue might have been your high merge factor coupled with changes
in how Lucene closes an index.  To prevent possible corruption on a
crash, Lucene now does an fsync on the index files before it writes
the new segment descriptor that references those files.  A high merge
factor means more segments, hence more segment files to sync on a
close.

-Yonik


> My original problem occurred even on a fresh rebuild of the index with
> solr 1.3
> To solve it I used the entire IndexWriter section settings from the
solr
> 1.3 example file
> This had a dramatic impact:
> I indexed 20 GB of data (52M docs)
> The total indexing time was 13 hours
> The index size was 30 GB
> The total commit time was less than 2 minutes
>
> Tomcat Log for reference
>
> Oct 5, 2008 9:43:24 PM org.apache.solr.update.DirectUpdateHandler2
> commit
> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher 
> INFO: Opening [EMAIL PROTECTED] main
> Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2
> commit
> INFO: end_commit_flush
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
>
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
>
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
> 0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
>
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
>
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
> 0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
>
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
>
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
> atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
>
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
>
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
> atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
>
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
>
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
> o=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
>
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
>
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
> o=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore registerSearcher
> INFO: [] Registered new searcher [EMAIL PROTECTED] main
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher close
> INFO: Closing [EMAIL PROTECTED] main
>
>
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
>
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
> 0.00,cumulative_inserts=0,cumulative_evictions=0}
>
>
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
>
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
> atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>
>
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
>
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
> o=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:

RE: *Very* slow Commit after upgrading to solr 1.3

2008-10-07 Thread Ben Shlomo, Yatir
Thanks Yonik,

The problem is solved, see below.
Since the performance is so sensitive to configuration - do you have a
tip on how to determine the optimal configuration for 
mergeFactor, ramBufferSizeMB and other properties ?

My original problem occurred even on a fresh rebuild of the index with
solr 1.3
To solve it I used the entire IndexWriter section settings from the solr
1.3 example file
This had a dramatic impact:
I indexed 20 GB of data (52M docs)
The total indexing time was 13 hours
The index size was 30 GB
The total commit time was less than 2 minutes

Tomcat Log for reference

Oct 5, 2008 9:43:24 PM org.apache.solr.update.DirectUpdateHandler2
commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] main
Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2
commit
INFO: end_commit_flush
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
o=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
o=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher [EMAIL PROTECTED] main
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
0.00,cumulative_inserts=0,cumulative_evictions=0}

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
atio=0.00,cumulative_inserts=0,cumulative_evictions=0}

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
o=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 18406
Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/dss1 path=/update params={} status=0 QTime=18406 
Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2
commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)
Oct 5, 2008 9:45:07 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] main
Oct 5, 2008 9:45:07 PM org.apache.solr.update.DirectUpdateHandler2
commit
INFO: end_commit_flush


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Saturday, October 04, 2008 6:07 PM
To: solr-user@lucene.apache.org
Subject: Re: *Very* slow Commit after upgrading to solr 1.3

Ben, see also

http://www.nabble.com/Commit-in-solr-1.3-can-take-up-to-5-minutes-td1980
2781.html#a19802781

What type of physical drive is this and what interface is used (SATA,
etc)?
What is the filesystem (NTFS)?

Did you add to an existing index from an older version of Solr, or
start from scratch?

If you add a single document to the index and commit, does it

*Very* slow Commit after upgrading to solr 1.3

2008-09-29 Thread Ben Shlomo, Yatir
Hi!

 

I am running on Windows 64-bit ...
I have upgraded to solr 1.3 in order to use the distributed search.

I haven't changed the solrConfig and the schema xml files during the
upgrade.

I am indexing ~ 350K documents (each one is about 0.5 KB in size)

The indexing takes a reasonable amount of time (350 seconds)

See tomcat log:

INFO: {add=[8x-wbTscWftuu1sVWpdnGw==, VOu1eSv0obBl1xkj2jGjIA==,
YkOm-nKPrTVVVyeCZM4-4A==, rvaq_TyYsqt3aBc0KKDVbQ==,
9NdzWXsErbF_5btyT1JUjw==, ...(398728 more)]} 0 349875

 

But when I commit it takes more than an hour! (5000 seconds; the
optimize after the commit took 14 seconds)

INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)

 

p.s. it's not a machine problem - I moved to another machine and the same
thing happened


I noticed something very strange during the time I wait for the commit:

While the solr index is 210MB in size

In the Windows task manager I noticed that the java process is making a
HUGE amount of IO reads:

It has read more than 350 GB! (which takes a lot of time)

The process is constantly taking 25% of the cpu resources.

All my autowarmCount settings in the solrconfig file do not exceed 256...

 

Any more ideas to check?

Thanks.

 

 

 

Here is part of my solrConfig file:

-   < - 

-  

  false 

  1000 

  1000 

  2147483647 

  1 

  1000 

  1 

  

- 

-  

  false 

  1000 

  1000 

  2147483647 

  1 

-  

  true 

  

 

 

 

 

 

Yatir Ben-shlomo | eBay, Inc. | Classification Track, Shopping.com
(Israel) | w: +972-9-892-1373 |  email: [EMAIL PROTECTED] |

 



RE: help required: how to design a large scale solr system

2008-09-24 Thread Ben Shlomo, Yatir
Thanks Mark!.
Do you have any comment regarding the performance differences between
indexing TSV files as opposed to directly indexing each document via
http post?

-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 24, 2008 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: help required: how to design a large scale solr system


 From my limited experience:

I think you might have a bit of trouble getting 60 mil docs on a single 
machine. Cached queries will probably still be *very* fast, but non 
cached queries are going to be very slow in many cases. Is that 5 
seconds for all queries? You will never meet that on first run queries 
with 60mil docs on that machine. The light query load might make things 
workable...but you're near the limits of a single machine (4 core or not)
with 60 mil. You want to use a very good stopword list...common term 
queries will be killer. The docs being so small will be your only 
possible savior if you go the one machine route - that and cached hits. 
You don't have enough ram to get as much of the filesystem into RAM as 
you'd like for 60 mil docs either.

I think you might try two machines with 30, 3 with 20, or 4 with 15. The

more you spread, even with slower machines, the faster you're likely to
index, which as you say, will take a long time for 60 mil docs (start 
today ). Multiple machines will help the indexing speed the most for 
sure - it's still going to take a long time.

I don't think you will get much advantage using more than one solr 
install on a single machine - if you do, that should be addressed in the

code, even with RAID.

So I say, spread if you can. Faster indexing, faster search, easy to 
expand later. Distributed search is so easy with solr 1.3, you won't
regret it. I think there is a bug to be addressed if you're needing this
in a week though - in my experience, with distributed search, for every
million docs on a machine beyond the first, you lose a doc in a search
across all machines (i.e. 1 mil on machine 1, 1 million on machine 2, a
*:* search will be missing 1 doc; 10 mil each on 3 machines, a *:*
search will be missing 30). Not a big deal, but could be a concern for
some with picky, look-at-everything customers.

- Mark

Ben Shlomo, Yatir wrote:
> Hi!
>
> I am already using solr 1.2 and happy with it.
>
> In a new project with a very tight deadline (10 development days from
> today) I need to setup a more ambitious system in terms of scale
> Here is the spec:
>
>  
>
> * I need to index about 60,000,000
> documents 
>
> * Each document has 11 textual fields to be indexed &
stored
> and 4 more fields to be stored only 
>
> * Most fields are short (2-14 characters) however 2 indexed
> fields can be up to 1KB and another stored field is up to 1KB 
>
> * On average every document is about 0.5 KB to be stored and
> 0.4KB to be indexed 
>
> * The SLA for data freshness is a full nightly re-index ( I
> cannot obtain an incremental update/delete lists of the modified
> documents) 
>
> * The SLA for query time is 5 seconds 
>
> * the number of expected queries is 2-3 queries per second 
>
> * the queries are simply a combination of Boolean operations
and
> name searches (no fancy fuzzy searches or Levenshtein distances, no
> faceting, etc) 
>
> * I have a 64 bit Dell 2950 4-cpu machine  (2 dual cores )
with
> RAID 10, 200 GB HD space, and 8GB RAM memory 
>
> * The documents are not given to me explicitly - I am given a
> raw-documents in RAM - one by one, from which I create my document in
> RAM.
> and then I can either http-post is to index it directly or append it
to
> a tsv file for later indexing 
>
> * Each document has a unique ID
>
>  
>
> I have a few directions I am thinking about
>
>  
>
> The simple approach
>
> * Have one solr instance that will
index
> the entire document set (from files). I am afraid this will take too
> much time
>
>  
>
> Direction 1
>
> * Create TSV files from all the
> documents - this will take around 3-4 hours 
>
> * Have all the documents partitioned
> into several subsets (how many should I choose? ) 
>
> * Have multiple solr instances on the
> same machine 
>
> * Let each solr instance concurrently
> index the appropriate subset 
>
> * At the end merge all the indices
using
> the IndexMergeTool - (how much time will it take ?)
>
>  
>
> Direction 2
>
> * Like  the previous but

help required: how to design a large scale solr system

2008-09-23 Thread Ben Shlomo, Yatir
Hi!

I am already using solr 1.2 and happy with it.

In a new project with a very tight deadline (10 development days from
today) I need to set up a more ambitious system in terms of scale
Here is the spec:

 

* I need to index about 60,000,000
documents 

* Each document has 11 textual fields to be indexed & stored
and 4 more fields to be stored only 

* Most fields are short (2-14 characters) however 2 indexed
fields can be up to 1KB and another stored field is up to 1KB 

* On average every document is about 0.5 KB to be stored and
0.4KB to be indexed 

* The SLA for data freshness is a full nightly re-index (I
cannot obtain incremental update/delete lists of the modified
documents) 

* The SLA for query time is 5 seconds 

* the number of expected queries is 2-3 queries per second 

* the queries are simply a combination of Boolean operations and
name searches (no fancy fuzzy searches or Levenshtein distances, no
faceting, etc.) 

* I have a 64 bit Dell 2950 4-cpu machine  (2 dual cores ) with
RAID 10, 200 GB HD space, and 8GB RAM memory 

* The documents are not given to me explicitly - I am given
raw documents in RAM, one by one, from which I create my document in
RAM,
and then I can either HTTP-post each one to index it directly or append it to
a TSV file for later indexing 

* Each document has a unique ID

 

I have a few directions I am thinking about

 

The simple approach

* Have one solr instance that will index
the entire document set (from files). I am afraid this will take too
much time

 

Direction 1

* Create TSV files from all the
documents - this will take around 3-4 hours 

* Have all the documents partitioned
into several subsets (how many should I choose? ) 

* Have multiple solr instances on the
same machine 

* Let each solr instance concurrently
index the appropriate subset 

* At the end merge all the indices using
the IndexMergeTool - (how much time will it take ?)

 

Direction 2

* Like  the previous but instead of
using the IndexMergeTool , use distributed search with shards (upgrading
to solr 1.3)

 

Direction 3,4

* Like previous directions only avoid
using TSV files at all and directly index the documents from RAM

Questions:

* Which direction do you recommend in order to meet the SLAs in
the fastest way? 

* Since I have RAID on the machine can I gain performance by
using multiple solr instances on the same machine or only multiple
machines will help me 

* What's the minimal number of machines I should require (I
might get more weaker machines) 

* How many concurrent indexers are recommended? 

* Do you agree that the bottleneck is the indexing time?

Any help is appreciated 

Thanks in advance

yatir

 



Re: Changing Solr Query Syntax

2008-03-18 Thread Ben Sanchez
Shalin, Thanks a lot. I'll do that.

On Tue, Mar 18, 2008 at 11:13 AM, Shalin Shekhar Mangar <
[EMAIL PROTECTED]> wrote:

> Hi Ben,
>
> If I had to do this, I would start by adding a custom
> javax.servlet.Filter into Solr. It should work fine since all you're
> doing is replacing characters in the q parameter for requests coming
> into /select handler. It's a bit hackish but that's exactly what
> you're trying to do :)
>
> Don't know if there's an alternate/easier way.
>
> On Tue, Mar 18, 2008 at 9:30 PM, Ben Sanchez <[EMAIL PROTECTED]> wrote:
> > Hi Shalin, thanks a lot for answering that fast.
> >
> >
> >  Use Case:
> >  I'm migrating from a proprietary index server (XYZ)  to Solr. All my
> >  applications and my customer's applications relay on the query
> specification
> >  of XYZ. It would be hard to modify all those apps to use the Solr Query
> >  Syntax (although, it would be ideal, Sorl query is a lot superior than
> that
> >  of XYZ).
> >
> >  Basically I need  to replace : with = ; + with / and = with :  in the
> query
> >  syntax.
> >
> >  Thank you.
> >
> >
> >  On Tue, Mar 18, 2008 at 9:50 AM, Shalin Shekhar Mangar <
> >  [EMAIL PROTECTED]> wrote:
> >
> >
> >
> > > Hi Ben,
> >  >
> >  > It would be nice if you can tell us your use-case so that we can be
> >  > more helpful.
> >  >
> >  > Why does the normal query syntax not work well for you? What are you
> >  > trying to accomplish? Maybe there is an easier way.
> >  >
> >  > On Tue, Mar 18, 2008 at 8:17 PM, Ben Sanchez <[EMAIL PROTECTED]>
> wrote:
> >  > > Hi solr users,
> >  > >  I need to change the query format for solr a little bit. How can I
> >  > >  accomplish this. I don't wan to modify the underlying lucene query
> >  > >  specification but just the way I query the index through the the
> GET
> >  > http
> >  > >  method in solr.
> >  > >  Thanks a lot for your help.
> >  > >
> >  > >  Ben
> >  > >
> >  >
> >  >
> >  >
> >  > --
> >  > Regards,
> >  > Shalin Shekhar Mangar.
> >  >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
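
A minimal sketch of that servlet-filter idea: wrap the request and rewrite the q parameter before it reaches Solr's select handler. The class name and the character mapping are only illustrative, and depending on the Solr version you may also need to override getParameterValues/getParameterMap:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

public class QuerySyntaxFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest wrapped = new HttpServletRequestWrapper((HttpServletRequest) req) {
            @Override
            public String getParameter(String name) {
                String value = super.getParameter(name);
                if ("q".equals(name) && value != null) {
                    // Illustrative mapping only; swapping two characters needs a
                    // placeholder so the second replace doesn't undo the first.
                    value = value.replace('=', '\u0001')
                                 .replace(':', '=')
                                 .replace('+', '/')
                                 .replace('\u0001', ':');
                }
                return value;
            }
        };
        chain.doFilter(wrapped, res);
    }
}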


Re: Changing Solr Query Syntax

2008-03-18 Thread Ben Sanchez
Hi Shalin, thanks a lot for answering that fast.

Use Case:
I'm migrating from a proprietary index server (XYZ) to Solr. All my
applications and my customers' applications rely on the query specification
of XYZ. It would be hard to modify all those apps to use the Solr Query
Syntax (although it would be ideal; the Solr query syntax is a lot superior to that
of XYZ).

Basically I need to replace : with =, + with /, and = with : in the query
syntax.

Thank you.

On Tue, Mar 18, 2008 at 9:50 AM, Shalin Shekhar Mangar <
[EMAIL PROTECTED]> wrote:

> Hi Ben,
>
> It would be nice if you can tell us your use-case so that we can be
> more helpful.
>
> Why does the normal query syntax not work well for you? What are you
> trying to accomplish? Maybe there is an easier way.
>
> On Tue, Mar 18, 2008 at 8:17 PM, Ben Sanchez <[EMAIL PROTECTED]> wrote:
> > Hi solr users,
> >  I need to change the query format for solr a little bit. How can I
> >  accomplish this. I don't wan to modify the underlying lucene query
> >  specification but just the way I query the index through the the GET
> http
> >  method in solr.
> >  Thanks a lot for your help.
> >
> >  Ben
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Changing Solr Query Syntax

2008-03-18 Thread Ben Sanchez
Shalin, thanks a lot for answering that fast.

Use Case:
I'm migrating from a proprietary index server (XYZ) to Solr. All my
applications and my customers' applications rely on the query specification
of XYZ. It would be hard to modify all those apps to use the Solr Query
Syntax (although it would be ideal - the Solr query syntax is a lot superior to that
of XYZ - it is impractical).



On Tue, Mar 18, 2008 at 9:50 AM, Shalin Shekhar Mangar <
[EMAIL PROTECTED]> wrote:

> Hi Ben,
>
> It would be nice if you can tell us your use-case so that we can be
> more helpful.
>
> Why does the normal query syntax not work well for you? What are you
> trying to accomplish? Maybe there is an easier way.
>
> On Tue, Mar 18, 2008 at 8:17 PM, Ben Sanchez <[EMAIL PROTECTED]> wrote:
> > Hi solr users,
> >  I need to change the query format for solr a little bit. How can I
> >  accomplish this. I don't wan to modify the underlying lucene query
> >  specification but just the way I query the index through the the GET
> http
> >  method in solr.
> >  Thanks a lot for your help.
> >
> >  Ben
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Changing Solr Query Syntax

2008-03-18 Thread Ben Sanchez
Hi solr users,
I need to change the query format for Solr a little bit. How can I
accomplish this? I don't want to modify the underlying Lucene query
specification, just the way I query the index through the GET HTTP
method in Solr.
Thanks a lot for your help.

Ben


solr web admin

2007-12-19 Thread Ben Incani
why does the web admin append "core=null" to all the requests?

e.g. admin/get-file.jsp?core=null&file=schema.xml


RE: retrieve lucene "doc id"

2007-12-16 Thread Ben Incani
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf 
> Of Yonik Seeley
> Sent: Monday, 17 December 2007 4:44 PM
> To: solr-user@lucene.apache.org
> Subject: Re: retrieve lucene "doc id"
> 
> On Dec 16, 2007 11:40 PM, Ben Incani 
> <[EMAIL PROTECTED]> wrote:
> > how do I retrieve the lucene "doc id" in a query?
> 
> Currently that's not doable... if it was though, it would be 
> a slightly dangerous feature since internal ids are transient.
> Can you explain a little more about what you are trying to do?
> 
> -Yonik
> 

Hi Yonik,

I have converted to using the Solr search interface and I am trying to
retrieve documents from a list of search results (where previously I had
used the doc id directly from the Lucene query results), and the Solr id
I currently have indexed is unfortunately configured not to be unique!

I do realise that Lucene internal ids are transient, but for read-only
requests (that are not cached) it should be ok.

I have hacked org.apache.solr.request.XMLWriter.writeDoc to do a
writeInt("docId", docId).

 code snippet 

SolrServer server = new CommonsHttpSolrServer(solrURL);
Map params = new HashMap();
params.put("q", searchQuery);
params.put("rows", "20");

MapSolrParams solrParams = new MapSolrParams(params);   
QueryResponse response = server.query(solrParams);

SolrDocumentList docs = response.getResults();
ArrayList hitsList = new ArrayList();

for (int i = 0; i < docs.getNumFound(); i++) {
HashMap resultMap = new HashMap();
SolrDocument doc = (SolrDocument)docs.get(i);
resultMap.put("id", doc.getFieldValue("docId"));
for(int j=0; j

retrieve lucene "doc id"

2007-12-16 Thread Ben Incani
how do I retrieve the lucene "doc id" in a query?

-Ben


RE: lowercase text/strings to be used in list box

2007-10-21 Thread Ben Incani
sorry - this should have been posted on the Lucene user list.

...the solution is to use the Lucene PerFieldAnalyzerWrapper, add the
field with the KeywordAnalyzer, and then pass the PerFieldAnalyzerWrapper to
the QueryParser.

-Ben
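
A sketch of that wiring against a current Lucene (package names have moved around between releases); "my_field" comes from the message below, the default field name "text" is made up:

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class ListBoxQuery {
    public static Query parse(String userQuery) throws Exception {
        // my_field stays un-tokenized (KeywordAnalyzer); everything else
        // goes through StandardAnalyzer as before.
        Map<String, Analyzer> perField = new HashMap<>();
        perField.put("my_field", new KeywordAnalyzer());
        Analyzer analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
        QueryParser parser = new QueryParser("text", analyzer);
        return parser.parse(userQuery); // e.g. my_field:"the value"
    }
}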

> -Original Message-
> From: Ben Incani [mailto:[EMAIL PROTECTED] 
> Sent: Friday, 19 October 2007 5:52 PM
> To: solr-user@lucene.apache.org
> Subject: lowercase text/strings to be used in list box
> 
> I have a field which will only contain several values (that 
> include spaces).
> 
> I want to display a list box with all possible values by 
> browsing the lucene terms.
> 
> I have setup a field in the schema.xml file.
> 
> 
>   
> 
> 
>   
> 
> 
> I also tried;
> 
> 
>   
> 
> 
>   
> 
> 
> 
> This allows me to browse all the values no problem, but when 
> it comes to search the documents I have to use the lucene 
> org.apache.lucene.analysis.KeywordAnalyzer, when I would 
> rather use the 
> org.apache.lucene.analysis.standard.StandardAnalyzer and the 
> power of the default query parser to perform a phrase query 
> such as my_field:(the
> value) or my_field:"the value", which don't work?
> 
> So is there a way to prevent tokenisation of a field using 
> the StandardAnalyzer, without implementing your own TokenizerFactory?
> 
> Regards
> 
> Ben
> 


lowercase text/strings to be used in list box

2007-10-19 Thread Ben Incani
I have a field which will only contain several values (that include
spaces).

I want to display a list box with all possible values by browsing the
lucene terms.

I have setup a field in the schema.xml file.


  


  


I also tried;


  


  



This allows me to browse all the values no problem, but when it comes to
searching the documents I have to use the Lucene
org.apache.lucene.analysis.KeywordAnalyzer, when I would rather use the
org.apache.lucene.analysis.standard.StandardAnalyzer and the power of
the default query parser to perform a phrase query such as my_field:(the
value) or my_field:"the value", which don't work?

So is there a way to prevent tokenisation of a field using the
StandardAnalyzer, without implementing your own TokenizerFactory?

Regards

Ben


RE: solr not finding all results

2007-10-15 Thread Ben Shlomo, Yatir
Did you try to add a backslash to escape the "-" in Geckoplp4-M
(Geckoplp4\-M)


-Original Message-
From: Kevin Lewandowski [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 12, 2007 9:40 PM
To: solr-user@lucene.apache.org
Subject: solr not finding all results

I've found an odd situation where solr is not returning all of the
documents that I think it should. A search for "Geckoplp4-M" returns 3
documents but I know that there are at least 100 documents with that
string.

Here is an example query for that phrase and the result set:
http://localhost:9020/solr/select/?q=Geckoplp4-M&version=2.2&start=0&row
s=10&indent=on&fl=comments,id



 0
 0
 
  10
  0
  on
  comments,id
  Geckoplp4-M
  2.2
 


 
  Geckoplp4-M
  m2816500
 
 
  toptrax recordings. Same tracks.
Geckoplp4-M
  m2816544
 
 
  Geckoplp4-M
  m2815903
 



Now here's an example of a search for two documents that I know have
that string, but were not returned in the previous search:
http://localhost:9020/solr/select/?q=id%3Am2816615+OR+id%3Am2816611&vers
ion=2.2&start=0&rows=10&indent=on&fl=id,comments



 0
 1
 
  10
  0
  on
  id,comments
  id:m2816615 OR id:m2816611
  2.2
 


 
  Geckoplp4-M
  m2816611
 
 
  Geckoplp4-M
  m2816615
 



Here is the definition for the "comments" field:


And here is the definition for a "text" field:

  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  


Any ideas? Am I doing something wrong?

thanks,
Kevin


I can't delete, why?

2007-09-25 Thread Ben Shlomo, Yatir
Hi!
I know I can delete multiple docs with the following:
mediaId:(6720 OR 6721 OR  )

My question is: can I do something like this?
languageId:123 AND manufacturer:456
(It does not work for me, and I didn't forget to commit.)


How can I do it? With a copy field?
languageIdmanufacturer:123456
Thanks
yatir
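
For reference, the SolrJ calls involved are just deleteByQuery plus a commit - this is only the API shape, and it doesn't explain why the boolean form fails here:

import org.apache.solr.client.solrj.SolrClient;

public class DeleteByQuery {
    public static void run(SolrClient solr) throws Exception {
        // Delete-by-query takes ordinary query syntax; remember the commit.
        solr.deleteByQuery("languageId:123 AND manufacturer:456");
        solr.commit();
    }
}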


solved: quering UTF-8 encoded CSV files

2007-08-21 Thread Ben Shlomo, Yatir
My problem is resolved:

The problem happened on Tomcat running on Windows XP

when indexing UTF-8 encoded CSV files.

 

The conclusion is that setting URIEncoding="UTF-8" in the Connector section
in server.xml is not enough.

I also needed to add -Dfile.encoding=UTF-8 to Tomcat's Java startup options
(in catalina.bat).

yatir

____

From: Ben Shlomo, Yatir [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 20, 2007 6:40 PM
To: solr-user@lucene.apache.org
Subject: problem with quering solr after indexing UTF-8 encoded CSV files

 

Hi!

 

I have utf-8 encoded data inside a csv file (actually it’s a tab separated file 
- attached)

I can index it with no apparent errors

I did not forget to set this in my tomcat configuration

 

 


 
   

 

When I query  a document using the UTF-8 text I get zero matches: 

 

   

- 
<http://localhost:8080/apache-solr-1.2.1-dev/select/?q=%D7%99%D7%AA%D7%99%D7%A8&version=2.2&start=0&rows=10&indent=on##>
  

- 
<http://localhost:8080/apache-solr-1.2.1-dev/select/?q=%D7%99%D7%AA%D7%99%D7%A8&version=2.2&start=0&rows=10&indent=on##>
  

  0 

  0 

- 
<http://localhost:8080/apache-solr-1.2.1-dev/select/?q=%D7%99%D7%AA%D7%99%D7%A8&version=2.2&start=0&rows=10&indent=on##>
  

  on 

  0 

יתיר // Note that - I can see the correct UTF-8 
text in it (hebrew characters)

  10 

  2.2 

  

  

   

  

 

 

When I observe this text in the response by querying for *:*

I notice that the text does not appear as desired: יתיר instead of יתיר

Do you have any ideas?

Thanks…

 

Here is the response :

 

   

- 
<http://localhost:8080/apache-solr-1.2.1-dev/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on##>
  

- 
<http://localhost:8080/apache-solr-1.2.1-dev/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on##>
  

  0 

  0 

- 
<http://localhost:8080/apache-solr-1.2.1-dev/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on##>
  

  on 

  0 

  *:* 

  10 

  2.2 

  

  

- 
<http://localhost:8080/apache-solr-1.2.1-dev/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on##>
  

- 
<http://localhost:8080/apache-solr-1.2.1-dev/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on##>
  

  1 

  desc is a very good camera 

  display is יתיר ABC res123  

  1 

  1 

  ABC 

   res123  

  C123 

  123456 

  72900010123 

  

  

  

 

 

yatir



problem with quering solr after indexing UTF-8 encoded CSV files

2007-08-20 Thread Ben Shlomo, Yatir
Hi!

 

I have utf-8 encoded data inside a csv file (actually it’s a tab separated file 
- attached)

I can index it with no apparent errors

I did not forget to set this in my tomcat configuration

 

 


 
   

 

When I query  a document using the UTF-8 text I get zero matches: 

 

   

- 

  

- 

  

  0 

  0 

- 

  

  on 

  0 

יתיר // Note that - I can see the correct UTF-8 
text in it (hebrew characters)

  10 

  2.2 

  

  

   

  

 

 

When I observe this text in the response by querying for *:*

I notice that the text does not appear as desired: יתיר instead of יתיר

Do you have any ideas?

Thanks…

 

Here is the response :

 

   

- 

  

- 

  

  0 

  0 

- 

  

  on 

  0 

  *:* 

  10 

  2.2 

  

  

- 

  

- 

  

  1 

  desc is a very good camera 

  display is יתיר ABC res123  

  1 

  1 

  ABC 

   res123  

  C123 

  123456 

  72900010123 

  

  

  

 

 

yatir



RE: question: how to divide the indexing into separate domains

2007-08-11 Thread Ben Shlomo, Yatir
Thanks yonik!

I do have some unused fields inside the csv file.
But they are not empty.
They are numeric; they can be anything between 0 and 10,000.
Can I do something like
f.unused.map=*:98765 

yatir

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Thursday, August 09, 2007 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: question: how to divide the indexing into separate domains

Hmmm, I think you can map an empty (zero length) value to something else
via
f.foo.map=:something
But that column does currently need to be there in the CSV.

Specifying default values in a per-request basis is interesting, and
something we could perhaps support in the future.
The quickest way to index your data right now would probably be to
change the file, adding another value at the end of each line.  I
think it could even be an empty value (just add a "," at the end of
each line), and then you could map that via
f.domain.map=:98765
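
For example, something along these lines (a sketch only — the field names
are made up, and the in-place edit assumes GNU sed):

   # append an empty trailing column to every line of the CSV
   sed -i 's/$/,/' test.csv

   # index it, naming the new column "domain" and mapping its empty value to 98765
   curl 'http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=field1,field2,domain&f.domain.map=:98765'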

btw, 300M records is a lot for one Solr instance... I hope you've got
a big box with a lot of memory, and aren't too concerned with your
query latency.  Otherwise you can do some partitioning by domain.

-Yonik

On 8/9/07, Ben Shlomo, Yatir <[EMAIL PROTECTED]> wrote:
> Hi!
>
> say I have 300 csv files that I need to index.
>
> Each one holds millions of lines (each line is a few fields separated by
> commas)
>
> Each csv file represents a different domain of data (e.g. file1 is
> computers, file2 is flowers, etc)
>
> There is no indication of the domain ID in the data inside the csv file
>
>
>
> When I search I would like to specify the id of a specific domain
>
> And I want solr to search only in this domain - to save time and reduce
> the number of matches
>
> I need to specify during indexing - the domain id of the csv file being
> indexed
>
> How do I do it?
>
>
>
>
>
> Thanks
>
>
>
>
>
>
>
> p.s.
>
> I wish I could index like this:
>
> curl
>
> http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=field1,field2&f.domain.value=98765
> (where 98765 is the domain id for
> this specific csv file)
>
>


question: how to divide the indexing into separate domains

2007-08-09 Thread Ben Shlomo, Yatir
Hi!

say I have 300 csv files that I need to index. 

Each one holds millions of lines (each line is a few fields separated by
commas)

Each csv file represents a different domain of data (e.g. file1 is
computers, file2 is flowers, etc)

There is no indication of the domain ID in the data inside the csv file

 

When I search I would like to specify the id of a specific domain

And I want solr to search only in this domain - to save time and reduce
the number of matches

I need to specify during indexing - the domain id of the csv file being
indexed

How do I do it?

 

 

Thanks 

 

 

 

p.s. 

I wish I could index like this:

curl
http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=field1,field2&f.domain.value=98765
  (where 98765 is the domain id for
this specific csv file)



detecting duplicates using the field type 'text'

2007-02-14 Thread Ben Incani
Hi Solr users,

I have the following fields set in my 'schema.xml'.

*** schema.xml ***
 [the field definitions were stripped by the mail archive; the excerpt also
 declared elements naming id and document_title]
*** schema.xml ***

When I add a document with a duplicate title, it gets duplicated (not
sure why)



 [XML stripped by the mail archive; the title value "duplicate" appears twice]



When I add a document with a duplicate title (numeric only), it does not
get duplicated



 [XML stripped by the mail archive; the title value "123" appears twice]



I can ensure duplicates DO NOT get added when using the field type
'string'.
And I can also ensure that they DO get added when using 'text'.

Why is there a disparity detecting duplicates when using the field type
'text'?

Is this merely a documentation issue or have I missed something here...
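
For reference, the 'string' variant that does behave the way I expect looks
roughly like this (a sketch only — assuming document_title is the field that
is meant to act as the unique key):

   <field name="document_title" type="string" indexed="true" stored="true"/>
   ...
   <uniqueKey>document_title</uniqueKey>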

Regards,

Ben


separate log files

2007-01-15 Thread Ben Incani
Hi Solr users,

I'm running multiple instances of Solr, which all load from the same war
file.

Below is an example of the servlet context file used for each
application:

   [the <Context> XML was stripped by the mail archive]

Hence each application is using the same
WEB-INF/classes/logging.properties file to configure logging.

I would like each instance to log to separate log files, such as:
app1-solr.yyyy-mm-dd.log
app2-solr.yyyy-mm-dd.log
...

Is there an easy way to append the context path to
org.apache.juli.FileHandler.prefix
E.g. 
org.apache.juli.FileHandler.prefix = ${catalina.context}-solr.
 
Or would this require a code change?
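
For what it's worth, per instance I am picturing something like the following
logging.properties (just a sketch, and it assumes each instance could somehow
be given its own copy rather than the shared one):

   handlers = org.apache.juli.FileHandler
   org.apache.juli.FileHandler.level = FINE
   org.apache.juli.FileHandler.directory = ${catalina.base}/logs
   org.apache.juli.FileHandler.prefix = app1-solr.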

Regards

-Ben


RE: base64 support & containers

2006-07-05 Thread Ben Incani
 

> -Original Message-
> From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, 6 July 2006 9:52 AM
> To: solr-user@lucene.apache.org
> Subject: Re: base64 support & containers
> 
> 
> : Does Solr support/or will in the future base64 encoded XML documents
so
> : that binary blobs can be added to the index?
> 
> I'm not sure if I'm understanding your question completely 
> ... if you have binary data that you want to shove into a 
> stored field, you should certainly be able to base64 encode 
> it in your client and shove it into Solr using a "string" 
> field type -- but your use of the phrase "base64 encoded XML 
> documents" has me thinking that there is more to your question 
> involving an "advanced" use of XML that I'm not familiar with 
> -- can you elaborate?
> 
> 
> 
> -Hoss
> 

No - no advanced use of XML has been implemented.
One of the fields in the add request would contain the original binary
document encoded in base64; this would then preferably be decoded to
binary and placed into a Lucene binary field, which would need to be
defined in Solr.
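
On the client side I am picturing something like this (a sketch only — the
"blob" field name is invented, and the NIO file helpers and java.util.Base64
are newer than the JDKs we run; any base64 codec would do):

   import java.nio.file.Files;
   import java.nio.file.Paths;
   import java.util.Base64;

   public class Base64FieldSketch {
       public static void main(String[] args) throws Exception {
           // read the binary document and base64-encode it
           byte[] raw = Files.readAllBytes(Paths.get("document.pdf"));
           String encoded = Base64.getEncoder().encodeToString(raw);

           // base64 output is XML-safe, so it can be sent as a plain field value
           String addXml = "<add><doc>"
                   + "<field name=\"id\">doc-1</field>"
                   + "<field name=\"blob\">" + encoded + "</field>"
                   + "</doc></add>";
           System.out.println(addXml.length() + " characters to post to /update");
       }
   }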

Thanks
Ben


base64 support & containers

2006-07-04 Thread Ben Incani
Hi Solr users,
 
Does Solr support/or will in the future base64 encoded XML documents so
that binary blobs can be added to the index?

I have been using this solr client by Darren Vengroff successfully.  It
easily plugs into the Solr package and could also use binary
functions in org.apache.solr.util.XML
http://issues.apache.org/jira/browse/SOLR-20

So far I have been storing binary data in the Lucene index; I realise
this is not an optimal solution, but so far I have not found a Java
container system to manage documents.  Can anyone recommend one?

Regards,
 
Ben

