Incorrect Guava version in maven repository

2019-03-19 Thread Amber Liu
Hi,

When I try to upgrade the Guava version that Solr depends on, I notice the Guava
version listed in the Maven repository for Solr is 14.0.1
(https://mvnrepository.com/artifact/org.apache.solr/solr-core/8.0.0). I also
noticed that there is a resolved Jira issue in Solr that upgraded the Guava
dependency to 25.1 (https://issues.apache.org/jira/browse/SOLR-11763). Is
the Guava version listed in the Maven repository correct? Which Guava version
do Solr 8.0.0 and 7.5.0 depend on?

Thanks,
Amber


RE: Cassandra Solr Integration, what driver to use?

2018-11-15 Thread Liu, Daphne
I use this fat jar for Solr 6.6.5:
https://github.com/adejanovski/cassandra-jdbc-wrapper
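
Before wiring it into the DataImportHandler, it can help to confirm that the wrapper
connects on its own. A minimal standalone sketch (the driver class name, JDBC URL
format, keyspace and table below are assumptions; check the wrapper's README for the
exact syntax):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CassandraJdbcCheck {
    public static void main(String[] args) throws Exception {
        // Driver class bundled in the cassandra-jdbc-wrapper fat jar (assumed name)
        Class.forName("com.github.adejanovski.cassandra.jdbc.CassandraDriver");
        // Hypothetical host/keyspace/table; adjust to your cluster
        try (Connection conn = DriverManager.getConnection("jdbc:cassandra://localhost:9042/mykeyspace");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id FROM mytable LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // printing a few ids proves the connection works
            }
        }
    }
}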


Kind regards,

Daphne Liu
BI Architect • Big Data - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com
T 904.5641192/ F 904.928.1525 / daphne@cevalogistics.com

Making business flow


-Original Message-
From: Ka Mok 
Sent: Thursday, November 15, 2018 4:26 PM
To: solr-user@lucene.apache.org
Subject: Cassandra Solr Integration, what driver to use?

I'm trying to do some data integration between a Cassandra 3.11.3 database and
Solr 7.5.

I've spent the past 2 days looking for the right driver and haven't found a
single one other than a product offered by DataStax.

Is there really no way to use the default DataImportHandler?

In the Solr Admin console, it reads that 1 request was made, with 0 documents
received / processed / skipped.

However, when I tail Cassandra, I see that nothing was sent. I can confirm the
connection using DB GUI software such as TablePlus or SQuirreL SQL.

Anyone have any ideas?



RE: 20180917-Need Apache SOLR support

2018-09-18 Thread Liu, Daphne
You have to increase your RAM. We have upgraded our Solr cluster to 12 Solr
nodes, each with 64 GB of RAM. Our shard size is around 25 GB, and each server
hosts only one shard (either the leader or a replica). Performance is very good.
For better performance, memory needs to exceed your shard size.


Kind regards,

Daphne Liu
BI Architect • Big Data - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com
T 904.9281448 / F 904.928.1525 / daphne@cevalogistics.com

Making business flow

-Original Message-
From: zhenyuan wei 
Sent: Tuesday, September 18, 2018 3:12 AM
To: solr-user@lucene.apache.org
Subject: Re: 20180917-Need Apache SOLR support

I have 6 machines, and each machine runs a Solr server using 18 GB of RAM. The
total document count is 3.2 billion (1.4 TB). My collection's replication factor
is 1, and the shard count is 60; currently each shard is 20-30 GB, with 15 fields
per document. The query rate is low for now, maybe 100-500 requests per second.

Shawn Heisey  于2018年9月18日周二 下午12:07写道:

> On 9/17/2018 9:05 PM, zhenyuan wei wrote:
> > Does that mean a small number of shards gives better performance?
> > I also have a use case which contains 3 billion documents; the
> > collection contains 60 shards now. Would 10 shards be better than 60 shards?
>
> There is no definite answer to this question.  It depends on a bunch
> of things.  How big is each shard once it's finally built?  What's
> your query rate?  How many machines do you have, and how much memory
> do those machines have?
>
> Thanks,
> Shawn
>
>



missing jmx stats for num_docs and max_doc

2018-08-29 Thread Zehua Liu
Hi,

We are running a 7.4.0 Solr cluster with 3 TLOG nodes and a few PULL nodes.
There is one collection divided into 8 shards; each TLOG node hosts all 8 shards,
and each PULL node hosts either shards 1-4 or shards 5-8.

When using JMX to collect num_docs metrics via Datadog, we found that the
metrics for some shards are missing. For example, on one TLOG node we saw
num_docs stats only for shards 3/4/5/8, and on another only for shards
1/2/3/4/5/8. max_doc appears for more shards, but it is also missing for some.

This only happens to the tlog instances so far. Restarting the solr process
does not help.

Did anyone encounter this before? What should I do next to continue to
troubleshoot this?
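
One way to narrow it down is to list the registered Solr MBeans directly over JMX and
check whether the numDocs beans for the missing shards exist at all, i.e. whether this
is a Solr registration problem or a collection problem on the Datadog side. A rough
sketch, assuming remote JMX is enabled (the port and the "Value" attribute name are
assumptions, and Solr's exact MBean naming varies by version):

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListSolrMBeans {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint; use the port your Solr nodes actually expose
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Dump everything registered under the "solr" JMX domain and grep for numDocs
            Set<ObjectName> names = mbs.queryNames(new ObjectName("solr:*"), null);
            for (ObjectName name : names) {
                if (name.toString().contains("numDocs")) {
                    System.out.println(name + " = " + mbs.getAttribute(name, "Value"));
                }
            }
        }
    }
}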


Thanks,
Zehua


Exception writing document xxxxxx to the index; possible analysis error.

2018-07-11 Thread Liu, Daphne
)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is 
closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:740)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:754)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1558)
at 
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:279)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
... 62 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at 
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:125)
at 
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
at 
org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.readTerm(Lucene54DocValuesProducer.java:1349)
at 
org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.next(Lucene54DocValuesProducer.java:1365)
at 
org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:275)
at org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:301)
at 
org.apache.lucene.index.MultiDocValues$OrdinalMap.<init>(MultiDocValues.java:527)
at 
org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:484)
at 
org.apache.lucene.codecs.DocValuesConsumer.mergeSortedField(DocValuesConsumer.java:638)
at 
org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:204)
at 
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
at 
org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)



Kind regards,

Daphne Liu
BI Architect • Big Data - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com
T 904.9281448 / F 904.928.1525 / daphne@cevalogistics.com

Making business flow


-Original Message-
From: Erick Erickson 
Sent: Wednesday, July 11, 2018 4:51 PM
To: solr-user 
Subject: Re: solr filter query on text field

bq. is there any difference if the fq field is a string field vs text

Absolutely. String fields are not analyzed in any way. They're not tokenized.
They are case sensitive. Etc. For example, take "My dog" as input. A string field
will have a _single_ token, "My dog".
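
A small way to see the difference is to run the same input through a keyword-style
analyzer (roughly what a string field gives you) and a tokenizing analyzer. A sketch
using plain Lucene analysis classes; the two analyzers here are stand-ins for whatever
your field types actually use:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenDemo {
    static void dump(Analyzer analyzer, String text) throws Exception {
        try (TokenStream ts = analyzer.tokenStream("f", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.print("[" + term + "] ");
            }
            ts.end();
            System.out.println();
        }
    }

    public static void main(String[] args) throws Exception {
        dump(new KeywordAnalyzer(), "My dog");  // one token: [My dog]  (string-like)
        dump(new StandardAnalyzer(), "My dog"); // two tokens: [my] [dog] (text-like)
    }
}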

RE: Solr or Elasticsearch

2018-03-22 Thread Liu, Daphne
I used Solr + Cassandra for document search. Solr works very well for document
indexing.
For big data visualization, I use Elasticsearch + Grafana.
As of today, Grafana does not support Solr.
Elasticsearch is very friendly and easy to use for multi-dimensional group-bys,
and its real-time query performance is very good.
Our Grafana dashboard solution can be viewed at
https://grafana.com/dashboards/5204/edit


Kind regards,

Daphne Liu
BI Architect Big Data - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com
T 904.9281448 / F 904.928.1525 / daphne@cevalogistics.com

Making business flow

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Thursday, March 22, 2018 9:14 AM
To: solr-user@lucene.apache.org
Subject: Solr or Elasticsearch

Hi everyone,

There are some good write-ups on the internet comparing the two, and the one
thing that keeps coming up about Elasticsearch being superior to Solr is its
analytic capability.  However, I cannot find what those analytic capabilities
are and why they cannot be done using Solr.  Can someone help me with this
question?

Personally, I'm a Solr user, and the thing that concerns me about Elasticsearch
is the fact that it is owned by a company that can any day decide to stop
making Elasticsearch available under the Apache license and even completely
close off free access to it.

So, this is a 2 part question:

1) What are the analytic capabilities of Elasticsearch that cannot be achieved
using Solr?  I want to see a complete list if possible.
2) Should an Elasticsearch user be worried that Elasticsearch may close its
open-source policy at any time, or that outsiders have no say about its roadmap?

Thanks,

Steve



Solr deltaImportQuery ID configuration

2017-08-23 Thread Liu, Daphne
Hello,
   I am using Solr 6.3.0. Does anyone know, in deltaImportQuery, when referencing
the id, should I use '${dih.delta.id}' or '${dataimporter.delta.id}'?
   Both were mentioned in the Delta-Import wiki. I am confused. Thank you.

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / daphne@cevalogistics.com





Solrcloud updating issue.

2017-06-29 Thread Wudong Liu
Hi All:
We are trying to index a large number of documents in solrcloud and keep
seeing the following error: org.apache.solr.common.SolrException: Service
Unavailable, or org.apache.solr.common.SolrException: Service Unavailable

but with a similar stack:

request: http://wp-np2-c0:8983/solr/uniprot/update?wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:320)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$57/936653983.run(Unknown
Source)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


The settings are: 5 nodes in the cluster, each with 16 GB of memory. The
collection is defined with 5 shards and a replication factor of 2. The total
number of documents is about 90 million, and each document is quite large as
well. We also have 5 ZooKeeper instances, one running on each node.

On the Solr side, we can see errors like:
solr.log.3-Error from server at
http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1: Server Error
solr.log.3-request:
http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1/update?update.distrib=TOLEADER=http%3A%2F%2Fwp-np2-c0.ebi.ac.uk%3A8983%2Fsolr%2Funiprot_shard2_replica1%2F=javabin=2
solr.log.3-Remote error message: Async exception during distributed update:
Connect to wp-np2-c2.ebi.ac.uk:8983 timed out
solr.log.3- at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:948)
solr.log.3- at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1679)
solr.log.3- at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
--
solr.log.3- at
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
solr.log.3- at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
solr.log.3- at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
solr.log.3- at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
solr.log.3- at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
solr.log.3- at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
solr.log.3- at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
solr.log.3- at java.lang.Thread.run(Thread.java:745)


The strange bit is that this exception doesn't seem to be captured by the
try/catch block in our main thread, and the cluster seems to be in good health
(all nodes up) after the job is done; we are just missing lots of documents!
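
For reference, ConcurrentUpdateSolrClient reports failures on its internal runner
threads instead of throwing back to the caller, which would explain the try/catch
never firing. A rough sketch of overriding its handleError callback so the failed
batches at least become visible (SolrJ 6.x style; the constructor arguments and the
counter are illustrative):

import java.util.concurrent.atomic.AtomicLong;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;

public class LoggingUpdateClient extends ConcurrentUpdateSolrClient {
    private final AtomicLong failedRequests = new AtomicLong();

    public LoggingUpdateClient(String solrUrl, int queueSize, int threadCount) {
        super(solrUrl, queueSize, threadCount);
    }

    @Override
    public void handleError(Throwable ex) {
        // Invoked on the client's background threads when an update request fails;
        // the default implementation only logs, so count the failures here and
        // check the counter after the indexing job finishes.
        failedRequests.incrementAndGet();
        ex.printStackTrace();
    }

    public long getFailedRequests() {
        return failedRequests.get();
    }
}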

Any suggestions on where we should look to resolve this problem?

Best Regards,
Wudong


Can solrcloud be running on a read-only filesystem?

2017-06-02 Thread Wudong Liu
Hi All:

We have a normal build/stage -> prod setup for our production pipeline: we
build the Solr index in the build environment and then copy the index to the
prod environment.

SolrCloud in prod seems to work fine when the file system backing it is
writable. However, we see many errors when the file system is read-only. Many
exceptions are thrown because the tlog file cannot be opened for writing when
the Solr nodes are restarted with the new data, and some of the nodes
eventually get stuck in the recovering phase and are never able to come back
online in the cloud.

Just wondering, does anyone have any experience with SolrCloud running on a
read-only file system? Is it possible at all?

Regards,
Wudong


RE: Data Import

2017-03-17 Thread Liu, Daphne
No, I use the free version. I got the driver from someone else; I can share it
if you want to use Cassandra.
They have modified it for me, since the free JDBC driver I found times out
when a document is greater than 16 MB.

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com



-Original Message-
From: vishal jain [mailto:jain02...@gmail.com]
Sent: Friday, March 17, 2017 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Data Import

Hi Daphne,

Are you using DSE?


Thanks & Regards,
Vishal

On Fri, Mar 17, 2017 at 7:40 PM, Liu, Daphne <daphne@cevalogistics.com>
wrote:

> I just want to share my recent project. I have successfully sent all
> our EDI documents to Cassandra 3.7 clusters, using the Solr 6.3 Data Import
> JDBC Cassandra connector to index our documents.
> Since Cassandra is so fast for writing, the compression rate is around 13%,
> and all my documents can be kept in my Cassandra clusters' memory; we
> are very happy with the result.
>
>
> Kind regards,
>
> Daphne Liu
> BI Architect - Matrix SCM
>
> CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL
> 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 /
> daphne@cevalogistics.com
>
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, March 17, 2017 9:54 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Data Import
>
> I feel DIH is much better for prototyping, even though people do use
> it in production. If you do want to use DIH, you may benefit from
> reviewing the DIH-DB example I am currently rewriting in
> https://issues.apache.org/jira/browse/SOLR-10312 (may need to change
> luceneMatchVersion in solrconfig.xml first).
>
> CSV, etc, could be useful if you want to keep history of past imports,
> again useful during development, as you evolve schema.
>
> SolrJ may actually be easiest/best for production since you already
> have Java stack.
>
> The choice is yours in the end.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
>
>
> On 17 March 2017 at 08:56, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 3/17/2017 3:04 AM, vishal jain wrote:
> >> I am new to Solr and am trying to move data from my RDBMS to Solr.
> >> I
> know the available options are:
> >> 1) Post Tool
> >> 2) DIH
> >> 3) SolrJ (as ours is a J2EE application).
> >>
> >> I want to know what is the recommended way for Data import in
> >> production environment. Will sending data via SolrJ in batches be
> faster than posting a csv using POST tool?
> >
> > I've heard that CSV import runs EXTREMELY fast, but I have never
> > tested it.  The same threading problem that I discuss below would
> > apply to indexing this way.
> >
> > DIH is extremely powerful, but it has one glaring problem:  It's
> > single-threaded, which means that only one stream of data is going
> > into Solr, and each batch of documents to be inserted must wait for
> > the previous one to finish inserting before it can start.  I do not
> > know if DIH batches documents or sends them in one at a time.  If
> > you have a manually sharded index, you can run DIH on each shard in
> > parallel, but each one will be single-threaded.  That single thread
> > is pretty efficient, but it's still only one thread.
> >
> > Sending multiple index updates to Solr in parallel (multi-threading)
> > is how you radically speed up the Solr part of indexing.  This is
> > usually done with a custom indexing program, which might be written
> > with SolrJ or even in a completely different language.
> >
> > One thing to keep in mind with ANY indexing method:  Once the
> > situation is examined closely, most people find that it's not Solr
> > that makes their indexing slow.  The bottleneck is usually the
> > source system -- how quickly the data can be retrieved.  It usually
> > takes a lot longer to obtain the data than it does for Solr to index it.
> >
> > Thanks,
> > Shawn
> >

RE: Data Import

2017-03-17 Thread Liu, Daphne
I just want to share my recent project. I have successfully sent all our EDI
documents to Cassandra 3.7 clusters, using the Solr 6.3 Data Import JDBC
Cassandra connector to index our documents.
Since Cassandra is so fast for writing, the compression rate is around 13%, and
all my documents can be kept in my Cassandra clusters' memory; we are very
happy with the result.


Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com



-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Friday, March 17, 2017 9:54 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Data Import

I feel DIH is much better for prototyping, even though people do use it in 
production. If you do want to use DIH, you may benefit from reviewing the 
DIH-DB example I am currently rewriting in
https://issues.apache.org/jira/browse/SOLR-10312 (may need to change 
luceneMatchVersion in solrconfig.xml first).

CSV, etc, could be useful if you want to keep history of past imports, again 
useful during development, as you evolve schema.

SolrJ may actually be easiest/best for production since you already have Java 
stack.

The choice is yours in the end.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 17 March 2017 at 08:56, Shawn Heisey <apa...@elyograg.org> wrote:
> On 3/17/2017 3:04 AM, vishal jain wrote:
>> I am new to Solr and am trying to move data from my RDBMS to Solr. I know 
>> the available options are:
>> 1) Post Tool
>> 2) DIH
>> 3) SolrJ (as ours is a J2EE application).
>>
>> I want to know what is the recommended way for Data import in
>> production environment. Will sending data via SolrJ in batches be faster 
>> than posting a csv using POST tool?
>
> I've heard that CSV import runs EXTREMELY fast, but I have never
> tested it.  The same threading problem that I discuss below would
> apply to indexing this way.
>
> DIH is extremely powerful, but it has one glaring problem:  It's
> single-threaded, which means that only one stream of data is going
> into Solr, and each batch of documents to be inserted must wait for
> the previous one to finish inserting before it can start.  I do not
> know if DIH batches documents or sends them in one at a time.  If you
> have a manually sharded index, you can run DIH on each shard in
> parallel, but each one will be single-threaded.  That single thread is
> pretty efficient, but it's still only one thread.
>
> Sending multiple index updates to Solr in parallel (multi-threading)
> is how you radically speed up the Solr part of indexing.  This is
> usually done with a custom indexing program, which might be written
> with SolrJ or even in a completely different language.
>
> One thing to keep in mind with ANY indexing method:  Once the
> situation is examined closely, most people find that it's not Solr
> that makes their indexing slow.  The bottleneck is usually the source
> system -- how quickly the data can be retrieved.  It usually takes a
> lot longer to obtain the data than it does for Solr to index it.
>
> Thanks,
> Shawn
>
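
To make the point about parallel updates concrete, below is a rough sketch of a
multi-threaded SolrJ indexer that sends documents in batches from several threads
(the collection name, field names, batch size and thread count are made up for
illustration; each thread would really be reading its own slice of rows from the
source database):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
        ExecutorService pool = Executors.newFixedThreadPool(4); // several indexing threads

        for (int t = 0; t < 4; t++) {
            final int threadId = t;
            pool.submit(() -> {
                List<SolrInputDocument> batch = new ArrayList<>();
                for (int i = 0; i < 10000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", threadId + "-" + i);
                    doc.addField("title_s", "row " + i + " from thread " + threadId);
                    batch.add(doc);
                    if (batch.size() == 500) {   // send in batches, not one doc at a time
                        solr.add(batch);
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) solr.add(batch);
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        solr.commit(); // one commit at the end rather than per batch
        solr.close();
    }
}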


RE: Data Import Handler on 6.4.1

2017-03-15 Thread Liu, Daphne
For Solr 6.3, I had to move mine to
../solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib, if you are using Jetty.

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com


-Original Message-
From: Michael Tobias [mailto:mtob...@btinternet.com]
Sent: Wednesday, March 15, 2017 2:36 PM
To: solr-user@lucene.apache.org
Subject: Data Import Handler on 6.4.1

I am sure I am missing something simple but

I am running Solr 4.8.1 and trialling 6.4.1 on another computer.

I have had to manually modify the automatic 6.4.1 schema config as we use a set
of specialised field types.  They work fine.

I am now trying to populate my core with data and having problems.

Exactly what names/paths should I be using in the solrconfig.xml file to get 
this working - I don’t recall doing ANYTHING for 4.8.1



And where do I put the mysql-connector-java-5.1.29-bin.jar file and how do I 
reference it to get it loaded?



And then later in the solrconfig.xml I have:


  
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>


Any help much appreciated.

Regards

Michael


-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: 15 March 2017 17:47
To: solr-user@lucene.apache.org
Subject: Re: Get handler not working

from your previous email:
"There is no "id"
field defined in the schema."

you need an id field to use the get handler

On Wed, Mar 15, 2017 at 1:45 PM, Chris Ulicny <culicny@iq.media> wrote:

> I thought that "id" and "ids" were fixed parameters for the get
> handler, but I never remember, so I've already tried both. Each time
> it comes back with the same response of no document.
>
> On Wed, Mar 15, 2017 at 1:31 PM Alexandre Rafalovitch
> <arafa...@gmail.com>
> wrote:
>
> > Actually.
> >
> > I think Real Time Get handler has "id" as a magical parameter, not
> > as a field name. It maps to the real id field via the uniqueKey
> > definition:
> > https://cwiki.apache.org/confluence/display/solr/RealTime+Get
> >
> > So, if you have not, could you try the way you originally wrote it.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 15 March 2017 at 13:22, Chris Ulicny <culicny@iq.media> wrote:
> > > Sorry, that is a typo. The get is using the iqdocid field. There
> > > is no
> > "id"
> > > field defined in the schema.
> > >
> > > solr/TestCollection/get?iqdocid=2957-TV-201604141900
> > >
> > > solr/TestCollection/select?q=*:*=iqdocid:2957-TV-201604141900
> > >
> > > On Wed, Mar 15, 2017 at 1:15 PM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> Is this a typo or are you trying to use get with an "id" field
> > >> and your filter query uses "iqdocid"?
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Wed, Mar 15, 2017 at 8:31 AM, Chris Ulicny <culicny@iq.media>
> wrote:
> > >> > Yes, we're using a fixed schema with the iqdocid field set as
> > >> > the
> > >> uniqueKey.
> > >> >
> > >> > On Wed, Mar 15, 2017 at 11:28 AM Alexandre Rafalovitch <
> > >> arafa...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> What is your uniqueKey? Is it iqdocid?
> > >> >>
> > >> >> Regards,
> > >> >>Alex.
> > >> >> 
> > >> >> http://www.solr-start.com/ - Resources for Solr users, new and
> > >> experienced
> > >> >>
> > >> >>
> > >> >> On 15 March 2017 at 11:24, Chris Ulicny <culicny@iq.media> wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > I've been trying to use the get handler for a new solr cloud
> > >> collection
> > >> >> we
> > >> >> > are using, and something seems to be amiss.
> > >> >> >
> > >> >> > We are running 6.3.0, so we did not explicitly define the
> > >> >> > request
> > >> handler
> > >> >> > in the solrconfig since it's supposed to be implicitly defined.
> We
> > >> also
> > >> >> > have the update log enabled with the defaul

Delta Import JDBC connection frame size larger than max length

2017-03-01 Thread Liu, Daphne
Hello Solr experts,
   Is there a place in Solr (the Delta Import data source?) where I can adjust the
JDBC connection frame size to 256 MB? I have adjusted the settings in Cassandra,
but I'm still getting this error:
   NonTransientConnectionException:
org.apache.thrift.transport.TTransportException: Frame size (17676563) larger
than max length (16384000)
   Thank you.

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / daphne@cevalogistics.com




Query/Field Index Analysis corrected but return no docs in search

2017-02-04 Thread Peter Liu
Hi all:
   I was using Solr 3.6 and tried to solve a recall problem today, but
encountered a weird issue.

   There's a doc with the field value 均匀肤色 (just treat that word as a symbol
if you don't know it; I just want to describe the problem as exactly as
possible).


   And below is the analysis result (tokenization), shown as text:

Index Analyzer
均匀肤色 均匀 匀肤 肤色
均匀肤色 均匀 匀肤 肤色
均匀肤色 均匀 匀肤 肤色
Query Analyzer
均匀肤色
均匀肤色
均匀肤色
均匀肤色


The tokenization result indicates that the query should recall/hit the doc.
But the doc does not appear in the results when I search with "均匀肤色". I tried
simplifying the qf/bf/fq/q and testing with a single field and a single
document, to make sure it was not caused by other problems, but it still failed.

It's knotty to debug because it only reproduces in the production environment;
I tried the same config/index/query but could not reproduce it in the dev
environment. I'm asking here for help in case you have met a similar problem;
any clues or debugging methods would be really helpful.


RE: how to sampling search result

2016-09-28 Thread Yongtao Liu
Alexandre,

Thanks for reply.
The use case is customer want to review document based on search result.
But they do not want to review all, since it is costly.
So, they want to pick partial (from 1% to 100%) document to review.
For statistics, user also ask this function.
It is kind of common requirement
Do you know any plan to implement this feature in future?

Post filter should work. Like collapsing query parser.
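
A very rough sketch of what such a sampling post filter might look like, assuming
Solr's PostFilter / DelegatingCollector plugin API (the class is illustrative only;
a real version would also need a QParserPlugin to expose it in queries):

import java.io.IOException;
import java.util.Random;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class SamplingPostFilter extends ExtendedQueryBase implements PostFilter {
    private final double probability;
    private final Random random = new Random();

    public SamplingPostFilter(double probability) {
        this.probability = probability;
    }

    @Override
    public boolean getCache() { return false; }  // post filters must not be cached

    @Override
    public int getCost() { return 100; }         // cost >= 100 makes Solr run it as a post filter

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // Only pass a random fraction of the matching docs on to facets and stats
                if (random.nextDouble() < probability) {
                    super.collect(doc);
                }
            }
        };
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SamplingPostFilter
                && ((SamplingPostFilter) other).probability == probability;
    }

    @Override
    public int hashCode() {
        return Double.hashCode(probability);
    }
}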

Thanks,
Yongtao
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Tuesday, September 27, 2016 9:25 PM
To: solr-user
Subject: Re: how to sampling search result

I am not sure I understand what the business case is. However, you might be 
able to do something with a custom post-filter.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 27 September 2016 at 22:29, Yongtao Liu <y...@commvault.com> wrote:
> Mikhail,
>
> Thanks for your reply.
>
> Random field is based on index time.
> We want to do sampling based on search result.
>
> Like if the random field has value 1 - 100.
> And the query touched documents may all in range 90 - 100.
> So random field will not help.
>
> Is it possible we can sampling based on search result?
>
> Thanks,
> Yongtao
> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Tuesday, September 27, 2016 11:16 AM
> To: solr-user
> Subject: Re: how to sampling search result
>
> Perhaps, you can apply a filter on random field.
>
> On Tue, Sep 27, 2016 at 5:57 PM, googoo <liu...@gmail.com> wrote:
>
>> Hi,
>>
>> Is it possible I can sampling based on  "search result"?
>> Like run query first, and search result return 1 million documents.
>> With random sampling, 50% (500K) documents return for facet, and stats.
>>
>> The sampling need based on "search result".
>>
>> Thanks,
>> Yongtao
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/how-to-sampling-search-result-tp4298269.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


RE: how to remove duplicate from search result

2016-09-27 Thread Yongtao Liu
Shamik,

Thanks a lot.
The Collapsing query parser solved the issue.
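
For anyone searching the archives later, the collapsing filter looks roughly like this
from SolrJ (assuming the duplicate key is stored in a single-valued field named guid,
as in the earlier mail; collapsing is a post filter, so facets and stats see the
de-duplicated result set):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CollapseExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
        SolrQuery q = new SolrQuery("*:*");
        // Keep only one document per guid value
        q.addFilterQuery("{!collapse field=guid}");
        QueryResponse rsp = solr.query(q);
        System.out.println("hits after collapsing: " + rsp.getResults().getNumFound());
        solr.close();
    }
}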

Thanks,
Yongtao
-Original Message-
From: shamik [mailto:sham...@gmail.com] 
Sent: Tuesday, September 27, 2016 3:09 PM
To: solr-user@lucene.apache.org
Subject: RE: how to remove duplicate from search result

Did you take a look at the Collapsing Query Parser?

https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-remove-duplicate-from-search-result-tp4298272p4298305.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: how to remove duplicate from search result

2016-09-27 Thread Yongtao Liu
David,

Thanks for your reply.

Grouping cannot solve the issue.
We also need to run facets and stats based on the search result, and with
grouping, the facet and stats results still count duplicates.

Thanks,
Yongtao
-Original Message-
From: David Santamauro [mailto:david.santama...@gmail.com] 
Sent: Tuesday, September 27, 2016 11:35 AM
To: solr-user@lucene.apache.org
Cc: david.santama...@gmail.com
Subject: Re: how to remove duplicate from search result

Have a look at

https://cwiki.apache.org/confluence/display/solr/Result+Grouping


On 09/27/2016 11:03 AM, googoo wrote:
> hi,
>
> We want to provide remove duplicate from search result function.
>
> like we have below documents.
> id(uniqueKey) guid
> doc1  G1
> doc2  G2
> doc3  G3
> doc4  G1
>
> a user runs one query and hits doc1, doc2 and doc4.
> the user wants to remove duplicates from the search result based on the guid field.
> since doc1 and doc4 have the same guid, one of them should be dropped from the
> search result.
>
> how we can address this requirement?
>
> Thanks,
> Yongtao
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-remove-duplicate-from-search
> -result-tp4298272.html Sent from the Solr - User mailing list archive 
> at Nabble.com.
>


RE: how to sampling search result

2016-09-27 Thread Yongtao Liu
Mikhail,

Thanks for your reply.

A random field is assigned at index time, but we want to do sampling based on
the search result.

For example, if the random field has values 1 - 100, the documents touched by
the query may all be in the range 90 - 100, so a random field will not help.

Is it possible to sample based on the search result?

Thanks,
Yongtao
-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: Tuesday, September 27, 2016 11:16 AM
To: solr-user
Subject: Re: how to sampling search result

Perhaps, you can apply a filter on random field.

On Tue, Sep 27, 2016 at 5:57 PM, googoo <liu...@gmail.com> wrote:

> Hi,
>
> Is it possible I can sampling based on  "search result"?
> Like run query first, and search result return 1 million documents.
> With random sampling, 50% (500K) documents return for facet, and stats.
>
> The sampling need based on "search result".
>
> Thanks,
> Yongtao
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/how-to-sampling-search-result-tp4298269.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


remove user defined duplicate from search result

2016-09-26 Thread Yongtao Liu
Hi,

I am trying to remove user-defined duplicates from the search result.

For example, the documents below match the query.
When the query returns, I try to remove doc3 from the result, since it has a
duplicate guid with doc1.

Id (uniqueKey)   guid
doc1             G1
doc2             G2
doc3             G1


To do this, I generate an exclude list based on the guid field's terms.
For each term, we add every document after the first to the exclude list,
and add these docs to the QueryCommand filter.

Is there any better approach to handle this requirement?


Below is the code change in SolrIndexSearcher.java:

  private TreeMap<String, BitDocSet> dupDocs = null;

  public QueryResult search(QueryResult qr, QueryCommand cmd) throws 
IOException {
if (cmd.getUniqueField() != null)
{
  DocSet filter = getDuplicateByField(cmd.getUniqueField());
  if (cmd.getFilter() != null) cmd.getFilter().addAllTo(filter);
  cmd.setFilter(filter);
}

getDocListC(qr,cmd);

return qr;
  }

  private synchronized BitDocSet getDuplicateByField(String field) throws 
IOException
  {
if (dupDocs != null && dupDocs.containsKey(field)) {
  return dupDocs.get(field);
}

if (dupDocs == null)
{
  dupDocs = new TreeMap<String, BitDocSet>();
}

LeafReader reader = getLeafReader();

BitDocSet res = new BitDocSet(new FixedBitSet(maxDoc()));

Terms terms = reader.terms(field);

if (terms == null)
{
  dupDocs.put(field, res);
  return res;
}

TermsEnum termEnum = terms.iterator();
PostingsEnum docs = null;
BytesRef term = null;
while ((term = termEnum.next()) != null) {
  docs = termEnum.postings(docs, PostingsEnum.NONE);

  // skip the first document for each term; it is kept, the rest are excluded
  docs.nextDoc();

  int docID = 0;
  while ((docID = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS)
  {
res.add(docID);
  }
}

dupDocs.put(field, res);
return res;
  }

Thanks,
Yongtao


RE: remove user defined duplicate from search result

2016-09-26 Thread Yongtao Liu
Sorry, the table was missing.
Updating the email below to include the table.

-Original Message-
From: Yongtao Liu [mailto:y...@commvault.com] 
Sent: Monday, September 26, 2016 10:47 AM
To: 'solr-user@lucene.apache.org'
Subject: remove user defined duplicate from search result

Hi,

I am trying to remove user-defined duplicates from the search result.

For example, the documents below match the query.
When the query returns, I try to remove doc3 from the result, since it has a
duplicate guid with doc1.

id (uniqueKey)   guid
doc1             G1
doc2             G2
doc3             G1

To do this, I generate an exclude list based on the guid field's terms.
For each term, we add every document after the first to the exclude list,
and add these docs to the QueryCommand filter.

Is there any better approach to handle this requirement?


Below is the code change in SolrIndexSearcher.java:

  private TreeMap<String, BitDocSet> dupDocs = null;

  public QueryResult search(QueryResult qr, QueryCommand cmd) throws 
IOException {
if (cmd.getUniqueField() != null)
{
  DocSet filter = getDuplicateByField(cmd.getUniqueField());
  if (cmd.getFilter() != null) cmd.getFilter().addAllTo(filter);
  cmd.setFilter(filter);
}

getDocListC(qr,cmd);

return qr;
  }

  private synchronized BitDocSet getDuplicateByField(String field) throws 
IOException
  {
if (dupDocs != null && dupDocs.containsKey(field)) {
  return dupDocs.get(field);
}

if (dupDocs == null)
{
  dupDocs = new TreeMap<String, BitDocSet>();
}

LeafReader reader = getLeafReader();

BitDocSet res = new BitDocSet(new FixedBitSet(maxDoc()));

Terms terms = reader.terms(field);

if (terms == null)
{
  dupDocs.put(field, res);
  return res;
}

TermsEnum termEnum = terms.iterator();
PostingsEnum docs = null;
BytesRef term = null;
while ((term = termEnum.next()) != null) {
  docs = termEnum.postings(docs, PostingsEnum.NONE);

  // skip the first document for each term; it is kept, the rest are excluded
  docs.nextDoc();

  int docID = 0;
  while ((docID = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS)
  {
res.add(docID);
  }
}

dupDocs.put(field, res);
return res;
  }

Thanks,
Yongtao


RE: Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-23 Thread Hui Liu
Opened ticket: Issue SOLR-9246 - Errors for Streaming Expressions using JDBC 
(Oracle) stream source

Regards,
Hui
-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, June 23, 2016 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source

Ok you should be able to create the jira.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 23, 2016 at 11:52 AM, Hui Liu <h...@opentext.com> wrote:

> Joel, I just opened an account for this, my user name is 
> h...@opentext.com; let me know when I can open the ticket.
>
> And thanks for the info, I will be glad to do any collaboration needed 
> as a reporter on this issue, so feel free to let me know what I need to do.
>
> Regards,
> Hui
>
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Thursday, June 23, 2016 11:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) 
> stream source
>
> Sure. You can create a ticket from here
>
> https://issues.apache.org/jira/browse/SOLR/?selectedTab=com.atlassian.
> jira.jira-projects-plugin:summary-panel
>
> After you've created an account I'll need to add your username to the 
> contributors group. If you post your username back to this thread I'll 
> do that.
>
> Then you can open a ticket.
>
> This particular issue will require access to an Oracle database so it 
> will likely be handled as a collaboration between the reporter and a 
> committer, because not all committers are going to have access to Oracle.
>
> DIH will accomplish the data load for you.
>
> The JDBCStream can be used to do things like joins involving RDMBS and 
> Solr.
>
>
>
>
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 23, 2016 at 11:06 AM, Hui Liu <h...@opentext.com> wrote:
>
> > Thanks Joel, I have never opened a ticket before with Solr, do you 
> > know the steps (url etc) I should follow? I will be glad to do so...
> > At the meantime, I guess the workaround is to use 'data import 
> > handler' to get the data from Oracle into Solr?
> >
> > Regards,
> > Hui
> > -Original Message-
> > From: Joel Bernstein [mailto:joels...@gmail.com]
> > Sent: Thursday, June 23, 2016 10:55 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) 
> > stream source
> >
> > Let's open a ticket for this issue specific to Oracle.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Jun 23, 2016 at 10:54 AM, Joel Bernstein 
> > <joels...@gmail.com>
> > wrote:
> >
> > > I think we're going to have to add some debugging into the code to 
> > > find what's going on. On line 225 in JDBCStream it's getting the 
> > > class name for each column. It would be good know what the class 
> > > names are that the Oracles driver is returning.
> > >
> > >
> > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.
> > > 0/
> > > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.
> > > java
> > >
> > > We probably need to throw an exception that includes the class 
> > > name to help users report what different drivers using for the classes.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Thu, Jun 23, 2016 at 10:18 AM, Hui Liu <h...@opentext.com> wrote:
> > >
> > >> Joel - thanks for the quick response, in my previous test, the 
> > >> collection 'document5' does have a field called 'date_created'
> > >> which is type 'date', even though my SQL SELECT below did not 
> > >> select any un-supported data type (all columns are either long or 
> > >> String in jdbc type); but to totally rule out this issue, I 
> > >> created a new collection 'document6' which only contain long and 
> > >> string data type, and a new Oracle table 'document6' that only 
> > >> contain columns whose jdbc type is long and string, see below for 
> > >> schema.xml
> and table definition:
> > >>
> > >> schema.xml for Solr collection 'document6': (newly created empty 
> > >> collections with 2 shards)
> > >>
> > >> =
> > >> == == = 
> > >>   
> > >>  
> > >>  

RE: Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-23 Thread Hui Liu
Joel, I just opened an account for this, my user name is h...@opentext.com; let 
me know when I can open the ticket.

And thanks for the info, I will be glad to do any collaboration needed as a 
reporter on this issue, so feel free to let me know what I need to do.

Regards,
Hui

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, June 23, 2016 11:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source

Sure. You can create a ticket from here
https://issues.apache.org/jira/browse/SOLR/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

After you've created an account I'll need to add your username to the 
contributors group. If you post your username back to this thread I'll do that.

Then you can open a ticket.

This particular issue will require access to an Oracle database so it will 
likely be handled as a collaboration between the reporter and a committer, 
because not all committers are going to have access to Oracle.

DIH will accomplish the data load for you.

The JDBCStream can be used to do things like joins involving RDMBS and Solr.









Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 23, 2016 at 11:06 AM, Hui Liu <h...@opentext.com> wrote:

> Thanks Joel, I have never opened a ticket before with Solr, do you 
> know the steps (url etc) I should follow? I will be glad to do so...
> At the meantime, I guess the workaround is to use 'data import 
> handler' to get the data from Oracle into Solr?
>
> Regards,
> Hui
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Thursday, June 23, 2016 10:55 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) 
> stream source
>
> Let's open a ticket for this issue specific to Oracle.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 23, 2016 at 10:54 AM, Joel Bernstein <joels...@gmail.com>
> wrote:
>
> > I think we're going to have to add some debugging into the code to 
> > find what's going on. On line 225 in JDBCStream it's getting the 
> > class name for each column. It would be good know what the class 
> > names are that the Oracles driver is returning.
> >
> >
> > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.
> > 0/ 
> > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.
> > java
> >
> > We probably need to throw an exception that includes the class name 
> > to help users report what different drivers using for the classes.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Jun 23, 2016 at 10:18 AM, Hui Liu <h...@opentext.com> wrote:
> >
> >> Joel - thanks for the quick response, in my previous test, the 
> >> collection 'document5' does have a field called 'date_created' 
> >> which is type 'date', even though my SQL SELECT below did not 
> >> select any un-supported data type (all columns are either long or 
> >> String in jdbc type); but to totally rule out this issue, I created 
> >> a new collection 'document6' which only contain long and string 
> >> data type, and a new Oracle table 'document6' that only contain 
> >> columns whose jdbc type is long and string, see below for schema.xml and 
> >> table definition:
> >>
> >> schema.xml for Solr collection 'document6': (newly created empty 
> >> collections with 2 shards)
> >>
> >> ===
> >> == = 
> >>   
> >>  
> >>  
> >>   >> sortMissingLast="true" docValues="true" />
> >>   >> precisionStep="0" positionIncrementGap="0"/>
> >>  
> >> 
> >>
> >> 
> >>   
> >>>> sortMissingLast="true" omitNorms="true"/>
> >>
> >>
> >>   >> multiValued="false"/>
> >>   >> docValues="true"/>
> >>   >> stored="true" docValues="true"/>
> >>   >> stored="true" docValues="true"/>
> >>   >> stored="true" docValues="true"/>
> >>   >> stored="true" docValues="true"/>
> >>
> >>   document_id
> >>   document_id
> >> 
> >>
> >> Oracle table 'document6': (newly create

RE: Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-23 Thread Hui Liu
Thanks Joel, I have never opened a ticket with Solr before; do you know the
steps (URL, etc.) I should follow? I will be glad to do so.
In the meantime, I guess the workaround is to use the Data Import Handler to get
the data from Oracle into Solr?
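
Separately, a small standalone JDBC check can show which Java class the Oracle driver
reports for each selected column, which is what the JDBCStream code at line 225 looks
at. A sketch (the connection details and credentials are placeholders; the SELECT
mirrors the one in the stream expression):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class ColumnClassCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("oracle.jdbc.driver.OracleDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/service", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT document_id, sender_msg_dest, recip_msg_dest, document_type, document_key FROM document6")) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                // The column class name is what JDBCStream uses to choose a value reader
                System.out.println(md.getColumnName(i) + " -> " + md.getColumnClassName(i));
            }
        }
    }
}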

Regards,
Hui
-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, June 23, 2016 10:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source

Let's open a ticket for this issue specific to Oracle.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 23, 2016 at 10:54 AM, Joel Bernstein <joels...@gmail.com> wrote:

> I think we're going to have to add some debugging into the code to 
> find what's going on. On line 225 in JDBCStream it's getting the class 
> name for each column. It would be good to know what the class names are
> that the Oracle driver is returning.
>
>
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.0/
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.
> java
>
> We probably need to throw an exception that includes the class name to 
> help users report what different drivers using for the classes.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 23, 2016 at 10:18 AM, Hui Liu <h...@opentext.com> wrote:
>
>> Joel - thanks for the quick response, in my previous test, the 
>> collection 'document5' does have a field called 'date_created' which 
>> is type 'date', even though my SQL SELECT below did not select any 
>> un-supported data type (all columns are either long or String in jdbc 
>> type); but to totally rule out this issue, I created a new collection 
>> 'document6' which only contain long and string data type, and a new 
>> Oracle table 'document6' that only contain columns whose jdbc type is 
>> long and string, see below for schema.xml and table definition:
>>
>> schema.xml for Solr collection 'document6': (newly created empty 
>> collections with 2 shards)
>>
>> =
>> = 
>>   
>>  
>>  
>>  > sortMissingLast="true" docValues="true" />
>>  > precisionStep="0" positionIncrementGap="0"/>
>>  
>> 
>>
>> 
>>   
>>   > sortMissingLast="true" omitNorms="true"/>
>>
>>
>>  > multiValued="false"/>
>>  > docValues="true"/>
>>  > stored="true" docValues="true"/>
>>  > stored="true" docValues="true"/>
>>  > stored="true" docValues="true"/>
>>  > stored="true" docValues="true"/>
>>
>>   document_id
>>   document_id
>> 
>>
>> Oracle table 'document6': (newly created Oracle table with 9 records) 
>> ==
>> QA_DOCREP@qlgdb1 > desc document6
>>  Name  Null?Type
>>  - 
>> 
>>  DOCUMENT_ID   NOT NULL NUMBER(12)
>>  SENDER_MSG_DESTVARCHAR2(256)
>>  RECIP_MSG_DEST VARCHAR2(256)
>>  DOCUMENT_TYPE  VARCHAR2(20)
>>  DOCUMENT_KEY   VARCHAR2(100)
>>
>> Then I tried this jdbc streaming expression in my browser, 
>> still getting the same error stack (see below); By looking at the 
>> source code you have provided below, it seems Solr is able to connect 
>> to this Oracle db, but just cannot read the resultset for some 
>> reason? Do you think it has something to do with the jdbc driver version?
>>
>> http://localhost:8988/solr/document6/stream?expr=jdbc(connection=
>> "jdbc:oracle:thin:qa_docrep/
>> abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql="SELECT
>> document_id,sender_msg_dest,recip_msg_dest,document_type,document_key 
>> FROM document6",sort="document_id 
>> asc",driver="oracle.jdbc.driver.OracleDriver")
>>
>> errors in solr.log
>> ==
>> 2016-06-23 14:07:02.833 INFO  (qtp1389647288-139) [c:document6 
>> s:shard2
>> r:core_node1 x:document6_shard2_replica1] o.a.s.c.S.Request 
>> [document6_shard2_

RE: Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-23 Thread Hui Liu
onseWriter.java:183)
at 
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at 
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:725)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
... 26 more

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Thursday, June 23, 2016 7:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source

I'm wondering if you're selecting an unsupported data type. The exception being 
thrown looks like it could happen if that were the case. The supported types 
are in the Java doc.
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.0/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.java

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 22, 2016 at 11:46 PM, Hui Liu <h...@opentext.com> wrote:

> Hi,
>
>
>
>   I have Solr 6.0.0 installed on my PC (windows 7), I was 
> experimenting with ‘Streaming Expression’ by using Oracle jdbc as the 
> stream source, following is the http command I am using:
>
>
>
> http://localhost:8988/solr/document5/stream?expr=jdbc(connection=
> "jdbc:oracle:thin:qa_docrep/
> abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql="SELECT
> document_id,sender_msg_dest,recip_msg_dest,document_type,document_key,
> sender_bu_id,recip_bu_id,date_created
> FROM tg_document WHERE rownum < 5",sort="document_id
> asc",driver="oracle.jdbc.driver.OracleDriver")
>
>
>
>   I can access this Oracle db from my PC via regular JDBC 
> connection. I did put Oracle jdbc driver jar ‘ojdbc14.jar’ (same jar 
> used in my regular jdbc code) under Solr/server/lib dir and restarted 
> Solr cloud. Below is the error from solr.log (got a null pointer 
> error); I am merely trying to get the data returned from Oracle table, 
> I have not tried to index them in the Solr yet, attached is the 
> shema.xml and solrconfig.xml for this collection ‘document5’; does 
> anyone know what am I missing? thanks for any help!
>
>
>
> Regards,
>
> Hui Liu
>
>
>
> Error from Solr.log:
>
> =
>
> 2016-06-23 03:17:34.413 INFO  (qtp1389647288-19) [c:document5 s:shard2
> r:core_node2 x:document5_shard2_replica1] o.a.s.c.S.Request 
> [document5_shard2_replica1]  webapp=/solr path=/stream 
> params={expr=jdbc(connection%3D"jdbc:oracle:thin:qa_docrep/
> abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql%3D"SELECT+docu
> ment_id,sender_msg_dest,recip_msg_dest,document_type,document_key,send
> er_bu_id,recip_bu_id+FROM+tg_document+WHERE+rownum+<+5",sort%3D"docume
> nt_id+asc",driver%3D"oracle.jdbc.OracleDriver")}
> status=0 QTime=0
>
> 2016-06-23 03:17:37.588 ERROR (qtp1389647288-19) [c:document5 s:shard2
> r:core_node2 x:document5_shard2_replica1] 
> o.a.s.c.s.i.s.ExceptionStream java.lang.NullPointerException
>
>   at
> org.apache.solr.client.solrj.io.stream.JDBCStream.read(JDBCStream.java
> :305)
>
>   at
> org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionS
> tream.java:64)
>
>   at
> org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.j
> ava:374)
>
>   at
> org.apache.solr.response.TextResponseWriter.writeTupleStream(TextRespo
> nseWriter.java:305)
>
>   at
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWrite
> r.java:167)
>
>   at
> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONRe
> sponseWriter.java:183)
>
>   at
> org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.
> java:299)
>
>   at
> org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.j
> ava:95)
>
>   at
> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.j
> ava:60)
>
>   at
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(Qu
> eryResponseWriterUtil.java:65)
>
>   at
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:7
> 25)
>
>   at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
>
>   at
> org.apache.solr.servl

Errors for Streaming Expressions using JDBC (Oracle) stream source

2016-06-22 Thread Hui Liu
Hi,

  I have Solr 6.0.0 installed on my PC (Windows 7). I was 
experimenting with 'Streaming Expression' using Oracle JDBC as the stream 
source; following is the http command I am using:

http://localhost:8988/solr/document5/stream?expr=jdbc(connection="jdbc:oracle:thin:qa_docrep/abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql="SELECT
 
document_id,sender_msg_dest,recip_msg_dest,document_type,document_key,sender_bu_id,recip_bu_id,date_created
 FROM tg_document WHERE rownum < 5",sort="document_id 
asc",driver="oracle.jdbc.driver.OracleDriver")

  I can access this Oracle db from my PC via a regular JDBC 
connection. I did put the Oracle JDBC driver jar 'ojdbc14.jar' (the same jar used in my 
regular JDBC code) under the Solr/server/lib dir and restarted Solr Cloud. Below is 
the error from solr.log (a null pointer error); I am merely trying to get 
the data returned from the Oracle table, I have not tried to index it in Solr 
yet. Attached are the schema.xml and solrconfig.xml for this collection 
'document5'; does anyone know what I am missing? Thanks for any help!

Regards,
Hui Liu

Error from Solr.log:
=
2016-06-23 03:17:34.413 INFO  (qtp1389647288-19) [c:document5 s:shard2 
r:core_node2 x:document5_shard2_replica1] o.a.s.c.S.Request 
[document5_shard2_replica1]  webapp=/solr path=/stream 
params={expr=jdbc(connection%3D"jdbc:oracle:thin:qa_docrep/abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql%3D"SELECT+document_id,sender_msg_dest,recip_msg_dest,document_type,document_key,sender_bu_id,recip_bu_id+FROM+tg_document+WHERE+rownum+<+5",sort%3D"document_id+asc",driver%3D"oracle.jdbc.OracleDriver")}
 status=0 QTime=0
2016-06-23 03:17:37.588 ERROR (qtp1389647288-19) [c:document5 s:shard2 
r:core_node2 x:document5_shard2_replica1] o.a.s.c.s.i.s.ExceptionStream 
java.lang.NullPointerException
  at 
org.apache.solr.client.solrj.io.stream.JDBCStream.read(JDBCStream.java:305)
  at 
org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:64)
  at 
org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:374)
  at 
org.apache.solr.response.TextResponseWriter.writeTupleStream(TextResponseWriter.java:305)
  at 
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:167)
  at 
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
  at 
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
  at 
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
  at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60)
  at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
  at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:725)
  at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
  at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
  at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
  at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
  at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
  at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
  at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
  at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
  at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
  at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
  at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
  at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
  at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
  at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
  at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
  at org.eclipse.jetty.server.Server.handle(Server.java:518)
  at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
  at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
  at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(Abs

RE: Questions regarding re-index when using Solr as a data source

2016-06-10 Thread Hui Liu
Thank you Walter.

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Friday, June 10, 2016 3:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Questions regarding re-index when using Solr as a data source

Those are brand new features that I have not used, so I can’t comment on them.

But I know they do not make Solr into a database.

If you need a transactional database that can support search, you probably want 
MarkLogic. I worked at MarkLogic for a couple of years. In some ways, MarkLogic 
is like Solr, but the support for transactions goes very deep. It is not 
something you can put on top of a search engine.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 10, 2016, at 12:39 PM, Hui Liu <h...@opentext.com> wrote:
> 
> What if we plan to use Solr version 6.x? this url says it support 2 different 
> update modes: atomic update and optimistic concurrency:
> 
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
> 
> I tested 'optimistic concurrency' and it appears to be working, i.e if a 
> document I am updating got changed by another person I will get error if I 
> supply a _version_ value, So maybe you are referring to an older version of 
> Solr?
> 
> Regards,
> Hui
> 
> -Original Message-
> From: Walter Underwood [mailto:wun...@wunderwood.org] 
> Sent: Friday, June 10, 2016 11:18 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Questions regarding re-index when using Solr as a data source
> 
> Solr does not have transactions at all. The “commit” is really “submit batch”.
> 
> Solr does not have update. You can add, delete, or replace an entire document.
> 
> There is no optimistic concurrency control because there is no concurrency 
> control. Clients can concurrently add documents to a batch, then any client 
> can submit the entire batch.
> 
> Replication is not transactional. Replication is a file copy of the 
> underlying indexes (classic) or copying the documents in a batch (Solr Cloud).
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Jun 10, 2016, at 7:41 AM, Hui Liu <h...@opentext.com> wrote:
>> 
>> Walter,
>> 
>>  Thank you for your advice. We are new to Solr and have been using 
>> Oracle for past 10+ years, so we are used to the idea of having a tool that 
>> can be used as both data store and also searchable by having indexes on top 
>> of it. I guess the reason we are considering Solr as data store is due to it 
>> has some features of a database that our application requires, such as 1) be 
>> able to detect duplicate record by having a unique field; 2) allow us to do 
>> concurrent update by using Optimistic concurrency control feature; 3) its 
>> 'replication' feature allowing us to store multiple copies of data; so if we 
>> were to use a file system, we will not have the above features (at least not 
>> 1 and 2) and have to implement those ourselves. The other option is to pick 
>> another database tool such as Mysql or Cassandra, then we will need to learn 
>> and support an additional tool besides Solr; but you brought up several very 
>> good points about operational factors we should consider if we pick Solr as 
>> a data store. Also our application is more of a OLTP than OLAP. I will 
>> update our colleagues and stakeholders about these concerns. Thanks again!
>> 
>> Regards,
>> Hui
>> -Original Message-
>> From: Walter Underwood [mailto:wun...@wunderwood.org] 
>> Sent: Thursday, June 09, 2016 1:24 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Questions regarding re-index when using Solr as a data source
>> 
>> In the HowToReindex page, under “Using Solr as a Data Store”, it says this: 
>> "Don't do this unless you have no other option. Solr is not really designed 
>> for this role.” So don’t start by planning to do this.
>> 
>> Using a second copy of Solr is still using Solr as a repository. That 
>> doesn’t satisfy any sort of requirements for disaster recovery. How do you 
>> know that data is good? How do you make a third copy? How do you roll back 
>> to a previous version? How do you deal with a security breach that affects 
>> all your systems? Are the systems in the same data center? How do you deal 
>> with ransomware (U. of Calgary paid $20K yesterday)?
>> 
>> If a consultant suggested this to me, I’d probably just give up and get a 
>> different consultant.
>> 
>> Here is what we do for batch loading.
>> 
>> 1. For each Solr collection, we define a JSONL feed f

RE: Questions regarding re-index when using Solr as a data source

2016-06-10 Thread Hui Liu
What if we plan to use Solr version 6.x? This URL says it supports two different 
update modes, atomic update and optimistic concurrency:

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

I tested 'optimistic concurrency' and it appears to be working, i.e. if a 
document I am updating was changed by someone else, I will get an error if I 
supply a stale _version_ value. So maybe you are referring to an older version of 
Solr?
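
For illustration, a minimal SolrJ 6.x sketch of that flow (the zkHost, collection, 
id and field values below are only placeholders borrowed from earlier examples in 
this thread):

import java.util.Collections;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class OptimisticUpdateSketch {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new CloudSolrClient("127.0.0.1:2181")) {
      // read the current document and remember the _version_ we saw
      SolrDocument current = client.getById("document5", "20346005172");
      long seenVersion = (Long) current.getFieldValue("_version_");

      // atomic ("set") update that only succeeds if nobody changed the doc in between
      SolrInputDocument update = new SolrInputDocument();
      update.addField("document_id", "20346005172");
      update.addField("_version_", seenVersion);
      update.addField("document_type", Collections.singletonMap("set", "INVOICE"));

      try {
        client.add("document5", update);
        client.commit("document5");
      } catch (SolrException e) {
        // version conflict (HTTP 409): someone updated the doc since we read it,
        // so re-read and retry (or give up)
        System.err.println("Conflict: " + e.getMessage());
      }
    }
  }
}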

Regards,
Hui

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Friday, June 10, 2016 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Questions regarding re-index when using Solr as a data source

Solr does not have transactions at all. The “commit” is really “submit batch”.

Solr does not have update. You can add, delete, or replace an entire document.

There is no optimistic concurrency control because there is no concurrency 
control. Clients can concurrently add documents to a batch, then any client can 
submit the entire batch.

Replication is not transactional. Replication is a file copy of the underlying 
indexes (classic) or copying the documents in a batch (Solr Cloud).

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 10, 2016, at 7:41 AM, Hui Liu <h...@opentext.com> wrote:
> 
> Walter,
> 
>   Thank you for your advice. We are new to Solr and have been using 
> Oracle for past 10+ years, so we are used to the idea of having a tool that 
> can be used as both data store and also searchable by having indexes on top 
> of it. I guess the reason we are considering Solr as data store is due to it 
> has some features of a database that our application requires, such as 1) be 
> able to detect duplicate record by having a unique field; 2) allow us to do 
> concurrent update by using Optimistic concurrency control feature; 3) its 
> 'replication' feature allowing us to store multiple copies of data; so if we 
> were to use a file system, we will not have the above features (at least not 
> 1 and 2) and have to implement those ourselves. The other option is to pick 
> another database tool such as Mysql or Cassandra, then we will need to learn 
> and support an additional tool besides Solr; but you brought up several very 
> good points about operational factors we should consider if we pick Solr as a 
> data store. Also our application is more of a OLTP than OLAP. I will update 
> our colleagues and stakeholders about these concerns. Thanks again!
> 
> Regards,
> Hui
> -Original Message-
> From: Walter Underwood [mailto:wun...@wunderwood.org] 
> Sent: Thursday, June 09, 2016 1:24 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Questions regarding re-index when using Solr as a data source
> 
> In the HowToReindex page, under “Using Solr as a Data Store”, it says this: 
> "Don't do this unless you have no other option. Solr is not really designed 
> for this role.” So don’t start by planning to do this.
> 
> Using a second copy of Solr is still using Solr as a repository. That doesn’t 
> satisfy any sort of requirements for disaster recovery. How do you know that 
> data is good? How do you make a third copy? How do you roll back to a 
> previous version? How do you deal with a security breach that affects all 
> your systems? Are the systems in the same data center? How do you deal with 
> ransomware (U. of Calgary paid $20K yesterday)?
> 
> If a consultant suggested this to me, I’d probably just give up and get a 
> different consultant.
> 
> Here is what we do for batch loading.
> 
> 1. For each Solr collection, we define a JSONL feed format, with a JSON 
> Schema.
> 2. The owners of the data write an extractor to pull the data out of wherever 
> it is, then generate the JSON feed.
> 3. We validate the JSON feed against the JSON schema.
> 4. If the feed is valid, we save it to Amazon S3 along with a manifest which 
> lists the version of the JSON Schema.
> 5. Then a multi-threaded loader reads the feed and sends it to Solr.
> 
> Reloading is safe and easy, because all the feeds in S3 are valid.
> 
> Storing backups in S3 instead of running a second Solr is massively cheaper, 
> easier, and safer.
> 
> We also have a clear contract between the content owners and the search team. 
> That contract is enforced by the JSON Schema on every single batch.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Jun 9, 2016, at 9:51 AM, Hui Liu <h...@opentext.com> wrote:
>> 
>> Hi Walter,
>> 
>> Thank you for the reply, sorry I need to clarify what I mean by 'migrate 
>> tables' from Oracle to Solr, we are not literally move existing records fro

RE: Questions regarding re-index when using Solr as a data source

2016-06-10 Thread Hui Liu
Walter,

	Thank you for your advice. We are new to Solr and have been using 
Oracle for the past 10+ years, so we are used to the idea of having one tool that 
can be used both as the data store and for search, by having indexes on top of it. 
I guess the reason we are considering Solr as the data store is that it has some 
features of a database that our application requires, such as 1) detecting 
duplicate records via a unique key field; 2) allowing concurrent updates via the 
optimistic concurrency control feature; 3) its 'replication' feature allowing us to 
store multiple copies of the data. If we were to use a file system, we would not 
have the above features (at least not 1 and 2) and would have to implement them 
ourselves. The other option is to pick another database such as MySQL or Cassandra, 
but then we would need to learn and support an additional tool besides Solr. You 
brought up several very good points about operational factors we should consider if 
we pick Solr as a data store. Also, our application is more OLTP than OLAP. I will 
update our colleagues and stakeholders about these concerns. Thanks again!

Regards,
Hui
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Thursday, June 09, 2016 1:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Questions regarding re-index when using Solr as a data source

In the HowToReindex page, under “Using Solr as a Data Store”, it says this: 
"Don't do this unless you have no other option. Solr is not really designed for 
this role.” So don’t start by planning to do this.

Using a second copy of Solr is still using Solr as a repository. That doesn’t 
satisfy any sort of requirements for disaster recovery. How do you know that 
data is good? How do you make a third copy? How do you roll back to a previous 
version? How do you deal with a security breach that affects all your systems? 
Are the systems in the same data center? How do you deal with ransomware (U. of 
Calgary paid $20K yesterday)?

If a consultant suggested this to me, I’d probably just give up and get a 
different consultant.

Here is what we do for batch loading.

1. For each Solr collection, we define a JSONL feed format, with a JSON Schema.
2. The owners of the data write an extractor to pull the data out of wherever 
it is, then generate the JSON feed.
3. We validate the JSON feed against the JSON schema.
4. If the feed is valid, we save it to Amazon S3 along with a manifest which 
lists the version of the JSON Schema.
5. Then a multi-threaded loader reads the feed and sends it to Solr.

Reloading is safe and easy, because all the feeds in S3 are valid.

Storing backups in S3 instead of running a second Solr is massively cheaper, 
easier, and safer.

We also have a clear contract between the content owners and the search team. 
That contract is enforced by the JSON Schema on every single batch.
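
A minimal SolrJ sketch of what a loader like the one in step 5 could look like 
(the URL, file name, queue size and thread count are placeholders, and a real 
loader would validate the feed against the JSON Schema first):

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.noggit.ObjectBuilder;

public class JsonlLoader {
  public static void main(String[] args) throws Exception {
    // queue of 10k docs, 4 background threads pushing batches to /update
    try (ConcurrentUpdateSolrClient client =
             new ConcurrentUpdateSolrClient("http://localhost:8983/solr/mycollection", 10000, 4);
         BufferedReader in = Files.newBufferedReader(Paths.get("feed.jsonl"), StandardCharsets.UTF_8)) {
      String line;
      while ((line = in.readLine()) != null) {
        @SuppressWarnings("unchecked")
        Map<String, Object> json = (Map<String, Object>) ObjectBuilder.fromJSON(line);
        SolrInputDocument doc = new SolrInputDocument();
        json.forEach(doc::addField);   // one JSONL line becomes one Solr document
        client.add(doc);
      }
      client.blockUntilFinished();
      client.commit();
    }
  }
}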

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 9, 2016, at 9:51 AM, Hui Liu <h...@opentext.com> wrote:
> 
> Hi Walter,
> 
> Thank you for the reply, sorry I need to clarify what I mean by 'migrate 
> tables' from Oracle to Solr, we are not literally move existing records from 
> Oracle to Solr, instead, we are building a new application directly feed data 
> into Solr as document and fields, in parallel of another existing application 
> which feeds the same data into Oracle tables/columns, of course, the Solr 
> schema will be somewhat different than Oracle; also we only keep those data 
> for 90 days for user to search on, we hope once we run both system in 
> parallel for some time (> 90 days), we will build up enough new data in Solr 
> and we no longer need any old data in Oracle, by then we will be able to use 
> Solr as our only data store.
> 
> It sounds to me that we may need to consider save the data into either file 
> system, or another database, in case we need to rebuild the indexes; and the 
> reason I mentioned to save data into another Solr system is by reading this 
> info from https://wiki.apache.org/solr/HowToReindex : so just trying to get a 
> feedback on if there is any update on this approach? And any better way to do 
> this to minimize the downtime caused by the schema change and re-index? For 
> example, in Oracle, we are able to add a new column or new index online 
> without any impact of existing queries as existing indexes are intact.
> 
> Alternatives when a traditional reindex isn't possible
> 
> Sometimes the option of "do your indexing again" is difficult. Perhaps the 
> original data is very slow to access, or it may be difficult to get in the 
> first place.
> 
> Here's where we go against our own advice that we just gave you. Above we 
> said "don't use Solr itself as a datasource" ... but one way to deal with 
> d

RE: Questions regarding re-index when using Solr as a data source

2016-06-09 Thread Hui Liu
Hi Walter,

Thank you for the reply. Sorry, I need to clarify what I mean by 'migrating 
tables' from Oracle to Solr: we are not literally moving existing records from 
Oracle to Solr. Instead, we are building a new application that feeds data directly 
into Solr as documents and fields, in parallel with an existing application that 
feeds the same data into Oracle tables/columns; of course the Solr schema will be 
somewhat different from the Oracle one. We also only keep the data for 90 days for 
users to search on, so we hope that once we have run both systems in parallel for 
some time (> 90 days), we will have built up enough new data in Solr that we no 
longer need any old data in Oracle, and by then we will be able to use Solr as our 
only data store.

It sounds to me that we may need to consider saving the data into either a file 
system or another database in case we need to rebuild the indexes. The reason I 
mentioned saving the data into another Solr system comes from reading the text 
below from https://wiki.apache.org/solr/HowToReindex, so I am just trying to get 
feedback on whether there is any update on this approach, and whether there is a 
better way to do this that minimizes the downtime caused by the schema change and 
re-index. For example, in Oracle we are able to add a new column or a new index 
online without any impact on existing queries, since the existing indexes stay intact.

Alternatives when a traditional reindex isn't possible

Sometimes the option of "do your indexing again" is difficult. Perhaps the 
original data is very slow to access, or it may be difficult to get in the 
first place.

Here's where we go against our own advice that we just gave you. Above we said 
"don't use Solr itself as a datasource" ... but one way to deal with data 
availability problems is to set up a completely separate Solr instance (not 
distributed, which for SolrCloud means numShards=1) whose only job is to store 
the data, then use the SolrEntityProcessor in the DataImportHandler to index 
from that instance to your real Solr install. If you need to reindex, just run 
the import again on your real installation. Your schema for the intermediate 
Solr install would have stored="true" and indexed="false" for all fields, and 
would only use basic types like int, long, and string. It would not have any 
copyFields.

This is the approach used by the Smithsonian for their Solr installation, 
because getting access to the source databases for the individual entities 
within the organization is very difficult. This way they can reindex the online 
Solr at any time without having to get special permission from all those 
entities. When they index new content, it goes into a copy of Solr configured 
for storage only, not in-depth searching. Their main Solr instance uses 
SolrEntityProcessor to import from the intermediate Solr servers, so they can 
always reindex.
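
For what it's worth, the same re-import can also be done outside the 
DataImportHandler with a small SolrJ program that pages through the storage 
instance with a cursor and re-adds the documents to the search instance. A rough 
sketch, where the host names, core names and the uniqueKey field 'id' are 
placeholders:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CopyCollectionSketch {
  public static void main(String[] args) throws Exception {
    try (SolrClient source = new HttpSolrClient("http://storage-solr:8983/solr/docs_store");
         SolrClient target = new HttpSolrClient("http://search-solr:8983/solr/docs_search")) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);
      q.setSort("id", SolrQuery.ORDER.asc);          // cursorMark requires a sort on the uniqueKey
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = source.query(q);
        for (SolrDocument d : rsp.getResults()) {
          SolrInputDocument doc = new SolrInputDocument();
          // skip _version_ so the target assigns its own
          d.forEach((name, value) -> { if (!"_version_".equals(name)) doc.addField(name, value); });
          target.add(doc);
        }
        String next = rsp.getNextCursorMark();
        if (next.equals(cursor)) break;              // no more pages
        cursor = next;
      }
      target.commit();
    }
  }
}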

Regards,
Hui

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Thursday, June 09, 2016 12:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Questions regarding re-index when using Solr as a data source

First, using Solr as a repository is pretty risky. I would keep the official 
copy of the data in a database, not in Solr.

Second, you can’t “migrate tables” because Solr doesn’t have tables. You need 
to turn the tables into documents, then index the documents. It can take a lot 
of joins to flatten a relational schema into Solr documents.

Solr does not support schema migration, so yes, you will need to save off all 
the documents, then reload them. I would save them to files. It makes no sense 
to put them in another copy of Solr.

Changing the schema will be difficult and time-consuming, but you’ll probably 
run into much worse problems trying to use Solr as a repository.

wunder
Walter Underwood
wun...@wunderwood.org<mailto:wun...@wunderwood.org>
http://observer.wunderwood.org/  (my blog)


> On Jun 9, 2016, at 8:50 AM, Hui Liu 
> <h...@opentext.com<mailto:h...@opentext.com>> wrote:
>
> Hi,
>
>  We are porting an application currently hosted in Oracle 11g to 
> Solr Cloud 6.x, i.e we plan to migrate all tables in Oracle as collections in 
> Solr, index them, and build search tools on top of this; the goal is we won't 
> be using Oracle at all after this has been implemented; every fields in Solr 
> will have 'stored=true' and selectively a subset of searchable fields will 
> have 'indexed=true'; the question is what steps we should follow if we need 
> to re-index a collection after making some schema changes - mostly we only 
> add new fields to store, or make a non-indexed field as indexed, we normally 
> do not delete or rename any existing fields; according to this url: 
> https://wiki.apache.org/solr/HowToReindex it seems we need to setup a 
> 'intermediate' Solr1 to only store the data themselves without any indexing, 
> then have another Solr2 setup 

Questions regarding re-index when using Solr as a data source

2016-06-09 Thread Hui Liu
Hi,

	We are porting an application currently hosted in Oracle 11g to 
Solr Cloud 6.x, i.e. we plan to migrate all tables in Oracle to collections in 
Solr, index them, and build search tools on top of this; the goal is that we won't 
be using Oracle at all once this has been implemented. Every field in Solr will 
have 'stored=true' and, selectively, a subset of searchable fields will have 
'indexed=true'.

The question is what steps we should follow if we need to re-index a collection 
after making some schema changes - mostly we only add new fields to store, or make 
a non-indexed field indexed; we normally do not delete or rename any existing 
fields. According to this url: https://wiki.apache.org/solr/HowToReindex it seems 
we need to set up an 'intermediate' Solr1 that only stores the data without any 
indexing, then have another Solr2 set up to store the indexed data, and in case of 
a re-index, just delete all the documents in Solr2 for the collection and re-import 
the data from Solr1 into Solr2 using SolrEntityProcessor (from the dataimport 
handler)? Is this still the recommended approach?

The downside I can see is that if we have a tremendous amount of data in a 
collection (some of our collections could have several billion documents), 
re-importing from Solr1 into Solr2 may take a few hours or even days, and during 
this time users cannot query the data. Is there any better way to do this and avoid 
this type of downtime? Any feedback is appreciated!

Regards,
Hui Liu
Opentext, Inc.


RE: Help needed on Solr Streaming Expressions

2016-06-06 Thread Hui Liu
The only difference between document3 and document5 is that document3 had no data in 
'shard2'; after loading some data into shard2, the http command also worked:

http://localhost:8988/solr/document3/stream?expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
 sender_msg_dest", sort="document_id asc",qt="/export")

My guess is that the 'null pointer' error in the stack trace was caused by there 
being no data in 'shard2'.

Regards,
Hui

-Original Message-
From: Hui Liu 
Sent: Monday, June 06, 2016 1:04 PM
To: solr-user@lucene.apache.org
Subject: RE: Help needed on Solr Streaming Expressions

Joel,

Thank you very much for your help, I tried the http command below with my 
existing 2 shards collection 'document3' (sorry I have a typo below should be 
document3 instead of document2), this time I got much better error:

{"result-set":{"docs":[
{"EXCEPTION":"Unable to construct instance of 
org.apache.solr.client.solrj.io.stream.CloudSolrStream","EOF":true}]}}

I attach the error stack trace from 'solr-8988-console.log' and 'solr.log' here 
in file 'solr_error.txt'.

However I continued and tried create another identical collection 'document5' 
with 2 shards and 2 replica using the same schema, this time the http URL 
worked!!! Maybe my previous collection 'document3' has some corruption? 

-- command to create collection 'document5':
solr create -c document5 -d new_doc_configs5 -p 8988 -s 2 -rf 2

-- command for stream expression:
http://localhost:8988/solr/document5/stream?expr=search(document5,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
 sender_msg_dest", sort="document_id asc",qt="/export")

-- result from browser:
{"result-set":{"docs":[
{"document_id":20346005172,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346005173,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346006403,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006406,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006741,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006743,"sender_msg_dest":"14:004321519IBMP"},
{"EOF":true,"RESPONSE_TIME":10}]}}

Do you think I can try the same in http using other 'Stream Decorators' such as 
'complement' and 'innerJoin'?

Regards,
Hui

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com]
Sent: Monday, June 06, 2016 9:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Help needed on Solr Streaming Expressions

Hi,

To eliminate any issues that might be happening due to curl, try running the 
command from your browser.

http://localhost:8988/solr/document2/stream?expr=search(document3,zkHost="
127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id 
asc",qt="/export")



I think most browsers will url encode the expression automatically, but you can 
url encode also using an online tool. Also you can remove the zkHost param and 
it should default to zkHost your solr is connected to.


If you still get an error take a look at the logs and post the full stack trace 
to this thread, which will help determine where the problem is.



Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jun 5, 2016 at 2:11 PM, Hui Liu <h...@opentext.com> wrote:

> Hi,
>
>
>
>   I have Solr 6.0.0 installed on my PC (windows 7), I was 
> experimenting with ‘Streaming Expression’ feature by following steps 
> from this link:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> , but cannot get it to work, attached is my solrconfig.xml and 
> schema.xml, note I do have ‘export’ handler defined in my 
> ‘solrconfig.xml’ and enabled all fields as ‘docvalues’ in 
> ‘schema.xml’; I am using solr cloud and external zookeeper (also 
> installed on m PC), here is the command to start this 2-node Solr 
> cloud instance and to create the collection ‘document3’:
>
>
>
> -- start 2-node solr cloud instances:
>
> solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
>
> solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4
>
>
>
> -- create the collection:
>
> solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2
>
>
>
>   after creating the collection I loaded a few documents 
> using ‘csv’ format and I was able to query it using ‘curl’ command from my PC:
>
>
>
> -- this works on my PC:
>
> curl
> http://localhost:8988/solr/document3/select?q=*:*=document_id+des
> c,sender_msg_dest+desc=document_id,sender_msg_dest,recip_msg_dest
>
>
>
>

RE: Help needed on Solr Streaming Expressions

2016-06-06 Thread Hui Liu
Joel,

Thank you very much for your help. I tried the http command below with my 
existing 2-shard collection 'document3' (sorry, I have a typo below - it should be 
document3 instead of document2), and this time I got a much better error:

{"result-set":{"docs":[
{"EXCEPTION":"Unable to construct instance of 
org.apache.solr.client.solrj.io.stream.CloudSolrStream","EOF":true}]}}

I attach the error stack trace from 'solr-8988-console.log' and 'solr.log' here 
in file 'solr_error.txt'.

However, I continued and tried creating another identical collection 'document5' 
with 2 shards and 2 replicas using the same schema, and this time the http URL 
worked!!! Maybe my previous collection 'document3' has some corruption? 

-- command to create collection 'document5':
solr create -c document5 -d new_doc_configs5 -p 8988 -s 2 -rf 2

-- command for stream expression:
http://localhost:8988/solr/document5/stream?expr=search(document5,zkHost="127.0.0.1:2181",q="*:*",fl="document_id,
 sender_msg_dest", sort="document_id asc",qt="/export")

-- result from browser:
{"result-set":{"docs":[
{"document_id":20346005172,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346005173,"sender_msg_dest":"ZZ:035239425"},
{"document_id":20346006403,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006406,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006741,"sender_msg_dest":"14:004321519IBMP"},
{"document_id":20346006743,"sender_msg_dest":"14:004321519IBMP"},
{"EOF":true,"RESPONSE_TIME":10}]}}

Do you think I can try the same in http using other 'Stream Decorators' such as 
'complement' and 'innerJoin'?

Regards,
Hui

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Monday, June 06, 2016 9:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Help needed on Solr Streaming Expressions

Hi,

To eliminate any issues that might be happening due to curl, try running the 
command from your browser.

http://localhost:8988/solr/document2/stream?expr=search(document3,zkHost="
127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id 
asc",qt="/export")



I think most browsers will url encode the expression automatically, but you can 
url encode also using an online tool. Also you can remove the zkHost param and 
it should default to zkHost your solr is connected to.


If you still get an error take a look at the logs and post the full stack trace 
to this thread, which will help determine where the problem is.



Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jun 5, 2016 at 2:11 PM, Hui Liu <h...@opentext.com> wrote:

> Hi,
>
>
>
>   I have Solr 6.0.0 installed on my PC (windows 7), I was 
> experimenting with ‘Streaming Expression’ feature by following steps 
> from this link:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> , but cannot get it to work, attached is my solrconfig.xml and 
> schema.xml, note I do have ‘export’ handler defined in my 
> ‘solrconfig.xml’ and enabled all fields as ‘docvalues’ in 
> ‘schema.xml’; I am using solr cloud and external zookeeper (also 
> installed on m PC), here is the command to start this 2-node Solr 
> cloud instance and to create the collection ‘document3’:
>
>
>
> -- start 2-node solr cloud instances:
>
> solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
>
> solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4
>
>
>
> -- create the collection:
>
> solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2
>
>
>
>   after creating the collection I loaded a few documents 
> using ‘csv’ format and I was able to query it using ‘curl’ command from my PC:
>
>
>
> -- this works on my PC:
>
> curl
> http://localhost:8988/solr/document3/select?q=*:*=document_id+des
> c,sender_msg_dest+desc=document_id,sender_msg_dest,recip_msg_dest
>
>
>
>   but when trying Streaming ‘search’ using curl, it does 
> not work, I tried with 3 different options: with zkHost, using 
> ‘export’, or using ‘select’, all getting the same error:
>
>
> curl: (6) Could not resolve host: sort=document_id asc,qt=
>
> {"result-set":{"docs":[
>
> {"EXCEPTION":null,"EOF":true}]}}
>
> -- different curl commands tried, all getting the same error above:
>
> curl --data-urlencode 
> 'expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id
> , sender_msg_dest", sort="document_id asc&q

Help needed on Solr Streaming Expressions

2016-06-05 Thread Hui Liu
Hi,

	I have Solr 6.0.0 installed on my PC (Windows 7). I was 
experimenting with the 'Streaming Expression' feature by following the steps from this 
link: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions, 
but cannot get it to work. Attached are my solrconfig.xml and schema.xml; note I 
do have the 'export' handler defined in my 'solrconfig.xml' and have enabled all fields 
as 'docValues' in 'schema.xml'. I am using Solr Cloud and an external ZooKeeper 
(also installed on my PC); here are the commands to start this 2-node Solr Cloud 
instance and to create the collection 'document3':

-- start 2-node solr cloud instances:
solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4

-- create the collection:
solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2

  after creating the collection I loaded a few documents using 
'csv' format and I was able to query it using 'curl' command from my PC:

-- this works on my PC:
curl 
http://localhost:8988/solr/document3/select?q=*:*&sort=document_id+desc,sender_msg_dest+desc&fl=document_id,sender_msg_dest,recip_msg_dest

	but when trying a streaming 'search' using curl, it does not work; 
I tried 3 different options (with zkHost, using 'export', or using 
'select'), all getting the same error:

curl: (6) Could not resolve host: sort=document_id asc,qt=
{"result-set":{"docs":[
{"EXCEPTION":null,"EOF":true}]}}

-- different curl commands tried, all getting the same error above:
curl --data-urlencode 
'expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id, 
sender_msg_dest", sort="document_id asc",qt="/export")' 
"http://localhost:8988/solr/document2/stream;

curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id, 
sender_msg_dest", sort="document_id asc",qt="/export")' 
"http://localhost:8988/solr/document2/stream;

curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id, 
sender_msg_dest", sort="document_id asc",qt="/select",rows=10)' 
"http://localhost:8988/solr/document2/stream;

  what am I doing wrong? Thanks for any help!

Regards,
Hui Liu





  

  
[Attachments: solrconfig.xml and schema.xml for the collection. The XML markup was 
stripped by the mailing-list archive, so only text fragments survive. Still 
readable: luceneMatchVersion 6.0.0; ${solr.data.dir:} and ${solr.lock.type:native} 
defaults; autoCommit maxTime ${solr.autoCommit.maxTime:15000} with openSearcher 
false; autoSoftCommit maxTime ${solr.autoSoftCommit.maxTime:-1}; an /export handler 
whose defaults include {!xport}, xsort and distrib=false with the query component; 
and, in the schema, document_id as the uniqueKey. The field and fieldType 
definitions themselves did not survive the stripping.]



RE: help need example code of solrj to get schema of a given core

2016-06-02 Thread Liu, Ming (Ming)
Thanks Georg very much!

Ming
-Original Message-
From: Georg Sorst [mailto:georg.so...@gmail.com] 
Sent: Tuesday, May 31, 2016 18:22
To: solr-user@lucene.apache.org
Subject: Re: help need example code of solrj to get schema of a given core

Querying the schema can be done with the Schema API ( 
https://cwiki.apache.org/confluence/display/solr/Schema+API), which is fully 
supported by SolrJ:
http://lucene.apache.org/solr/6_0_0/solr-solrj/org/apache/solr/client/solrj/request/schema/package-summary.html
.
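
For example, a minimal SolrJ program that prints a core's fields via the Schema API 
could look like this (the URL and core name are placeholders):

import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaRepresentation;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class PrintSchema {
  public static void main(String[] args) throws Exception {
    // point the client at the core/collection whose schema you want to inspect
    try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore")) {
      SchemaResponse response = new SchemaRequest().process(client);
      SchemaRepresentation schema = response.getSchemaRepresentation();
      System.out.println("Schema name: " + schema.getName());
      System.out.println("Unique key:  " + schema.getUniqueKey());
      // each field is returned as a map of its attributes (name, type, indexed, stored, ...)
      for (Map<String, Object> field : schema.getFields()) {
        System.out.println(field.get("name") + " -> " + field);
      }
    }
  }
}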

Liu, Ming (Ming) <ming@esgyn.cn> wrote on Tue., May 31, 2016 at 09:41:

> Hello,
>
> I am very new to Solr, I want to write a simple Java program to get a 
> core's schema information. Like how many field and details of each 
> field. I spent a few time searching on internet, but cannot get much 
> information about this. The solrj wiki seems not updated for long 
> time. I am using Solr
> 5.5.0
>
> Hope there are some example code, or please give me some advices, or 
> simple hint like which java class I can take a look at.
>
> Thanks in advance!
> Ming
>


help need example code of solrj to get schema of a given core

2016-05-31 Thread Liu, Ming (Ming)
Hello,

I am very new to Solr. I want to write a simple Java program to get a core's 
schema information, like how many fields there are and the details of each field. I 
spent some time searching on the internet, but could not find much information about 
this. The SolrJ wiki seems not to have been updated for a long time. I am using Solr 5.5.0.

I hope there is some example code, or please give me some advice, or a simple 
hint like which Java class I can take a look at.

Thanks in advance!
Ming


Re: Documents cannot be searched immediately when indexed using REST API with Solr Cloud

2015-03-19 Thread Liu Bo
Hi Edvin

Please review your commit/soft-commit configuration. "Soft commits are about 
visibility, hard commits are about durability", as a wise man put it. :)

If you are doing NRT indexing and searching, you probably need a short soft-commit
interval or an explicit commit in your request handler. Be advised
that these strategies and configurations need to be tested and adjusted
according to your data size and your search and index-update frequency.

You should be able to find the answer yourself here:
http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
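
One concrete knob, for illustration: instead of a very aggressive autoSoftCommit you 
can ask for visibility per request with commitWithin from SolrJ (the field names and 
the 2-second bound below are made up):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinSketch {
  // "client" points at the target collection
  public static void indexVisibleSoon(SolrClient client) throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "42");
    doc.addField("title", "hello");
    // commitWithin = 2000 ms: Solr opens a searcher within roughly 2 seconds,
    // without the caller issuing an explicit commit
    client.add(doc, 2000);
  }
}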

All the best

Liu Bo

On 19 March 2015 at 17:54, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

 Hi,

 I'm using Solr Cloud now, with 2 shards known as shard1 and shard2, and
 when I try to index rich-text documents using REST API or the default
 Documents module in Solr Admin UI, the documents that are indexed do not
 appear immediately when I do a search. It only appears after I restarted
 the Solr services (both shard1 and shard2).

 However, the same issue do not happen when I index the same documents using
 post.jar, and I can search for the indexed documents immediately.

 Here's my ExtractingRequestHandler in solrconfig.xml.

   <requestHandler name="/update/extract"
                   class="solr.extraction.ExtractingRequestHandler">
     <lst name="defaults">
       <str name="lowernames">true</str>
       <str name="uprefix">ignored_</str>

       <!-- capture link hrefs but ignore div attributes -->
       <str name="captureAttr">true</str>
       <str name="fmap.a">links</str>
       <str name="fmap.div">ignored_</str>
     </lst>
   </requestHandler>

 What could be the reason why this is happening, and any solutions to solve
 it?

 Regards,
 Edwin



solr always loading and not any response

2014-07-24 Thread zhijun liu
Hi all, the Solr admin page is always loading, and when I send a query request
I also cannot get any response. The TCP connection is always ESTABLISHED; only
restarting the Solr service fixes it. How can I find out what the problem is?

solr:4.6
jetty:8

thanks so much.


Re: Where to specify numShards when startup up a cloud setup

2014-04-18 Thread Liu Bo
Hi zzT

Putting numShards in core.properties also works.

I struggled a little bit while figuring out this configuration approach.
I knew I was not alone! ;-)


On 2 April 2014 18:06, zzT zis@gmail.com wrote:

 It seems that I've figured out a configuration approach to this issue.

 I'm having the exact same issue and the only viable solutions found on the
 net till now are
 1) Pass -DnumShards=x when starting up Solr server
 2) Use the Collections API as indicated by Shawn.

 What I've noticed though - after making the call to /collections to create
 a
 node solr.xml - is that a new core entry is added inside solr.xml with
 the
 attribute numShards.

 So, right now I'm configuring solr.xml with numShards attribute inside my
 core nodes. This way I don't have to worry with annoying stuff you've
 already mentioned e.g. waiting for Solr to start up etc.

 Of course same logic applies here, numShards param is meanigful only the
 first time. Even if you change it at a later point the # of shards stays
 the
 same.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Where-to-specify-numShards-when-startup-up-a-cloud-setup-tp4078473p4128566.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
All the best

Liu Bo


Re: Multiple Languages in Same Core

2014-03-26 Thread Liu Bo
Hi Jeremy

There are a lot of multi-language discussions; there are two main approaches:
 1. like yours, one core per language
 2. all in one core, with each language having its own fields.

We have multi-language support in a single core: each multilingual field
has its own suffix, such as name_en_US. We customized the query handler to hide
the query details from the client.
The main reason we want to do this is NRT indexing and search;
take a product for example:

a product has price and quantity, which are common fields used for filtering
and sorting, while name and description are multi-language fields;
if we split products into different cores, updating a common field
may end up as an update in all of the multi-language cores.

As to scalability, we don't change Solr cores/collections when a new
language is added, but we probably need to update our customized indexing process
and run a full re-index.

This approach suits our requirement for now, but you may have your own
concerns.

We have a similar suggester-filtering problem to yours: we want to return
suggestions filtered by store. I can't find a way to build the dictionary
from a query in my version of Solr (4.6).

What I do is run a query on an N-gram analyzed field with filter queries
on the store_id field. The "suggest" is actually a query; it may not perform as
well as the suggester, but it does the trick.

You could try building an additional N-gram field for suggestions only and
searching on it with an fq on your Locale field, along the lines of the sketch below.
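
A minimal SolrJ sketch (the field, parameter and collection names are only placeholders):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;

public class SuggestAsQuerySketch {
  // "suggest_ngram" stands for a field analyzed with an edge-N-gram filter at index time
  public static QueryResponse suggest(SolrClient client, String prefix, String locale) throws Exception {
    SolrQuery q = new SolrQuery("suggest_ngram:" + ClientUtils.escapeQueryChars(prefix));
    q.addFilterQuery("locale:" + locale);   // restrict suggestions to one language / store
    q.setFields("id", "name_" + locale);
    q.setRows(10);
    return client.query("products", q);
  }
}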

All the best

Liu Bo




On 25 March 2014 09:15, Alexandre Rafalovitch arafa...@gmail.com wrote:

 Solr In Action has a significant discussion on the multi-lingual
 approach. They also have some code samples out there. Might be worth a
 look

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson
 jer...@thomersonfamily.com wrote:
  I recently deployed Solr to back the site search feature of a site I work
  on. The site itself is available in hundreds of languages. With the
 initial
  release of site search we have enabled the feature for ten of those
  languages. This is distributed across eight cores, with two Chinese
  languages plus Korean combined into one CJK core and each of the other
  seven languages in their own individual cores. The reason for splitting
  these into separate cores was so that we could have the same field names
  across all cores but have different configuration for analyzers, etc, per
  core.
 
  Now I have some questions on this approach.
 
  1) Scalability: Considering I need to scale this to many dozens more
  languages, perhaps hundreds more, is there a better way so that I don't
 end
  up needing dozens or hundreds of cores? My initial plan was that many
  languages that didn't have special support within Solr would simply get
  lumped into a single default core that has some default analyzers that
  are applicable to the majority of languages.
 
  1b) Related to this: is there a practical limit to the number of cores
 that
  can be run on one instance of Lucene?
 
  2) Auto Suggest: In phase two I intend to add auto-suggestions as a user
  types a query. In reviewing how this is implemented and how the
 suggestion
  dictionary is built I have concerns. If I have more than one language in
 a
  single core (and I keep the same field name for suggestions on all
  languages within a core) then it seems that I could get suggestions from
  another language returned with a suggest query. Is there a way to build a
  separate dictionary for each language, but keep these languages within
 the
  same core?
 
  If it's helpful to know: I have a field in every core for Locale.
 Values
  will be the locale of the language of that document, i.e. en, es,
  zh_hans, etc. I'd like to be able to: 1) when building a suggestion
  dictionary, divide it into multiple dictionaries, grouping them by
 locale,
  and 2) supply a parameter to the suggest query that allows the suggest
  component to only return suggestions from the appropriate dictionary for
  that locale.
 
  If the answer to #1 is keep splitting groups of languages that have
  different analyzers into their own cores and the answer to #2 is that's
  not supported, then I'd be curious: where would I start to write my own
  extension that supported #2? I looked last night at the suggest lookup
  classes, dictionary classes, etc. But I didn't see a clear point where it
  would be clean to implement something like I'm suggesting above.
 
  Best Regards,
  Jeremy Thomerson




-- 
All the best

Liu Bo


Re: Grouping results with group.limit return wrong numFound ?

2014-01-01 Thread Liu Bo
hi @Ahmet

I've thought about using group.ngroups=true, but when you use
group.main=true there's no ngroups field in the response.

And according to http://wiki.apache.org/solr/FieldCollapsing, the result
might not be correct in SolrCloud.

I don't like using facets for this, but it seems I have to...


On 1 January 2014 00:35, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Tasmaniski,

 I don't follow. How come Liu's faceting workaround and n.groups=true
 produce different results?






 On Tuesday, December 31, 2013 6:08 PM, tasmaniski tasmani...@gmail.com
 wrote:
 @kamaci
 Ofcourse. That is the problem.

 group.limit is: the number of results (documents) to return for each
 group.
 NumFound is number of total found, but *not* sum number of *return for each
 group.*

 @Liu Bo
 seems to be the is only workaround for problem but
 it's to much expensive to go through all the groups and calculate total
 number of found/returned (I use PHP for client:) ).

 @iorixxx
 Yes, I consider that (group.ngroups=true)
 but in some group I have number of found result  lesser than limit.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174p4108906.html

 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
All the best

Liu Bo


Re: Chaining plugins

2013-12-31 Thread Liu Bo
Hi

I've done similar things as paul.

What I do is extend the default QueryComponent and override the
prepare method:

I just change the SolrParams according to our logic and then call
super.prepare(). Then I replace the default QueryComponent with my component in the
search/query handler.

In this way, none of Solr's default behavior is touched. I think you can
do your logic in the prepare method and then let Solr proceed with the search.

I've tested it along with other components on both a single Solr node and
SolrCloud. It works fine.
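
A rough sketch of that pattern (the store.id parameter and store_id field below just
stand in for "our logic" and are made-up names):

import java.io.IOException;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

/**
 * Tweaks the request params in prepare() and lets the stock QueryComponent do the
 * actual work. Registered in solrconfig.xml as a searchComponent and wired into the
 * handler in place of the default "query" component.
 */
public class RewritingQueryComponent extends QueryComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    // example of custom logic: turn a custom request parameter into a filter query
    String store = params.get("store.id");
    if (store != null) {
      params.add("fq", "store_id:" + store);
    }
    rb.req.setParams(params);
    super.prepare(rb);   // normal Solr query preparation continues from here
  }
}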

Hope it helps

Cheers

Bold



On 31 December 2013 06:03, Chris Hostetter hossman_luc...@fucit.org wrote:


 You don't need to write your own handler.

See the previous comment about implementing a SearchComponent -- you can
 check for the params in your prepare() method and do whatever side effects
 you want, then register your custom component and hook it into the
 component chain of whatever handler configuration you want (either using
 the components arr or by specifying it as a first-components...


 https://cwiki.apache.org/confluence/display/solr/RequestHandlers+and+SearchComponents+in+SolrConfig

 : I want to save the query into a file when a user is changing a parameter
 in
 : the query, lets say he adds logTofile=1 then the searchHandler will
 : provide the same result as without this parameter, but in the background
 it
 : will do some logic(ex. save the query to file) .
 : But I dont want to touch solr source code, all I want is to add code(like
 : plugin). if i understand it right I want to write my own search handler
 , do
 : some logic , then pass the data to solr default search handler.




 -Hoss
 http://www.lucidworks.com/




-- 
All the best

Liu Bo


Re: Grouping results with group.limit return wrong numFound ?

2013-12-31 Thread Liu Bo
Hi

I've met the same problem, and I've googled around but not found a direct
solution.

But there's a workaround: do a facet on your group field, with parameters
like
   <str name="facet">true</str>
   <str name="facet.field">your_field</str>
   <str name="facet.limit">-1</str>
   <str name="facet.mincount">1</str>

and then count how many faceted pairs are in the response. This should be the
same as the number of documents after grouping (i.e. the number of groups).
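
For example, a small SolrJ helper along these lines (the field and collection names
are taken from the example in this thread and are only illustrative):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupCountSketch {
  // counts the distinct values of the grouped field matching the query,
  // i.e. the total number of groups
  public static int countGroups(SolrClient client, String collection, String query) throws Exception {
    SolrQuery q = new SolrQuery(query);
    q.setRows(0);                 // we only need the facet counts, not the documents
    q.setFacet(true);
    q.addFacetField("publisher");
    q.setFacetLimit(-1);
    q.setFacetMinCount(1);
    QueryResponse rsp = client.query(collection, q);
    return rsp.getFacetField("publisher").getValueCount();
  }
}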

Cheers

Bold




On 31 December 2013 06:40, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;

 group.limit is: the number of results (documents) to return for each group.
 Defaults to 1. Did you check the page here:
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604232

 Thanks;
 Furkan KAMACI


 On Wednesday, December 25, 2013, tasmaniski tasmani...@gmail.com wrote:
  Hi All, When I perform a search with grouping results into groups and
  limit the results in one group, I get that *numFound* is the same as if I
  didn't use the limit. Looks like SOLR first performs the search and calculates
  numFound, and then groups and limits the results. I do not know if this is a
  bug or a feature :) But I cannot use pagination and other stuff. Is there any
  workaround or did I miss something?
  Example: I want to search book titles and limit the search to 3 results per
  publisher:
  q=book_title: solr php&group=true&group.field=publisher&group.limit=3&group.main=true
  I have 20 results for the apress publisher but I show only 3, which works OK.
  But in numFound I still have 20 for the apress publisher...
 
 
 
  --
  View this message in context:

 http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174.html
  Sent from the Solr - User mailing list archive at Nabble.com.




-- 
All the best

Liu Bo


Re: PostingsSolrHighlighter

2013-12-18 Thread Liu Bo
hi Josip

For the first question we've done similar things: copying searchable fields into a
text field. But highlighting is normally done on specific fields such as title;
depending on how the search content is displayed on the front end, you can
search on text and highlight on the fields you want by specifying hl.fl
ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl


On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:

 Hi @all,

 i am playing with the PostingsSolrHighlighter. I'm running solr 4.6.0
 and my configuration is from here:

 https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/
 PostingsSolrHighlighter.html

 Search query and result (not working):

 http://pastebin.com/13Uan0ZF

 Schema (not complete):

 http://pastebin.com/JGa38UDT

 Search query and result (working):

 http://pastebin.com/4CP8XKnr

 Solr config:

 <searchComponent class="solr.HighlightComponent" name="highlight">
   <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
 </searchComponent>

 So this is working just fine, but now I have some questions:

 1.) With the old default highlighter component it was possible to search
 in searchable_text and to retrieve highlighted text. This is essential,
 because we use copyField to put almost everything into searchable_text
 (title, subtitle, description, ...)

 2.) I can't get the ellipsis working: I tried hl.tag.ellipsis=...,
 f.text.hl.tag.ellipsis=..., and configuring it in the RequestHandler; nothing
 seems to work, and maxAnalyzedChars is just cutting the sentence?

 Kind Regards

 Josip Delic




-- 
All the best

Liu Bo


Re: an array liked string is treated as multivalued when adding doc to solr

2013-12-18 Thread Liu Bo
Hi Alexandre

It's quite a rare case, just one out of tens of thousands.

I'm planning to have every multilingual field as multivalued and just get
the first one while formatting the response to our business object.

The first-value update processor seems very helpful, thank you.

All the best

Liu Bo


On 18 December 2013 15:26, Alexandre Rafalovitch arafa...@gmail.com wrote:

 If this happens rarely and you want to deal with it on the way into Solr,
 you could just keep one of the values, using URP:

 http://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html

 Regards,
Alex

 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Wed, Dec 18, 2013 at 2:20 PM, Liu Bo diabl...@gmail.com wrote:

  Hey Furkan and solr users
 
  This is a miss reported problem. It's not solr problem but our data
 issue.
  Sorry for this.
 
  It's a data issue of our side, a coupon happened to have two piece
 English
  description, which is not allowed in our business logic, but it happened
   and we added twice of the name_en_US to solr document.
 
  I've done a set of test and deep debugging to solr source code, and found
  out that a array like string such as  [Get 20% Off Official Barca Kits,
  coupon] won't be treated as multivalued field.
 
  Sorry again for not digging more before sent out question email. I trust
  our business logic and data integrity more than solr, I will definitely
 not
  do this again. ;-)
 
  All the best
 
  Liu Bo
 
 
 
  On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote:
 
   Hi Liu;
  
   Yes. it is an expected behavior. If you send data within square
 brackets
   Solr will behave it as a multivalued field. You can test it with this
  way:
   if you use Solrj and use a List for a field it will be considered as
   multivalued too because when you call toString() method of your List
 you
   can see that elements are printed within square brackets. This is the
   reason that a List can be used for a multivalued field.
  
   If you explain your situation I can offer a way how to do it.
  
   Thanks;
   Furkan KAMACI
  
  
   2013/12/6 Liu Bo diabl...@gmail.com
  
Dear solr users:
   
I've met this kind of error several times,
   
when add a array liked string such as:[Get 20% Off Official Barça
  Kits,
coupon] to a  multiValued=false field, solr will complain:
   
org.apache.solr.common.SolrException: ERROR:
 [doc=7781396456243918692]
multiple values encountered for non multiValued field name_en_US:
 [Get
   20%
Off Official Barca Kits, coupon]
   
my schema defination:
field name=name_en_US type=text_en indexed=true stored=true
multiValued=false /
   
This field is stored as the search result needs this field and it's
  value
in original format, and indexed to give it a boost while searching .
   
What I do is adding name (java.lang.String) to SolrInputDocument by
addField(name_en_US, product.getName()) method, and then add this
 to
   solr
using an AddUpdateCommand
   
It seems solr treats this kind of string data as multivalued, even I
  add
this field to solr only once.
   
Is this a bug or a supposed behavior?
   
Is there any way to tell solr this is not a multivalued value add
  don't
break it?
   
Your help and suggestion will be much of my appreciation.
   
--
All the best
   
Liu Bo
   
  
 
 
 
  --
  All the best
 
  Liu Bo
 




-- 
All the best

Liu Bo


Re: PostingsSolrHighlighter

2013-12-18 Thread Liu Bo
Hi Josip

that's quite weird; in my experience highlighting is strict on string fields,
which need an exact match, but text fields should be fine.

I copied your schema definition and did a quick test in a new core; everything
is default from the tutorial, and the search component is
using solr.HighlightComponent.

Searching on searchable_text can highlight text. I copied your search url and
just changed the host part; the input parameters are exactly the same.

The result is attached.

Can you upload your complete solrconfig.xml and schema.xml?


On 18 December 2013 19:02, Josip Delic j...@lugensa.com wrote:

 On 18.12.2013 09:55, Liu Bo wrote:

 hi Josip


 hi liu,


  for the 1 question we've done similar things: copying search field to a
 text field. But highlighting is normally on specific fields such as tittle
 depending on how the search content is displayed to the front end, you can
 search on text and highlight on the field you wanted by specify hl.fl

 ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl


 that's exactly what I'm doing in that pastebin:

 http://pastebin.com/13Uan0ZF

 I'm searching there for 'q=searchable_text:labore'; this is present in 'text'
 and in the copyfield 'searchable_text', but it is not highlighted in 'text'
 (hl.fl=text)

 The same query is working if set 'q=text:labore' as you can see in

 http://pastebin.com/4CP8XKnr

 For the 2nd question I figured out that the PostingsSolrHighlighter ellipsis
 is not, as I thought, for adding an ellipsis at the start and/or end of the
 highlighted text. It is instead used to combine multiple snippets together
 if snippets is > 1.

 cheers

 josip




 On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:

  Hi @all,

 i am playing with the PostingsSolrHighlighter. I'm running solr 4.6.0
 and my configuration is from here:

 https://lucene.apache.org/solr/4_6_0/solr-core/org/
 apache/solr/highlight/
 PostingsSolrHighlighter.html

 Search query and result (not working):

 http://pastebin.com/13Uan0ZF

 Schema (not complete):

 http://pastebin.com/JGa38UDT

 Search query and result (working):

 http://pastebin.com/4CP8XKnr

 Solr config:

 <searchComponent class="solr.HighlightComponent" name="highlight">
   <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
 </searchComponent>

 So this is working just fine, but now i have some questions:

 1.) With the old default highlighter component it was possible to search
 in searchable_text and to retrive highlighted text. This is
 essential,
 because we use copyfield to put almost everything to searchable_text
 (title, subtitle, description, ...)

 2.) I can't get ellipsis working i tried hl.tag.ellipsis=...,
 f.text.hl.tag.ellipsis=..., configuring it in RequestHandler noting seems
 to work, maxAnalyzedChars is just cutting the sentence?

 Kind Regards

 Josip Delic









-- 
All the best

Liu Bo
http://localhost:8080/solr/try/select?wt=json&fl=text%2Cscore&hl=true&hl.fl=text&q=%28searchable_text%3Alabore%29&rows=10&sort=score+desc&start=0

{
responseHeader: {
status: 0,
QTime: 36,
params: {
sort: score desc,
fl: text,
start: 0,
,score: ,
q: (searchable_text:labore),
hl.fl: text,
wt: json,
hl: true,
rows: 10
}
},
response: {
numFound: 3,
start: 0,
docs: [
{
text: Lorem ipsum dolor sit amet, consetetur sadipscing 
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna 
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores 
et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum 
dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed 
diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed 
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet 
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
},
{
text: Lorem ipsum dolor sit amet, consetetur sadipscing 
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna 
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores 
et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum 
dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed 
diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed 
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet 
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
},
{
text: Lorem ipsum dolor sit amet, consetetur sadipscing 
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna 
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores 
et ea rebum. Stet clita kasd gubergren, no sea takimata

Re: an array liked string is treated as multivalued when adding doc to solr

2013-12-17 Thread Liu Bo
Hey Furkan and solr users

This is a misreported problem. It's not a solr problem but our data issue.
Sorry for this.

It's a data issue on our side: a coupon happened to have two pieces of English
description, which is not allowed in our business logic, but it happened,
and we added the name_en_US field to the solr document twice.

I've done a set of tests and some deep debugging of the solr source code, and found
out that an array-like string such as "[Get 20% Off Official Barca Kits,
coupon]" won't be treated as a multivalued field.

Sorry again for not digging more before sending out the question email. I trust
our business logic and data integrity more than solr, I will definitely not
do this again. ;-)

All the best

Liu Bo



On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi Liu;

 Yes. it is an expected behavior. If you send data within square brackets
 Solr will behave it as a multivalued field. You can test it with this way:
 if you use Solrj and use a List for a field it will be considered as
 multivalued too because when you call toString() method of your List you
 can see that elements are printed within square brackets. This is the
 reason that a List can be used for a multivalued field.

 If you explain your situation I can offer a way how to do it.

 Thanks;
 Furkan KAMACI


 2013/12/6 Liu Bo diabl...@gmail.com

  Dear solr users:
 
  I've met this kind of error several times,
 
  when add a array liked string such as:[Get 20% Off Official Barça Kits,
  coupon] to a  multiValued=false field, solr will complain:
 
  org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692]
  multiple values encountered for non multiValued field name_en_US: [Get
 20%
  Off Official Barca Kits, coupon]
 
  my schema defination:
  field name=name_en_US type=text_en indexed=true stored=true
  multiValued=false /
 
  This field is stored as the search result needs this field and it's value
  in original format, and indexed to give it a boost while searching .
 
  What I do is adding name (java.lang.String) to SolrInputDocument by
  addField(name_en_US, product.getName()) method, and then add this to
 solr
  using an AddUpdateCommand
 
  It seems solr treats this kind of string data as multivalued, even I add
  this field to solr only once.
 
  Is this a bug or a supposed behavior?
 
  Is there any way to tell solr this is not a multivalued value add don't
  break it?
 
  Your help and suggestion will be much of my appreciation.
 
  --
  All the best
 
  Liu Bo
 




-- 
All the best

Liu Bo


an array liked string is treated as multivalued when adding doc to solr

2013-12-05 Thread Liu Bo
Dear solr users:

I've met this kind of error several times.

When adding an array-like string such as "[Get 20% Off Official Barça Kits,
coupon]" to a multiValued="false" field, solr will complain:

org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692]
multiple values encountered for non multiValued field name_en_US: [Get 20%
Off Official Barca Kits, coupon]

my schema definition:
<field name="name_en_US" type="text_en" indexed="true" stored="true"
multiValued="false" />

This field is stored because the search result needs this field and its value
in the original format, and indexed to give it a boost while searching.

What I do is add the name (java.lang.String) to a SolrInputDocument with the
addField("name_en_US", product.getName()) method, and then add this to solr
using an AddUpdateCommand.

It seems solr treats this kind of string data as multivalued, even though I add
this field to solr only once.

Is this a bug or the expected behavior?

Is there any way to tell solr this is not a multivalued value and don't
break it?

Your help and suggestion will be much of my appreciation.

-- 
All the best

Liu Bo


Re: deleting a doc inside a custom UpdateRequestProcessor

2013-11-18 Thread Liu Bo
hi,

you can try this in your checkIfIsDuplicate(): build a query based on
your title, and set it on a delete command:

    // build your query accordingly; this depends on how your
    // title is indexed, e.g. analyzed or not. Be careful with it and do some tests.
    DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
    cmd.commitWithin = commitWithin;
    cmd.setQuery(query);
    processDelete(cmd);

Processors are normally chained; you should make sure that your
processor comes first so that it can control what comes next based
on your logic.

you can also try to write your own UpdateRequestHandler instead of a
customized processor.

you can do a set of operations in your function

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
            throws Exception {}

get your processor chain in this function and pass a delete command
to it, such as:

    SolrParams params = req.getParams();
    checkParameter(params);
    UpdateRequestProcessorChain processorChain =
        req.getCore().getUpdateProcessingChain(params.get(UpdateParams.UPDATE_CHAIN));
    UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);

    DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
    cmd.commitWithin = commitWithin;
    cmd.setQuery(query);
    processor.processDelete(cmd);

this is what I am doing when customizing an update request handler: I try
not to touch the original processing chain but tell solr what to do through
commands.
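
For reference, a minimal sketch of the processor approach described above, assuming
Solr 4.x APIs; the "title" field and the class name are placeholders, and a matching
UpdateRequestProcessorFactory plus an entry in the update chain in solrconfig.xml
would still be needed:

    import java.io.IOException;

    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.DeleteUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class ReplaceByTitleProcessor extends UpdateRequestProcessor {

        public ReplaceByTitleProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            Object title = doc.getFieldValue("title");
            if (title != null) {
                // delete any existing document with the same title before adding;
                // the phrase query assumes a string-typed field, adjust if analyzed
                DeleteUpdateCommand delete = new DeleteUpdateCommand(cmd.getReq());
                delete.setQuery("title:\"" + title + "\"");
                super.processDelete(delete);
            }
            // hand the add down the rest of the chain
            super.processAdd(cmd);
        }
    }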


On 19 November 2013 10:01, Peyman Faratin pey...@robustlinks.com wrote:

 Hi

 I am building a custom UpdateRequestProcessor to intercept any doc heading
 to the index. Basically what I want to do is to check if the current index
 has a doc with the same title (i am using IDs as the uniques so I can't use
 that, and besides the logic of checking is a little more complicated). If
 the incoming doc has a duplicate and some other conditions hold then one of
 2 things can happen:

 1- we don't index the incoming document
 2- we index the incoming and delete the duplicate currently in the
 index

 I think (1) can be done by simple not passing the call up the chain (not
 calling super.processAdd(cmd)). However, I don't know how to implement the
 second condition, deleting the duplicate document, inside a custom
 UpdateRequestProcessor. This thread is the closest to my goal

 http://lucene.472066.n3.nabble.com/SOLR-4-3-0-Migration-How-to-use-DeleteUpdateCommand-td4062454.html

 however i am not clear how to proceed. Code snippets below.

 thank you in advance for your help

 class isDuplicate extends UpdateRequestProcessor
 {
 public isDuplicate( UpdateRequestProcessor next) {
   super( next );
 }
 @Override
 public void processAdd(AddUpdateCommand cmd) throws
 IOException {
 try
 {
 boolean indexIncomingDoc =
 checkIfIsDuplicate(cmd);
 if(indexIncomingDoc)
 super.processAdd(cmd);
 } catch (SolrServerException e)
 {e.printStackTrace();}
 catch (ParseException e) {e.printStackTrace();}
 }
 public boolean checkIfIsDuplicate(AddUpdateCommand cmd)
 ...{

 SolrInputDocument incomingDoc =
 cmd.getSolrInputDocument();
 if(incomingDoc == null) return false;
 String title = (String) incomingDoc.getFieldValue(
 title );
 SolrIndexSearcher searcher =
 cmd.getReq().getSearcher();
 boolean addIncomingDoc = true;
 Integer idOfDuplicate = searcher.getFirstMatch(new
 Term(title,title));
 if(idOfDuplicate != -1)
 {
 addIncomingDoc =
 compareDocs(searcher,incomingDoc,idOfDuplicate,title,addIncomingDoc);
 }
 return addIncomingDoc;
 }
 private boolean compareDocs(.){
 
 if( condition 1 )
 {
 -- DELETE DUPLICATE DOC in INDEX --
 addIncomingDoc = true;
 }
 
 return addIncomingDoc;
 }




-- 
All the best

Liu Bo


Re: Multi-core support for indexing multiple servers

2013-11-12 Thread Liu Bo
As far as I know about magento, its DB schema is designed for extensible
property storage, and the relationships between db tables are kind of complex.

A product has its attribute sets and properties, which are stored in different
tables. A configurable product may have different attribute values for each
of its sub simple products.

Handling relationships like this in DIH won't be easy, especially when you
want to group the attributes of a configurable product into one document.

But if you just need to search on name and description and not other
attributes, you can try writing a DIH config on the catalog_product_flat_x
tables; magento may have several of them.

We used to use lucene core to provide search on magento products. What we
do is use the SOAP service provided by magento to get products, and then
convert them to lucene documents. Indexes are updated daily. This hides
lots of magento implementation details but it's kind of slow.




On 12 November 2013 22:41, Robert Veliz rob...@mavenbridge.com wrote:

 I have two sources/servers--one of them is Magento. Since Magento has a
 more or less out of the box integration with Solr, my thought was to run
 Solr server from the Magento instance and then use DIH to get/merge content
 from the other source/server. Seem feasible/appropriate?  I spec'd it out
 and it seems to make sense...

 R

  On Nov 11, 2013, at 11:25 PM, Liu Bo diabl...@gmail.com wrote:
 
  like Erick said, merge data from different datasource could be very
  difficult, SolrJ is much easier to use but may need another application
 to
  do handle index process if you don't want to extends solr much.
 
  I eventually end up with a customized request handler which use
 SolrWriter
  from DIH package to index data,
 
  So that I can fully control the index process, quite like SolrJ, you can
  write code to convert your data into SolrInputDocument, and then post
 them
  to SolrWriter, SolrWriter will handles the rest stuff.
 
 
  On 8 November 2013 21:46, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Yep, you can define multiple data sources for use with DIH.
 
  Combining data from those multiple sources into a single
  index can be a bit tricky with DIH, personally I tend to prefer
  SolrJ, but that's mostly personal preference, especially if
  I want to get some parallelism going on.
 
  But whatever works
 
  Erick
 
 
  On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 manju16832...@gmail.com
  wrote:
 
  Eric,
  Just a question :-), wouldn't it be easy to use DIH to pull data from
  multiple data sources.
 
  I do use DIH to do that comfortably. I have three data sources
  - MySQL
  - URLDataSource that returns XML from an .NET application
  - URLDataSource that connects to an API and return XML
 
  Here is part of data-config data source settings
  dataSource type=JdbcDataSource name=solr
  driver=com.mysql.jdbc.Driver
  url=jdbc:mysql://localhost/employeeDB batchSize=-1 user=root
  password=root/
dataSource name=CRMServer type=URLDataSource
 encoding=UTF-8
  connectionTimeout=5000 readTimeout=1/
dataSource name=ImageServer type=URLDataSource
  encoding=UTF-8
  connectionTimeout=5000 readTimeout=1/
 
 
  Of course, in application I do the same.
  To construct my results, I do connect to MySQL and those two data
  sources.
 
  Basically we have two point of indexing
  - Using DIH at one time indexing
  - At application whenever there is transaction to the details that we
  are
  storing in Solr.
 
 
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
  --
  All the best
 
  Liu Bo




-- 
All the best

Liu Bo


Re: eDisMax, multiple language support and stopwords

2013-11-11 Thread Liu Bo
Happy to see someone has similar solutions to ours.

we have a similar multi-language search feature and we index different
language content into _fr, _en fields like you've done

but in search, we need a language code as a parameter to specify the
language the client wants to search on, which is normally decided by the website
visited, such as: qf=name description&language=en

and in our search components we find the right fields, name_en and
description_en, to be searched on
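
For illustration, a rough sketch of that kind of component (Solr 4.x APIs; the
language parameter name and the field-suffix convention follow the description
above and are otherwise just assumptions):

    import java.io.IOException;

    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class LanguageFieldComponent extends SearchComponent {

        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            SolrParams original = rb.req.getParams();
            String lang = original.get("language", "en");
            String qf = original.get("qf");
            if (qf == null) {
                return;
            }
            // rewrite "name description" into "name_en description_en"
            StringBuilder rewritten = new StringBuilder();
            for (String field : qf.split("\\s+")) {
                if (rewritten.length() > 0) {
                    rewritten.append(' ');
                }
                rewritten.append(field).append('_').append(lang);
            }
            ModifiableSolrParams params = new ModifiableSolrParams(original);
            params.set("qf", rewritten.toString());
            rb.req.setParams(params);
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // nothing to do here, the rewrite happens in prepare()
        }

        @Override
        public String getDescription() {
            return "Rewrites qf to language-specific fields";
        }

        @Override
        public String getSource() {
            return null;
        }
    }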

we used to support searching all languages and removed that later; as the
site tells the customer which language is supported, we also don't think we
have many language experts on our web sites who know more than two
languages and need to search them at the same time.


On 7 November 2013 23:01, Tom Mortimer tom.m.f...@gmail.com wrote:

 Ah, thanks Markus. I think I'll just add the Boolean operators to the
 stopwords list in that case.

 Tom



 On 7 November 2013 12:01, Markus Jelsma markus.jel...@openindex.io
 wrote:

  This is an ancient problem. The issue here is your mm-parameter, it gets
  confused because for separate fields different amount of tokens are
  filtered/emitted so it is never going to work just like this. The easiest
  option is not to use the stopfilter.
 
 
 
 http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
  https://issues.apache.org/jira/browse/SOLR-3085
 
  -Original message-
   From:Tom Mortimer tom.m.f...@gmail.com
   Sent: Thursday 7th November 2013 12:50
   To: solr-user@lucene.apache.org
   Subject: eDisMax, multiple language support and stopwords
  
   Hi all,
  
   Thanks for the help and advice I've got here so far!
  
   Another question - I want to support stopwords at search time, so that
  e.g.
   the query oscar and wilde is equivalent to oscar wilde (this is
 with
   lowercaseOperators=false). Fair enough, I have stopword and in the
  query
   analyser chain.
  
   However, I also need to support French as well as English, so I've got
  _en
   and _fr versions of the text fields, with appropriate stemming and
   stopwords. I index French content into the _fr fields and English into
  the
   _en fields. I'm searching with eDisMax over both versions, e.g.:
  
   str name=qfheadline_en headline_fr/str
  
   However, this means I get no results for oscar and wilde. The parsed
   query is:
  
   (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar))
   DisjunctionMaxQuery((headline_fr:and))
   DisjunctionMaxQuery((headline_fr:wild |
 headline_en:wild)))~3))/no_coord
  
   If I add and to the French stopwords list, I *do* get results, and
 the
   parsed query is:
  
   (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar))
   DisjunctionMaxQuery((headline_fr:wild |
 headline_en:wild)))~2))/no_coord
  
   This implies that the only solution is to have a minimal, shared
  stopwords
   list for all languages I want to support. Is this correct, or is there
 a
   way of supporting this kind of searching with per-language stopword
  lists?
  
   Thanks for any ideas!
  
   Tom
  
 




-- 
All the best

Liu Bo


Re: Multi-core support for indexing multiple servers

2013-11-11 Thread Liu Bo
like Erick said, merging data from different datasources can be very
difficult. SolrJ is much easier to use but may need another application to
handle the index process if you don't want to extend solr much.

I eventually ended up with a customized request handler which uses SolrWriter
from the DIH package to index data,

so that I can fully control the index process. Quite like SolrJ, you can
write code to convert your data into SolrInputDocuments, and then post them
to SolrWriter; SolrWriter will handle the rest.
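
For illustration, a minimal sketch of the SolrJ variant (plain HttpSolrServer rather
than the DIH SolrWriter mentioned above; the field names and the Product bean are
hypothetical):

    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class DaoIndexer {

        private final SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        /** Convert DAO results into SolrInputDocuments and push them to Solr. */
        public void index(List<Product> products) throws Exception {
            for (Product p : products) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", p.getId());
                doc.addField("name_en_US", p.getName());
                doc.addField("description", p.getDescription());
                server.add(doc);
            }
            server.commit();
        }

        /** Hypothetical DAO bean. */
        public static class Product {
            private String id, name, description;
            public String getId() { return id; }
            public String getName() { return name; }
            public String getDescription() { return description; }
        }
    }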


On 8 November 2013 21:46, Erick Erickson erickerick...@gmail.com wrote:

 Yep, you can define multiple data sources for use with DIH.

 Combining data from those multiple sources into a single
 index can be a bit tricky with DIH, personally I tend to prefer
 SolrJ, but that's mostly personal preference, especially if
 I want to get some parallelism going on.

 But whatever works

 Erick


 On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 manju16832...@gmail.com
 wrote:

  Eric,
  Just a question :-), wouldn't it be easy to use DIH to pull data from
  multiple data sources.
 
  I do use DIH to do that comfortably. I have three data sources
   - MySQL
   - URLDataSource that returns XML from an .NET application
   - URLDataSource that connects to an API and return XML
 
  Here is part of data-config data source settings
  dataSource type=JdbcDataSource name=solr
  driver=com.mysql.jdbc.Driver
  url=jdbc:mysql://localhost/employeeDB batchSize=-1 user=root
  password=root/
 dataSource name=CRMServer type=URLDataSource encoding=UTF-8
  connectionTimeout=5000 readTimeout=1/
 dataSource name=ImageServer type=URLDataSource
 encoding=UTF-8
  connectionTimeout=5000 readTimeout=1/
 
 
  Of course, in application I do the same.
  To construct my results, I do connect to MySQL and those two data
 sources.
 
  Basically we have two point of indexing
   - Using DIH at one time indexing
   - At application whenever there is transaction to the details that we
 are
  storing in Solr.
 
 
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 




-- 
All the best

Liu Bo


how does solr load plugins?

2013-10-16 Thread Liu Bo
Hi

I write a plugin to index contents reusing our DAO layer which is developed
using Spring.

What I am doing now is putting the plugin jar and all the other dependent jars
of the DAO layer into the shared lib folder under solr home.

In the log, I can see all the jars are loaded through SolrResourceLoader
like:

INFO  - 2013-10-16 16:25:30.611; org.apache.solr.core.SolrResourceLoader;
Adding 'file:/D:/apache-tomcat-7.0.42/solr/lib/spring-tx-3.1.0.RELEASE.jar'
to classloader


Then initialize the Spring context using:

ApplicationContext context = new
FileSystemXmlApplicationContext(/solr/spring/solr-plugin-bean-test.xml);


Then Spring will complain:

INFO  - 2013-10-16 16:33:57.432;
org.springframework.context.support.AbstractApplicationContext; Refreshing
org.springframework.context.support.FileSystemXmlApplicationContext@e582a85:
startup date [Wed Oct 16 16:33:57 CST 2013]; root of context hierarchy
INFO  - 2013-10-16 16:33:57.491;
org.springframework.beans.factory.xml.XmlBeanDefinitionReader; Loading XML
bean definitions from file
[D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml]
ERROR - 2013-10-16 16:33:59.944;
com.test.search.solr.spring.AppicationContextWrapper; Configuration
problem: Unable to locate Spring NamespaceHandler for XML schema namespace [
http://www.springframework.org/schema/context]
Offending resource: file
[D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml]

Spring context requires spring-tx-3.1.xsd which does exist
in spring-tx-3.1.0.RELEASE.jar under
org\springframework\transaction\config\ package, but the program can't
find it even though it could load spring classes successfully.

The following won't work either.

ApplicationContext context = new
ClassPathXmlApplicationContext(classpath:spring/solr-plugin-bean-test.xml);
//the solr-plugin-bean-test.xml is packaged in plugin.jar as well.

But when I put all the jars under TOMCAT_HOME/webapps/solr/WEB-INF/lib and
use

ApplicationContext context = new
ClassPathXmlApplicationContext(classpath:spring/solr-plugin-bean-test.xml);

everything works fine, I could initialize spring context and load DAO beans
to read data and then write them to solr index. But isn't modifying
solr.war a bad practice?

It seems SolrResourceLoader only loads classes from plugin jars but these
jars are NOT on the classpath. Please correct me if I am wrong.

Is there any way to use resources in plugin jars, such as configuration
files?

BTW is there any difference between the SolrResourceLoader and the tomcat webapp
classloader?

-- 
All the best

Liu Bo


Re: SolrDocumentList - bitwise operation

2013-10-13 Thread Liu Bo
a join query might be helpful: http://wiki.apache.org/solr/Join

join can work across indexes but probably won't work in solr cloud.

be aware that only the "to" documents are retrievable; if you want content from
both documents, a join query won't work. And in lucene the join query doesn't
quite work on multiple join conditions; I haven't tested it in solr yet.

I have a similar join case to yours; eventually I chose to denormalize our
data into one set of documents.

On 13 October 2013 22:34, Michael Tyler michaeltyler1...@gmail.com wrote:

 Hello,

 I have 2 different solr indexes returning 2 different sets of
 SolrDocumentList. Doc Id is the foreign key relation.

 After obtaining them, I want to perform AND operation between them and
 then return results to user. Can you tell me how do I get this? I am using
 solr 4.3

  SolrDocumentList results1 = responseA.getResults();
  SolrDocumentList results2 = responseB.getResults();

 results1  : d1, d2, d3
 results2  :  d1,d2, d4

 Return : d1, d2

 Regards,
 Michael




-- 
All the best

Liu Bo


Re: SolrCore 'collection1' is not available due to init failure

2013-10-11 Thread Liu Bo
org.apache.solr.core.SolrCore.init(SolrCore.java:821) ... 13 more Caused
by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out:
NativeFSLock@/usr/share/solr-4.5.0/example/solr/
collection1/data/index/write.lock:
java.io.FileNotFoundException:
/usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock
(Permission denied) at org.apache.lucene.store.Lock.obtain(Lock.java:84) at

it seems to be a permission problem: the user that starts tomcat doesn't have
permission to access your index folder.

try granting read and write permission on your solr data folder to that user,
and restart tomcat to see what happens.


-- 
All the best

Liu Bo


Re: Multiple schemas in the same SolrCloud ?

2013-10-10 Thread Liu Bo
you can try this way:

start zookeeper server first.

upload your configurations to zookeeper and link them to your collection
using zkcli just like shawn said

let's say you have conf1 and conf2, you can link them to collection1 and
collection2

remove the bootstrap stuff and start solr server.

after you have solr running, create collection1 and collection2 via core
admin; you don't need a local conf because all your core-specific configuration
is in zookeeper

or you could use core discovery and have collection name specified in
core.properties, see :
http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29



On 10 October 2013 23:57, maephisto my_sky...@yahoo.com wrote:

 On this topic, once you've uploaded you collection's configuration in ZK,
 how
 can you update it?
 Upload the new one with the same config name ?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094729.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
All the best

Liu Bo


Re: documents are not commited distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-10-08 Thread Liu Bo
I've solved this problem myself.

If you use core discovery, you must specify the numShards parameter in
core.properties,
or else solr won't allocate a range for each shard and then documents
won't be distributed properly.

Using core discovery to set up solr cloud in tomcat is much easier and
cleaner than the coreAdmin approach described in the wiki:
http://wiki.apache.org/solr/SolrCloudTomcat.

It cost me some time to move from jetty to tomcat, but I think our IT team
will like this way. :)




On 6 October 2013 23:53, Liu Bo diabl...@gmail.com wrote:

 Hi all

 I've sent out this mail before, but I only subscribed to lucene-user but
 not solr-user at that time. Sorry for repeating if any and your help will
 be much of my appreciation.

 I'm trying out the tutorial about solrcloud, and I managed to write my own
 plugin to import data from our set of databases. I use SolrWriter from the
 DataImporter package, and the docs can be committed distributively to the shards.

 Everything works fine using jetty from the solr example, but when I move
 to tomcat, solrcloud seems not to be configured right, as the documents are
 just committed to the shard the update request goes to.

 The cause probably is the range is null for shards in clusterstate.json.
 The router is implicit instead of compositeId as well.

 Is anything missing or configured wrong in the following steps? How
 can I fix it? Your help will be much appreciated.

 PS: the solr cloud tomcat wiki page isn't updated to 4.4 with core discovery; I'm
 trying this out after reading the SolrCloud, SolrCloudJboss, and CoreAdmin wiki
 pages.

 Here's what I've done and some useful logs:

 1. start three zookeeper servers.
 2. upload configuration files to zookeeper; the collection name is
 content_collection
 3. start three tomcat instances on three servers with core discovery

 a) core file:
  name=content
  loadOnStartup=true
  transient=false
  shard=shard1   (different on servers)
  collection=content_collection
 b) solr.xml

  <solr>

    <solrcloud>
      <str name="host">${host:}</str>
      <str name="hostContext">${hostContext:solr}</str>
      <int name="hostPort">8080</int>
      <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
      <str name="zkHost">10.199.46.176:2181,10.199.46.165:2181,10.199.46.158:2181</str>
      <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    </solrcloud>

    <shardHandlerFactory name="shardHandlerFactory"
        class="HttpShardHandlerFactory">
      <int name="socketTimeout">${socketTimeout:0}</int>
      <int name="connTimeout">${connTimeout:0}</int>
    </shardHandlerFactory>

  </solr>

 4. In the solr.log, I see the three shards are recognized, and the
 solrcloud can see the content_collection has three shards as well.
 5. write documents to content_collection using my update request; the
 documents only commit to the shard the request goes to. In the log I can
 see the DistributedUpdateProcessorFactory is in the processorChain and
 a distributed commit is triggered:

 INFO  - 2013-09-30 16:31:43.205;
 com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
 updata request processor factories:

 INFO  - 2013-09-30 16:31:43.206;
 com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
 org.apache.solr.update.processor.LogUpdateProcessorFactory@4ae7b77

 INFO  - 2013-09-30 16:31:43.207;
 com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
 org.apache.solr.update.processor.DistributedUpdateProcessorFactory@5b2bc407

 INFO  - 2013-09-30 16:31:43.207;
 com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
 org.apache.solr.update.processor.RunUpdateProcessorFactory@1652d654

 INFO  - 2013-09-30 16:31:43.283; org.apache.solr.core.SolrDeletionPolicy;
 SolrDeletionPolicy.onInit: commits: num=1


 commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}

 INFO  - 2013-09-30 16:31:43.284; org.apache.solr.core.SolrDeletionPolicy;
 newest commit generation = 1

 INFO  - 2013-09-30 16:31:43.440; org.apache.solr.update.SolrCmdDistributor;
 Distrib commit to: [StdNode: http://10.199.46.176:8080/solr/content/,
 StdNode: http://10.199.46.165:8080/solr/content/]
 params:commit_end_point=true&commit=true&softCommit=false&waitSearcher=true&expungeDeletes=false

 but the documents won't go to the other shards; the other shards only get a
 request with no documents:

 INFO  - 2013-09-30 16:31:43.841;
 org.apache.solr.update.DirectUpdateHandler2; start
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

 INFO  - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy;
 SolrDeletionPolicy.onInit: commits: num=1


 commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}

 INFO  - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy;
 newest commit

documents are not commited distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-10-06 Thread Liu Bo
@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)}

INFO  - 2013-09-30 16:31:43.870;
org.apache.solr.update.DirectUpdateHandler2; end_commit_flush

INFO  - 2013-09-30 16:31:43.870;
org.apache.solr.update.processor.LogUpdateProcessor; [content] webapp=/solr
path=/update
params={waitSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
{commit=} 0 42

6) later I found the range is null in clusterstate.json, which might be why
the documents aren't committed distributively

{"content_collection":{
    "shards":{
      "shard1":{
        "range":null,
        "state":"active",
        "replicas":{"core_node1":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.176:8080_solr",
            "base_url":"http://10.199.46.176:8080/solr",
            "leader":"true"}}},
      "shard3":{
        "range":null,
        "state":"active",
        "replicas":{"core_node2":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.202:8080_solr",
            "base_url":"http://10.199.46.202:8080/solr",
            "leader":"true"}}},
      "shard2":{
        "range":null,
        "state":"active",
        "replicas":{"core_node3":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.165:8080_solr",
            "base_url":"http://10.199.46.165:8080/solr",
            "leader":"true"}}}},
    "router":"implicit"}}



-- 
All the best

Liu Bo


documents are not commited distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-09-30 Thread Liu Bo
:31:43.870;
org.apache.solr.update.processor.LogUpdateProcessor; [content] webapp=/solr
path=/update
params={waitSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
{commit=} 0 42

6) later I found the range is null in clusterstate.json, which might be why
the documents aren't committed distributively

{"content_collection":{
    "shards":{
      "shard1":{
        "range":null,
        "state":"active",
        "replicas":{"core_node1":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.176:8080_solr",
            "base_url":"http://10.199.46.176:8080/solr",
            "leader":"true"}}},
      "shard3":{
        "range":null,
        "state":"active",
        "replicas":{"core_node2":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.202:8080_solr",
            "base_url":"http://10.199.46.202:8080/solr",
            "leader":"true"}}},
      "shard2":{
        "range":null,
        "state":"active",
        "replicas":{"core_node3":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.165:8080_solr",
            "base_url":"http://10.199.46.165:8080/solr",
            "leader":"true"}}}},
    "router":"implicit"}}



-- 
All the best

Liu Bo


how can I use DataImportHandler on multiple MySQL databases with the same schema?

2013-09-17 Thread Liu Bo
Hi all

Our system has distributed MySQL databases; we create a database for every
customer signed up and distribute it to one of our MySQL hosts.

We currently use lucene core to perform search on these databases, and we
write java code to loop through these databases and convert the data to a
lucene index.

Right now we are planning to move to Solr for distribution, and I am doing
investigation on it.

I tried to use the DataImportHandler (http://wiki.apache.org/solr/DataImportHandler)
from the wiki page, but I can't figure out a way to use multiple datasources
with the same schema.

The other question is, we have the database connection data in one table,
can I create datasource connections info from it, and loop through the
databases using DataImporter?

If the DataImporter isn't workable, is there a way to feed data to solr using a
customized SolrRequestHandler without using SolrJ?

If neither of these two ways works, I think I am going to reuse the
DAO layer of the old project and feed the data to solr using SolrJ, probably
using an embedded Solr server.
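
For illustration, a minimal sketch of that embedded option, assuming Solr 4.x SolrJ
(the solr home path and field names are placeholders, and the DAO loop is omitted):

    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedIndexer {
        public static void main(String[] args) throws Exception {
            // a normal solr home containing solr.xml and the core's conf/ directory
            CoreContainer container = new CoreContainer("/path/to/solr/home");
            container.load();
            EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");

            // loop over the per-customer databases with the existing DAO layer
            // and feed each row in as a document
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "customer1-row1");
            doc.addField("name", "example");
            server.add(doc);
            server.commit();

            server.shutdown();
            container.shutdown();
        }
    }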

Your help will be much of my appreciation.

http://wiki.apache.org/solr/DataImportHandlerFaq

--
All the best

Liu Bo


RE: removing duplicates

2013-08-21 Thread Liu
This picture is extracted from apache-solr-ref-guide-4.4.pdf; maybe it will
help you.
You could download the document from
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

-----Original Message-----
From: Ali, Saqib [mailto:docbook@gmail.com]
Sent: August 22, 2013 5:15
To: solr-user@lucene.apache.org
Subject: removing duplicates

hello,

We have documents that are duplicates, i.e. the ID is different but the rest of
the fields are the same. Is there a query that can remove the duplicates and just
leave one copy of the document in solr? There is one numeric field that we
can key off to find duplicates.

Please advise.

Thanks


How to sort by the function: relevance_score*numberic_field/(relevance_score +numberic_field )

2013-08-20 Thread Liu
Hi:

  I want to rank the search results by the function:
relevance_score*numeric_field/(relevance_score + numeric_field); this
function is equal to

1/((1/relevance_score) + 1/numeric_field)

 

As far as I know, I could use a function query:
sort=div(1,sum(div(1,field(numeric_field)),div(1,query({!edismax v='somewords'})))) desc.
There is a subquery in this function, query({!edismax v='somewords'}),
which returns the relevance score. But I can't figure out its
query efficiency. After tracking the source code, I think the efficiency is
OK, but I can't make sure.
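
For illustration, the same sort expressed through SolrJ (the numeric field name
and query text are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class FunctionSortExample {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery query = new SolrQuery("*:*");
            // 1/(1/numeric_field + 1/relevance) == relevance*numeric_field/(relevance + numeric_field)
            query.set("sort",
                "div(1,sum(div(1,field(numeric_field)),div(1,query({!edismax v='somewords'})))) desc");
            System.out.println(server.query(query).getResults().getNumFound());
        }
    }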

 

Do we have other approaches to sort docs by
relevance_score*numeric_field/(relevance_score + numeric_field)?

 

Thank you

Leo



One case for shingle and synonym filter

2013-04-11 Thread Xiang Liu
Hi,
Here is the case: given a doc named "sport center", we hope some query like
"sportctr" (user ignore) can recall it. Can the shingle and synonym filters be
combined in some smart way to produce the term?
Thanks, Xiang
  

HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Hi,
we have an index with 2mil documents in it. From time to time we rewrite
about 1/10 of the documents (just under 200k). No autocommit. At the end we do
a single commit, and it timed out after 60 sec. My questions are:
1. is it normal for a commit of this size to take more than 1 min? I
know it probably depends on the server ...
2. I know there are a few parameters I can set on the CommonsHttpSolrServer
class: setConnectionManagerTimeout(), setConnectionTimeout(),
setSoTimeout(). Which should I use?
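
For reference, a small sketch of where those setters go (values are arbitrary; this
assumes the Solr 3.x SolrJ API):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class TimeoutConfig {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            server.setConnectionManagerTimeout(10000); // wait for a free connection from the pool
            server.setConnectionTimeout(10000);        // TCP connect timeout
            server.setSoTimeout(300000);               // socket read timeout, the one a slow commit will hit
            server.commit();
        }
    }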

TIA


Re: HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Thanks for the quick response. It's Solr 3.4. I'm pretty sure we have plenty
of memory.



On Tue, Feb 19, 2013 at 7:50 PM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 Which version of Solr?
 Are you sure you did not run out of memory half way through import?

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Tue, Feb 19, 2013 at 7:44 PM, Siping Liu liu01...@gmail.com wrote:

  Hi,
  we have an index with 2mil documents in it. From time to time we rewrite
  about 1/10 of the documents (just under 200k). No autocommit. At the end
 we
  a single commit and got time out after 60 sec. My questions are:
  1. is it normal to have the commit of this size takes more than 1min? I
  know it's probably depend on the server ...
  2. I know there're a few parameters I can set in CommonsHttpSolrServer
  class: setConnectionManagerTimeout(), setConnectionTimeout(),
  setSoTimeout(). Which should I use?
 
  TIA
 



Re: HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
Solrj.


On Tue, Feb 19, 2013 at 9:08 PM, Erick Erickson erickerick...@gmail.comwrote:

 Well, your commits may have to wait until any merges are done, which _may_
 be merging your entire index into a single segment. Possibly this could
 take more than 60 seconds.

 _How_ are you doing this? DIH? SolrJ? post.jar?

 Best
 Erick


 On Tue, Feb 19, 2013 at 8:00 PM, Siping Liu liu01...@gmail.com wrote:

  Thanks for the quick response. It's Solr 3.4. I'm pretty sure we get
 plenty
  memory.
 
 
 
  On Tue, Feb 19, 2013 at 7:50 PM, Alexandre Rafalovitch
  arafa...@gmail.comwrote:
 
   Which version of Solr?
   Are you sure you did not run out of memory half way through import?
  
   Regards,
  Alex.
  
   Personal blog: http://blog.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all at
   once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
  
  
   On Tue, Feb 19, 2013 at 7:44 PM, Siping Liu liu01...@gmail.com
 wrote:
  
Hi,
we have an index with 2mil documents in it. From time to time we
  rewrite
about 1/10 of the documents (just under 200k). No autocommit. At the
  end
   we
a single commit and got time out after 60 sec. My questions are:
1. is it normal to have the commit of this size takes more than
 1min? I
know it's probably depend on the server ...
2. I know there're a few parameters I can set in
 CommonsHttpSolrServer
class: setConnectionManagerTimeout(), setConnectionTimeout(),
setSoTimeout(). Which should I use?
   
TIA
   
  
 



Re: custom sorter

2012-07-22 Thread Siping Liu
Hi -- thanks for the response. It's the right direction. However on closer
look I don't think I can use it directly. The reason is that in my case
the query string is always *:*; we use a filter query to get different
results. When fq=(field1:xyz) we want to boost one document and let sort=
take care of the rest of the results, and when field1 has another value, sort=
takes care of all results.

Maybe I can define my own SearchComponent class, and specify it in
<arr name="last-components">
  <str>my_search_component</str>
</arr>
I have to try and see if that'd work.

thanks.


On Fri, Jul 20, 2012 at 3:24 AM, Lee Carroll
lee.a.carr...@googlemail.comwrote:

 take a look at
 http://wiki.apache.org/solr/QueryElevationComponent

 On 20 July 2012 03:48, Siping Liu liu01...@gmail.com wrote:

  Hi,
  I have requirements to place a document to a pre-determined  position for
  special filter query values, for instance when filter query is
  fq=(field1:xyz) place document abc as first result (the rest of the
  result set will be ordered by sort=field2). I guess I have to plug in my
  Java code as a custom sorter. I'd appreciate it if someone can shed light
  on this (how to add custom sorter, etc.)
  TIA.
 



custom sorter

2012-07-19 Thread Siping Liu
Hi,
I have a requirement to place a document at a pre-determined position for
special filter query values, for instance when the filter query is
fq=(field1:xyz), place document abc as the first result (the rest of the
result set will be ordered by sort=field2). I guess I have to plug in my
Java code as a custom sorter. I'd appreciate it if someone can shed light
on this (how to add a custom sorter, etc.)
TIA.


Re: help: I always get NULL with row.get(columnName)

2012-07-19 Thread Roy Liu
Does anyone know?

On Thu, Jul 19, 2012 at 5:48 PM, Roy Liu liuchua...@gmail.com wrote:

 Hi,

 When I use a Transformer to handle files, I always get NULL from
 row.get(columnName).
 Does anyone know why?

 --
 The following file is *data-config.xml*

 <dataConfig>
   <dataSource type="JdbcDataSource"
       name="ds"
       driver="oracle.jdbc.driver.OracleDriver"
       url="jdbc:oracle:thin:@10.1.1.1:1521:sid"
       user="username"
       password="pwd"
       />
   <document name="BS_REPORT">

     <entity name="report" pk="ID"
         query="select a.objid as ID from DOCGENERAL a where a.objid=14154965">

       <field column="ID" name="id" />

       <entity name="attachment"
           query="select docid as ID, name as filename, storepath as filepath from attachment where docid=${report.ID}"
           transformer="com.bs.solr.BSFileTransformer">
         <field column="ID" name="bs_attachment_id" />
         <field column="filename" name="bs_attachment_name" />
         <field column="filepath" name="bs_attachment" isfile="true"/>
       </entity>

     </entity>

   </document>
 </dataConfig>


 public class BSFileTransformer extends Transformer {
   private static Log LOGGER = LogFactory.getLog(BSFileTransformer.class);

   @Override
   public Object transformRow(Map<String, Object> row, Context context) {
     // row.get("filename") is always null, but row.get("id") is OK.
     System.out.println("==filename:" + row.get("filename"));

     List<Map<String, String>> fields = context.getAllEntityFields();

     String id = null; // Entity ID
     String fileName = "NONAME";
     for (Map<String, String> field : fields) {
       String name = field.get("name");
       System.out.println("name:" + name);
       if ("bs_attachment_id".equals(name)) {
         String columnName = field.get("column");
         id = String.valueOf(row.get(columnName));
       }
       if ("bs_attachment_name".equals(name)) {
         String columnName = field.get("column");
         fileName = (String) row.get(columnName);
       }
       String isFile = field.get("isfile");
       if ("true".equals(isFile)) {
         String columnName = field.get("column");
         String filePath = (String) row.get(columnName);

         try {
           System.out.println("fileName:" + fileName + ",filePath: " + filePath);
           if (filePath != null) {
             File file = new File(filePath);
             InputStream inputStream = new FileInputStream(file);
             Tika tika = new Tika();
             String text = tika.parseToString(inputStream);
             row.put(columnName, text);
           }
           LOGGER.info("Processed File OK! Entity: " + fileName + ", ID: " + id);
         } catch (IOException ioe) {
           LOGGER.error(ioe.getMessage());
           row.put(columnName, "");
         } catch (TikaException e) {
           LOGGER.error("Parse File Error: " + id + ", Error: " + e.getMessage());
           row.put(columnName, "");
         }
       }
     }
     return row;
   }
 }



Solr mail dataimporter cannot be found

2012-05-21 Thread Emma Bo Liu
Hi,

I want to index emails using solr. I put the user name, password and hostname
in data-config.xml under the mail folder. This is a valid email account, but when I
open the url http://localhost:8983/solr/mail/dataimport?command=full-import it
says it cannot access mail/dataimporter, reason: not found. But when I run
http://localhost:8983/solr/rss/dataimport?command=full-import
or
http://localhost:8983/solr/db/dataimport?command=full-import
they can be found.

In addition, when I run the command java
-Dsolr.solr.home=./example-DIH/solr/ -jar start.jar, on the left side of the
solr UI there are db, rss, tika and solr but no mail. Is it a bug in the
mail indexing? Thank you so much!

Best,

Emma


RE: memory usage keep increase

2011-11-17 Thread Yongtao Liu
Erick,

Thanks for your reply.

Yes, virtual memory does not mean physical memory.
But when virtual memory > physical memory, the system will become
slow, since lots of paging requests happen.

Yongtao
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 15, 2011 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: memory usage keep increase

I'm pretty sure not. The words virtual memory address space is important 
here, that's not physical memory...

Best
Erick

On Mon, Nov 14, 2011 at 11:55 AM, Yongtao Liu y...@commvault.com wrote:
 Hi all,

 I saw one issue is ram usage keep increase when we run query.
 After look in the code, looks like Lucene use MMapDirectory to map index file 
 to ram.

 According to 
 http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/MMapDirectory.html
  comments, it will use lot of memory.
 NOTE: memory mapping uses up a portion of the virtual memory address space in 
 your process equal to the size of the file being mapped. Before using this 
 class, be sure your have plenty of virtual address space, e.g. by using a 64 
 bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the 
 address space.

 So, my understanding is solr request physical RAM = index file size, is it 
 right?

 Yongtao


 **Legal Disclaimer***
 This communication may contain confidential and privileged material 
 for the sole use of the intended recipient. Any unauthorized review, 
 use or distribution by others is strictly prohibited. If you have 
 received the message in error, please advise the sender by reply email 
 and delete the message. Thank you.
 *


memory usage keep increase

2011-11-14 Thread Yongtao Liu
Hi all,

I saw one issue: ram usage keeps increasing when we run queries.
After looking at the code, it looks like Lucene uses MMapDirectory to map index files
to ram.

According to the
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/MMapDirectory.html
comments, it will use a lot of memory.
NOTE: memory mapping uses up a portion of the virtual memory address space in 
your process equal to the size of the file being mapped. Before using this 
class, be sure your have plenty of virtual address space, e.g. by using a 64 
bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the 
address space.

So, my understanding is that solr requires physical RAM = index file size; is that
right?

Yongtao


**Legal Disclaimer***
This communication may contain confidential and privileged
material for the sole use of the intended recipient. Any
unauthorized review, use or distribution by others is strictly
prohibited. If you have received the message in error, please
advise the sender by reply email and delete the message. Thank
you.
*

Re: FW: MMapDirectory failed to map a 23G compound index segment

2011-09-21 Thread Yongtao Liu
I hit a similar issue recently.
Not sure if MMapDirectory is the right way to go.

When an index file is mapped to ram, the JVM will call the OS file mapping function.
The memory usage is in shared memory; it may not be counted in the JVM process
space.

One problem I saw is that if the index file is bigger than physical ram, and there
are lots of queries which cause wide index file access,
then the machine has no available memory.
The system becomes very slow.

What I did is change the lucene code to disable MMapDirectory.

On Wed, Sep 21, 2011 at 1:26 PM, Yongtao Liu y...@commvault.com wrote:



 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Tuesday, September 20, 2011 3:33 PM
 To: solr-user@lucene.apache.org
 Subject: Re: MMapDirectory failed to map a 23G compound index segment

 Since you hit OOME during mmap, I think this is an OS issue not a JVM
 issue.  Ie, the JVM isn't running out of memory.

 How many segments were in the unoptimized index?  It's possible the OS
 rejected the mmap because of process limits.  Run cat
 /proc/sys/vm/max_map_count to see how many mmaps are allowed.

 Or: is it possible you reopened the reader several times against the index
 (ie, after committing from Solr)?  If so, I think 2.9.x never unmaps the
 mapped areas, and so this would accumulate against the system limit.

  My memory of this is a little rusty but isn't mmap also limited by mem +
 swap on the box? What does 'free -g' report?

 I don't think this should be the case; you are using a 64 bit OS/JVM so in
 theory (except for OS system wide / per-process limits imposed) you should
 be able to mmap up to the full 64 bit address space.

 Your virtual memory is unlimited (from ulimit output), so that's good.

 Mike McCandless

 http://blog.mikemccandless.com

 On Wed, Sep 7, 2011 at 12:25 PM, Rich Cariens richcari...@gmail.com
 wrote:
  Ahoy ahoy!
 
  I've run into the dreaded OOM error with MMapDirectory on a 23G cfs
  compound index segment file. The stack trace looks pretty much like
  every other trace I've found when searching for OOM  map failed[1].
  My configuration
  follows:
 
  Solr 1.4.1/Lucene 2.9.3 (plus
  SOLR-1969https://issues.apache.org/jira/browse/SOLR-1969
  )
  CentOS 4.9 (Final)
  Linux 2.6.9-100.ELsmp x86_64 yada yada yada Java SE (build
  1.6.0_21-b06) Hotspot 64-bit Server VM (build 17.0-b16, mixed mode)
  ulimits:
 core file size (blocks, -c) 0
 data seg size(kbytes, -d) unlimited
 file size (blocks, -f) unlimited
 pending signals(-i) 1024
 max locked memory (kbytes, -l) 32
 max memory size (kbytes, -m) unlimited
 open files(-n) 256000
 pipe size (512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 stack size(kbytes, -s) 10240
 cpu time(seconds, -t) unlimited
 max user processes (-u) 1064959
 virtual memory(kbytes, -v) unlimited
 file locks(-x) unlimited
 
  Any suggestions?
 
  Thanks in advance,
  Rich
 
  [1]
  ...
  java.io.IOException: Map failed
   at sun.nio.ch.FileChannelImpl.map(Unknown Source)
   at
  org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(Unknown
  Source)
   at
  org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(Unknown
  Source)
   at org.apache.lucene.store.MMapDirectory.openInput(Unknown Source)
   at org.apache.lucene.index.SegmentReader$CoreReaders.init(Unknown
  Source)
 
   at org.apache.lucene.index.SegmentReader.get(Unknown Source)
   at org.apache.lucene.index.SegmentReader.get(Unknown Source)
   at org.apache.lucene.index.DirectoryReader.init(Unknown Source)
   at org.apache.lucene.index.ReadOnlyDirectoryReader.init(Unknown
  Source)
   at org.apache.lucene.index.DirectoryReader$1.doBody(Unknown Source)
   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown
  Source)
   at org.apache.lucene.index.DirectoryReader.open(Unknown Source)
   at org.apache.lucene.index.IndexReader.open(Unknown Source) ...
  Caused by: java.lang.OutOfMemoryError: Map failed
   at sun.nio.ch.FileChannelImpl.map0(Native Method) ...
 



Re: How to index PDF file stored in SQL Server 2008

2011-04-11 Thread Roy Liu
Hi, all
Thank YOU very much for your kindly help.

*1. I have upgraded from Solr 1.4 to Solr 3.1*
*2. Change data-config-sql.xml *

<dataConfig>
  <dataSource type="JdbcDataSource"
              name="bsds"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
              user="username"
              password="pw"/>
  <datasource name="docds" type="BinURLDataSource" />

  <document name="docs">
    <entity name="doc" dataSource="bsds"
            query="select id,attachment,filename from attachment where ext='pdf' and id30001030">
      <field column="id" name="id" />
      <entity dataSource="docds" processor="TikaEntityProcessor"
              url="${doc.attachment}" format="text">
        <field column="attachment" name="bs_attachment" />
      </entity>
      <field column="filename" name="title" />
    </entity>
  </document>
</dataConfig>

*3. solrconfig.xml and schema.xml are NOT changed.*

However, when I access
*http://localhost:8080/solr/dataimport?command=full-import*

It still has errors:
Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query:[B@ae1393 Processing Document # 1

Could you give me some advice? This problem is really bothering me.
Thanks.

-- 
Best Regards,
Roy Liu
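For reference, the usual way to hand a database BLOB column to Tika is through a FieldStreamDataSource rather than a BinURLDataSource (url= expects something resolvable as a URL, not raw bytes). A sketch, assuming your Solr release includes FieldStreamDataSource; names follow the config above:

<dataConfig>
  <dataSource type="JdbcDataSource" name="bsds"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
              user="username" password="pw"/>
  <!-- streams the parent row's BLOB column to the inner entity -->
  <dataSource name="fieldds" type="FieldStreamDataSource"/>

  <document name="docs">
    <entity name="doc" dataSource="bsds"
            query="select id,attachment,filename from attachment where ext='pdf'">
      <field column="id" name="id"/>
      <field column="filename" name="title"/>
      <!-- Tika parses the binary content and emits the body as the 'text' column -->
      <entity name="tika" dataSource="fieldds" processor="TikaEntityProcessor"
              dataField="doc.attachment" format="text">
        <field column="text" name="bs_attachment"/>
      </entity>
    </entity>
  </document>
</dataConfig>

With this setup bs_attachment should be a text field in schema.xml rather than a binary one, since what gets indexed is the extracted plain text.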


On Mon, Apr 11, 2011 at 5:16 AM, Lance Norskog goks...@gmail.com wrote:

 You have to upgrade completely to the Apache Solr 3.1 release. It is
 worth the effort. You cannot copy any jars between Solr releases.
 Also, you cannot copy over jars from newer Tika releases.

 On Fri, Apr 8, 2011 at 10:47 AM, Darx Oman darxo...@gmail.com wrote:
  Hi again
  what you are missing is field mapping
  field column=id name=id /
  
 
 
  no need for TikaEntityProcessor  since you are not accessing pdf files
 



 --
 Lance Norskog
 goks...@gmail.com



Re: How to index PDF file stored in SQL Server 2008

2011-04-11 Thread Roy Liu
Hi,

I have copied
\apache-solr-3.1.0\dist\apache-solr-dataimporthandler-extras-3.1.0.jar

into \apache-tomcat-6.0.32\webapps\solr\WEB-INF\lib\

Other Errors:
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Unclosed
quotation mark after the character string 'B@3e574'.

-- 
Best Regards,
Roy Liu


On Mon, Apr 11, 2011 at 2:12 PM, Darx Oman darxo...@gmail.com wrote:

 Hi there

 Error is not clear...

 but did you copy apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar
 to your solr\lib ?



Re: How to index PDF file stored in SQL Server 2008

2011-04-11 Thread Roy Liu
I changed data-config-sql.xml to
<dataConfig>
  <dataSource type="JdbcDataSource"
              name="bsds"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
              user="username"
              password="pw"
              convertType="true"/>

  <document name="docs">
    <entity name="doc" dataSource="bsds"
            query="select id,filename,attachment from attachment where ext='pdf' and id=3632">
      <field column="id" name="id" />
      <field column="filename" name="title" />
      <field column="attachment" name="bs_attachment" />
    </entity>
  </document>
</dataConfig>


There are no errors, but the indexed PDF is converted to numbers:
200 1 202 1 203 1 212 1 222 1 236 1 242 1 244 1 254 1 255
-- 
Best Regards,
Roy Liu


On Mon, Apr 11, 2011 at 2:02 PM, Roy Liu liuchua...@gmail.com wrote:

 Hi, all
 Thank YOU very much for your kindly help.

 *1. I have upgrade from Solr 1.4 to Solr 3.1*
 *2. Change data-config-sql.xml *

 dataConfig
   dataSource type=JdbcDataSource
   name=*bsds*
   driver=com.microsoft.sqlserver.jdbc.SQLServerDriver

 url=jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager
   user=username
   password=pw/
   datasource name=*docds* type=*BinURLDataSource* /

   document name=docs
 entity name=*doc* dataSource=*bsds*
 query=select id,attachment,filename from attachment where
 ext='pdf' and id30001030 

 field column=id name=id /
 *entity dataSource=docds processor=TikaEntityProcessor
 url=${doc.attachment} format=text **
 field column=attachment name=bs_attachment /
 /entity*
 field column=filename name=title /
 /entity
   /document
 /dataConfig

 *3. solrconfig.xml and schema.xml are NOT changed.*

 However, when I access

 *http://localhost:8080/solr/dataimport?command=full-import*

 It still has errors:
 Full Import
 failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
 Unable to execute query:[B@ae1393 Processing Document # 1

 Could you give me some advices. This problem is so boring me.
 Thanks.

 --
 Best Regards,
 Roy Liu



 On Mon, Apr 11, 2011 at 5:16 AM, Lance Norskog goks...@gmail.com wrote:

 You have to upgrade completely to the Apache Solr 3.1 release. It is
 worth the effort. You cannot copy any jars between Solr releases.
 Also, you cannot copy over jars from newer Tika releases.

 On Fri, Apr 8, 2011 at 10:47 AM, Darx Oman darxo...@gmail.com wrote:
  Hi again
  what you are missing is field mapping
  field column=id name=id /
  
 
 
  no need for TikaEntityProcessor  since you are not accessing pdf files
 



 --
 Lance Norskog
 goks...@gmail.com





Re: Tika, Solr running under Tomcat 6 on Debian

2011-04-11 Thread Roy Liu
\apache-solr-3.1.0\contrib\extraction\lib\tika*.jar

-- 
Best Regards,
Roy Liu
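In other words, the Tika jars from contrib/extraction/lib (plus the solr-cell jar) have to end up on Solr's classpath. Besides copying them into WEB-INF/lib, they can be pulled in with <lib/> directives in solrconfig.xml, similar to what the 3.1 example config ships with (paths are relative to the core's instance dir, so adjust them to where the distribution is unpacked):

  <lib dir="../../contrib/extraction/lib" />
  <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />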


On Mon, Apr 11, 2011 at 3:10 PM, Mike satish01sud...@gmail.com wrote:

 Hi All,

 I have the same issue. I have installed solr instance on tomcat6. When try
 to index pdf I am running into the below exception:

 11 Apr, 2011 12:11:55 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.NoClassDefFoundError:
 org/apache/tika/exception/TikaException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at

 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at
 org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
at

 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
at

 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at

 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at

 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at

 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at

 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at

 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at

 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at

 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at

 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.tika.exception.TikaException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 22 more

 I could not found any tika jar file.
 Could you please help me out in fixing the above issue.

 Thanks,
 Mike

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Tika-Solr-running-under-Tomcat-6-on-Debian-tp993295p2805615.html
 Sent from the Solr - User mailing list archive at Nabble.com.



How to index MS SQL Server column with image type

2011-04-07 Thread Roy Liu
Hi all,

When I index a column (image type) of a table via
http://localhost:8080/solr/dataimport?command=full-import
there is an error like this: String length must be a multiple of four.

Any help?
Thank you very much.

PS. the attachment includes Chinese character.


*1. data-config.xml*
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="net.sourceforge.jtds.jdbc.Driver"
              url="jdbc:jtds:sqlserver://host:1433/db"
              user="username"
              password="password"/>

  <document>
    <entity name="doc"
            query="select id,attachment,filename as title from attachment where ext='doc' and id1">
      <field column="attachment" name="bs_attachment"/>
    </entity>
  </document>
</dataConfig>

*2. schema.xml*
<field name="bs_attachment" type="binary" indexed="true" stored="true"/>

*3. Database*
*attachment *is a column of table attachment. it's type is IMAGE.


Best Regards,
Roy Liu


How to index PDF file stored in SQL Server 2008

2011-04-07 Thread Roy Liu
Hi,

I have a table named *attachment *in MS SQL Server 2008.

COLUMNTYPE
- 
id   int
titlevarchar(200)
attachment image

I need to index the attachment(store pdf files) column from database via
DIH.

After accessing this URL, it returns Indexing completed. Added/Updated: 5
documents. Deleted 0 documents.
http://localhost:8080/solr/dataimport?command=full-import

However, I can not search anything.

Anyone can help me ?

Thanks.



*data-config-sql.xml*
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=master"
              user="user"
              password="pw"/>
  <document>
    <entity name="doc"
            query="select id,title,attachment from attachment"/>
  </document>
</dataConfig>

*schema.xml*
<field name="attachment" type="text" indexed="true" stored="true"/>



Best Regards,
Roy Liu


Re: How to index PDF file stored in SQL Server 2008

2011-04-07 Thread Roy Liu
Thanks Lance,

I'm using Solr 1.4.
If I want to use TikaEP, do I need to upgrade to Solr 3.1 or just import the jar files?

Best Regards,
Roy Liu


On Fri, Apr 8, 2011 at 10:22 AM, Lance Norskog goks...@gmail.com wrote:

 You need the TikaEntityProcessor to unpack the PDF image. You are
 sticking binary blobs into the index. Tika unpacks the text out of the
 file.

 TikaEP is not in Solr 1.4, but it is in the new Solr 3.1 release.

 On Thu, Apr 7, 2011 at 7:14 PM, Roy Liu liuchua...@gmail.com wrote:
  Hi,
 
  I have a table named *attachment *in MS SQL Server 2008.
 
  COLUMNTYPE
  - 
  id   int
  titlevarchar(200)
  attachment image
 
  I need to index the attachment(store pdf files) column from database via
  DIH.
 
  After access this URL, it returns Indexing completed. Added/Updated: 5
  documents. Deleted 0 documents.
  http://localhost:8080/solr/dataimport?command=full-import
 
  However, I can not search anything.
 
  Anyone can help me ?
 
  Thanks.
 
 
  
  *data-config-sql.xml*
  dataConfig
   dataSource type=JdbcDataSource
   driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
   url=jdbc:sqlserver://localhost:1433;databaseName=master
   user=user
   password=pw/
   document
 entity name=doc
 query=select id,title,attachment from attachment
 /entity
   /document
  /dataConfig
 
  *schema.xml*
  field name=attachment type=text indexed=true stored=true/
 
 
 
  Best Regards,
  Roy Liu
 



 --
 Lance Norskog
 goks...@gmail.com



Need help for solr searching case insensative item

2010-10-26 Thread wu liu
Hi all,

I just noticed a weird thing happening in my Solr search results.
If I do a search for ecommons, it does not return results for eCommons;
conversely, if I do a search for eCommons, I only get the matches for eCommons,
but not ecommons.

I cannot figure out why.

please help me

Thanks very much in advance
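For reference, case-insensitive matching comes from the field's analysis chain: the field type has to apply a lowercase filter at both index and query time, and the field must be reindexed after the change. A typical definition (a sketch along the lines of the standard example schema):

<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

If the field is currently a string type, or its analyzer has no LowerCaseFilterFactory, searches will only match the exact original case, which is the behaviour described above.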


Re: How to delete documents from a SOLR cloud / balance the shards in the cloud?

2010-09-10 Thread James Liu
Stephan and all,

I am evaluating this like you are. You may want to check
http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/.
I would appreciate if others can shed some light on this, too.

Bests,
James
On Fri, Sep 10, 2010 at 6:07 AM, Stephan Raemy stephan.ra...@gmail.comwrote:

 Hi solr-cloud users,

 I'm currently setting up a solr-cloud/zookeeper instance and so far,
 everything works out fine. I downloaded the source from the cloud branch
 yesterday and build it from source.

 I've got 10 shards distributed across 4 servers and a zookeeper instance.
 Searching documents with the flag distrib=true works out and it returns
 the expected result.

 But here comes the tricky question. I will add new documents every day and
 therefore, I'd like to balance my shards to keep the system speedy. The
 Wiki says that one can calculate the hash of a document id and then
 determine the corresponding shard. But IMHO, this does not take into
 account
 that the cloud may become bigger or shrink over time by adding or removing
 shards. Obviously adding has a higher priority since one wants to reduce
 the shard size to improve the response time of distributed searches.

 When reading through the Wikis and existing documentation, it is still
 unclear to me how to do the following operations:
 - Modify/Delete a document stored in the cloud without having to store the
  document:shard mapping information outside of the cloud. I would expect
  something like shard attribute on each doc in the SOLR query result
  (activated/deactivated by a flag), so that i can query the SOLR cloud for
 a
  doc and then delete it on the specific shard.
 - Balance a cloud when adding/removing new shards or just balance them
 after
  many deletions.

 Of course there are solutions to this, but at the end, I'd love to have a
 true cloud where i do not have to worry about shard performance
 optimization.
 Hints are greatly appreciated.

 Cheers,
 Stephan



match to non tokenizable word (helloworld)

2010-05-16 Thread siping liu

I get no match when searching for helloworld, even though I have hello 
world in my index. How do people usually deal with this? Write a custom 
analyzer, with help from a collection of all dictionary words?

 

thanks for suggestions/comments.
  
_
Hotmail has tools for the New Busy. Search, chat and e-mail from your inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_1

how to stress test solr

2010-02-03 Thread James liu
Before stress testing, should I disable the Solr caches?

Which tool do you use?

How do I do a stress test correctly?

Any pointers?

-- 
regards
j.L ( I live in Shanghai, China)


weird problem with solr.DateField

2009-11-11 Thread siping liu

Hi,

I'm using Solr 1.4 (from a nightly build about 2 months ago) and have this
defined in my schema.xml:

<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

<field name="lastUpdate" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

 

and the following code that gets executed once every night:

CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://...");
solrServer.setRequestWriter(new BinaryRequestWriter());

solrServer.add(documents);
solrServer.commit();

UpdateResponse deleteResult = solrServer.deleteByQuery("lastUpdate:[* TO NOW-2HOUR]");
solrServer.commit();

 

The purpose is to refresh index with latest data (in documents).

This works fine, except that after a few days I start to see a few documents 
with no lastUpdate field (query -lastUpdate:[* TO *]) -- how can that be 
possible?

 

thanks in advance.

 
  
_
Windows 7: Unclutter your desktop.
http://go.microsoft.com/?linkid=9690331ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009

RE: Solr and Garbage Collection

2009-10-02 Thread siping liu

Hi,

I read pretty much all posts on this thread (before and after this one). Looks 
like the main suggestion from you and others is to keep max heap size (-Xmx) as 
small as possible (as long as you don't see OOM exception). This brings more 
questions than answers (for me at least. I'm new to Solr).

 

First, our environment and the problem encountered: Solr 1.4 (nightly build,
downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on
Solaris (multi-CPU/cores). The cache settings are from the default solrconfig.xml
(they look very small). At first we used minimal JAVA_OPTS and quickly ran into a
problem similar to the one the original poster reported -- long pauses (seconds to
minutes) under load test. jconsole showed that it pauses on GC. So more
JAVA_OPTS were added: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m
-XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200; the thinking was that with
multiple CPUs/cores we could get GC over with as quickly as possible. With the new
setup, it works fine until Tomcat reaches the heap limit, then it blocks and takes
minutes on a full GC to reclaim space from the tenured generation. We tried
different Xmx values (from very small to large), with no difference in the long GC
time. We never ran into OOM.

 

Questions:

* In general the various caches are good for performance; we have more RAM to use
and want to use more caching to boost performance. Isn't your suggestion (of
lowering the heap limit) going against that?

* Looks like Solr caching made its way into the tenured generation on the heap, that's
good. But why does it get GC'ed eventually? I did a quick check of the Solr code
(Solr 1.3, not 1.4), and see a single instance of using WeakReference. Is that 
what is causing all this? This seems to suggest a design flaw in Solr's memory 
management strategy (or just my ignorance about Solr?). I mean, wouldn't this 
be the right way of doing it -- you allow user to specify the cache size in 
solrconfig.xml, then user can set up heap limit in JAVA_OPTS accordingly, and 
no need to use WeakReference (BTW, why not SoftReference)??

* Right now I have a single Tomcat hosting Solr and other applications. I guess 
now it's better to have Solr on its own Tomcat, given that it's tricky to 
adjust the java options.

 

thanks.


 
 From: wun...@wunderwood.org
 To: solr-user@lucene.apache.org
 Subject: RE: Solr and Garbage Collection
 Date: Fri, 25 Sep 2009 09:51:29 -0700
 
 30ms is not better or worse than 1s until you look at the service
 requirements. For many applications, it is worth dedicating 10% of your
 processing time to GC if that makes the worst-case pause short.
 
 On the other hand, my experience with the IBM JVM was that the maximum query
 rate was 2-3X better with the concurrent generational GC compared to any of
 their other GC algorithms, so we got the best throughput along with the
 shortest pauses.
 
 Solr garbage generation (for queries) seems to have two major components:
 per-request garbage and cache evictions. With a generational collector,
 these two are handled by separate parts of the collector. Per-request
 garbage should completely fit in the short-term heap (nursery), so that it
 can be collected rapidly and returned to use for further requests. If the
 nursery is too small, the per-request allocations will be made in tenured
 space and sit there until the next major GC. Cache evictions are almost
 always in long-term storage (tenured space) because an LRU algorithm
 guarantees that the garbage will be old.
 
 Check the growth rate of tenured space (under constant load, of course)
 while increasing the size of the nursery. That rate should drop when the
 nursery gets big enough, then not drop much further as it is increased more.
 
 After that, reduce the size of tenured space until major GCs start happening
 too often (a judgment call). A bigger tenured space means longer major GCs
 and thus longer pauses, so you don't want it oversized by too much.
 
 Also check the hit rates of your caches. If the hit rate is low, say 20% or
 less, make that cache much bigger or set it to zero. Either one will reduce
 the number of cache evictions. If you have an HTTP cache in front of Solr,
 zero may be the right choice, since the HTTP cache is cherry-picking the
 easily cacheable requests.
 
 Note that a commit nearly doubles the memory required, because you have two
 live Searcher objects with all their caches. Make sure you have headroom for
 a commit.
 
 If you want to test the tenured space usage, you must test with real world
 queries. Those are the only way to get accurate cache eviction rates.
 
 wunder
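To actually watch the tenured-space growth and full-GC pauses described above, GC logging can be added next to the collector flags already in use (a sketch for the Sun JDK 1.6 mentioned in this thread; the log path is only an example):

  JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/solr-gc.log"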
  
_
Bing™  brings you maps, menus, and reviews organized in one place.   Try it now.
http://www.bing.com/search?q=restaurantsform=MLOGENpubl=WLHMTAGcrea=TEXT_MLOGEN_Core_tagline_local_1x1

anyway to get Document update time stamp

2009-09-17 Thread siping liu

I understand there's no update in Solr/Lucene; it's really delete+insert. Is
there any way to get a document's insert timestamp without explicitly creating
such a data field in the document? If so, how can I query it, for instance to get
all documents that are older than 24 hours? Thanks.
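For reference, there is no built-in insert timestamp in Solr/Lucene, so the usual approach is the same default=NOW pattern as in the Solr example schema -- a field that Solr fills in by itself whenever a document is added:

<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

Documents older than 24 hours can then be selected with a range query such as timestamp:[* TO NOW-1DAY].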
_
Hotmail: Free, trusted and rich email service.
http://clk.atdmt.com/GBL/go/171222984/direct/01/

Re: Is it problem? I use solr to search and index is made by lucene. (not EmbeddedSolrServer(wiki is old))

2009-07-03 Thread James liu
solr have much fieldtype, like: integer,long, double, sint, sfloat,
tint,tfloat,,and more.

but lucene not fieldtype,,just name and value, value only string.

so i not sure is it a problem when i use solr to search( index made by
lucene).



-- 
regards
j.L ( I live in Shanghai, China)


IndexMerge not found

2009-07-02 Thread James liu
i try http://wiki.apache.org/solr/MergingSolrIndexes

system: win2003, jdk 1.6

Error information:

 Caused by: java.lang.ClassNotFoundException:
 org.apache.lucene.misc.IndexMergeTo
 ol
 at java.net.URLClassLoader$1.run(Unknown Source)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClassInternal(Unknown Source)
 Could not find the main class: org/apache/lucene/misc/IndexMergeTool.
 Program w
 ill exit.



-- 
regards
j.L ( I live in Shanghai, China)


Re: IndexMerge not found

2009-07-02 Thread James liu
i use lucene-core-2.9-dev.jar, lucene-misc-2.9-dev.jar
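With both of those jars on the classpath the tool can be invoked directly; on Windows the call looks roughly like this (hypothetical index paths; the first argument is the destination index, the remaining ones are the sources to merge):

  java -cp lucene-core-2.9-dev.jar;lucene-misc-2.9-dev.jar org.apache.lucene.misc.IndexMergeTool C:\merged C:\core0\data\index C:\core1\data\index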

On Thu, Jul 2, 2009 at 2:02 PM, James liu liuping.ja...@gmail.com wrote:

 i try http://wiki.apache.org/solr/MergingSolrIndexes

 system: win2003, jdk 1.6

 Error information:

 Caused by: java.lang.ClassNotFoundException:
 org.apache.lucene.misc.IndexMergeTo
 ol
 at java.net.URLClassLoader$1.run(Unknown Source)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClassInternal(Unknown Source)
 Could not find the main class: org/apache/lucene/misc/IndexMergeTool.
 Program w
 ill exit.



 --
 regards
 j.L ( I live in Shanghai, China)




-- 
regards
j.L ( I live in Shanghai, China)


Is it problem? I use solr to search and index is made by lucene. (not EmbeddedSolrServer(wiki is old))

2009-07-02 Thread James liu
I use solr to search and index is made by lucene. (not
EmbeddedSolrServer(wiki is old))

Is it problem when i use solr to search?

which the difference between Index(made by lucene and solr)?


thks

-- 
regards
j.L ( I live in Shanghai, China)


DisMaxRequestHandler usage

2009-06-16 Thread siping liu

Hi,

I have this standard query:

q=(field1:hello OR field2:hello) AND (field3:world)

 

Can I use dismax handler for this (applying the same search term on field1 and 
field2, but keep field3 with something separate)? If it can be done, what's the 
advantage of doing it this way over using the standard query?

 

thanks.
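For reference, the dismax equivalent keeps the user's term in q, spreads it over both fields with qf, and moves the field3 restriction into a filter query -- a sketch:

  q=hello
  defType=dismax
  qf=field1 field2
  fq=field3:world

The main gains over writing the boolean query by hand are that users can type free text without field syntax, per-field boosts (e.g. qf=field1^2 field2) and the mm parameter become available, and the fq part is cached independently of the main query.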

_
Microsoft brings you a new way to search the web.  Try  Bing™ now
http://www.bing.com?form=MFEHPGpubl=WLHMTAGcrea=TEXT_MFEHPG_Core_tagline_try 
bing_1x1

does solr support summary

2009-06-10 Thread James liu
If a user searches with a keyword, can they get a summary auto-generated around
the keyword? Like this:

doc fields: id, text

id: 001
text:

 Open source is a development method for software that harnesses the power
 of distributed peer review and transparency of process. The promise of open
 source is better quality, higher reliability, more flexibility, lower cost,
 and an end to predatory vendor lock-in.

If the keyword is source, the summary is:

Open source is a development...The promise of open source is better quality

If the keyword is power, the summary is:

Open...harnesses the power of distributed peer review and transparency of
process...

just like google search results...

and any advice will be appreciated.

-- 
regards
j.L ( I live in Shanghai, China)
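What is being described is Solr's highlighting feature; with a stored text field like the one above, a request along these lines returns keyword-centered fragments (a sketch):

  q=text:source&hl=true&hl.fl=text&hl.snippets=2&hl.fragsize=100

The fragments come back in a separate highlighting section of the response, with the matched terms marked up (by default with <em> tags).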


Query faceting

2009-06-08 Thread siping liu

Hi,

I have a field called service with following values:

- Shuttle Services
- Senior Discounts
- Laundry Rooms

- ...

 

When I conduct a query with facet=true&facet.field=service&facet.limit=-1, I
get something like this back:

- shuttle 2

- service 3

- senior 0

- laundry 0

- room 3

- ...

 

Questions:

- How do I keep field values from being broken up into words, so I can get something
like Shuttle Services 2 back?

- How do I tell Solr not to return facets with a 0 count? The query takes a long time
to finish, seemingly because of the long list of items with 0 count.

 

thanks for any advice.
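For reference, both points are usually handled by faceting on an untokenized copy of the field and by setting facet.mincount. A sketch (the copy field name is an assumption):

  <field name="service_exact" type="string" indexed="true" stored="false" multiValued="true"/>
  <copyField source="service" dest="service_exact"/>

The request then becomes facet=true&facet.field=service_exact&facet.limit=-1&facet.mincount=1, which returns whole values such as Shuttle Services 2 and drops the zero-count entries.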

_
Insert movie times and more without leaving Hotmail®. 
http://windowslive.com/Tutorial/Hotmail/QuickAdd?ocid=TXT_TAGLM_WL_HM_Tutorial_QuickAdd_062009

Re: timeouts

2009-06-05 Thread James liu
Collins:

I'm not sure what you mean.

-- 
regards
j.L ( I live in Shanghai, China)


Re: indexing Chienese langage

2009-06-04 Thread James liu
First: you don't have to restart Solr. You can replace the old data with new data
and tell Solr to open a new searcher; you can find something for this in the shell
scripts that ship with Solr.

Second: you don't have to restart Solr -- just keep the id the same. Example: old
doc id:1, title:hi; new doc id:1, title:welcome. Just index the new data and it will
delete the old document and insert the new one, like a replace, but it will use more
time and resources.

You can find the number of indexed documents on the Solr admin page.


On Fri, Jun 5, 2009 at 7:42 AM, Fer-Bj fernando.b...@gmail.com wrote:


 What we usually do to reindex is:

 1. stop solr
 2. rmdir -r data  (that is to remove everything in  /opt/solr/data/
 3. mkdir data
 4. start solr
 5. start reindex.   with this we're sure about not having old copies or
 index..

 To check the index size we do:
 cd data
 du -sh



 Otis Gospodnetic wrote:
 
 
  I can't tell what that analyzer does, but I'm guessing it uses n-grams?
  Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629
  instead?
 
   Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: Fer-Bj fernando.b...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 2:20:03 AM
  Subject: Re: indexing Chienese langage
 
 
  We are trying SOLR 1.3 with Paoding Chinese Analyzer , and after
  reindexing
  the index size went from 1.5 Gb to 2.7 Gb.
 
  Is that some expected behavior ?
 
  Is there any switch or trick to avoid having a double + index file size?
 
  Koji Sekiguchi-2 wrote:
  
   CharFilter can normalize (convert) traditional chinese to simplified
   chinese or vice versa,
   if you define mapping.txt. Here is the sample of Chinese character
   normalization:
  
  
 
 https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
  
   See SOLR-822 for the detail:
  
   https://issues.apache.org/jira/browse/SOLR-822
  
   Koji
  
  
   revathy arun wrote:
   Hi,
  
   When I index chinese content using chinese tokenizer and analyzer in
  solr
   1.3 ,some of the chinese text files are getting indexed but others
 are
   not.
  
   Since chinese has got many different language subtypes as in standard
   chinese,simplified chinese etc which of these does the chinese
  tokenizer
   support and is there any method to find the type of  chiense language
   from
   the file?
  
   Rgds
  
  
  
  
  
 
  --
  View this message in context:
 
 http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

 --
 View this message in context:
 http://www.nabble.com/indexing-Chienese-langage-tp22033302p23879730.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
regards
j.L ( I live in Shanghai, China)


Re: indexing Chienese langage

2009-06-04 Thread James liu
On Mon, Feb 16, 2009 at 4:30 PM, revathy arun revas...@gmail.com wrote:

 Hi,

 When I index chinese content using chinese tokenizer and analyzer in solr
 1.3 ,some of the chinese text files are getting indexed but others are not.


Are you sure your analyzer handles it correctly?

If you're not sure, you can use the Analysis link on the Solr admin page to check.



 Since chinese has got many different language subtypes as in standard
 chinese,simplified chinese etc which of these does the chinese tokenizer
 support and is there any method to find the type of  chiense language  from
 the file?

 Rgds




-- 
regards
j.L ( I live in Shanghai, China)


Re: Using Chinese / How to ?

2009-06-03 Thread James liu
1: Modify your schema.xml, for example:

<fieldtype name="text_cn" class="solr.TextField">
  <analyzer class="chineseAnalyzer"/>
</fieldtype>

2: Add your field:

<field name="urfield" type="text_cn" indexed="true" stored="true"/>

3: Add your analyzer jar to {solr_dir}\lib\

4: Rebuild Solr and you will find the new build in {solr_dir}\dist

5: Follow the tutorial to set up Solr

6: Open the Solr admin page in your browser and use the Analysis tool to check how
your text is analyzed and which analyzer is applied


-- 
regards
j.L ( I live in Shanghai, China)


Re: Using Chinese / How to ?

2009-06-02 Thread James liu
Do you mean how to configure Solr to support Chinese?

Update problem?
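For the invalid-character problem described below, a common workaround is to strip characters that are not legal in XML 1.0 from the document values before posting them; a small Java sketch (hypothetical helper, keeps tab, LF and CR):

  /** Removes control characters that are not allowed in XML 1.0. */
  public static String stripInvalidXmlChars(String s) {
      return s.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "");
  }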

On Tuesday, June 2, 2009, Fer-Bj fernando.b...@gmail.com wrote:

 I'm sending 3 files:
 - schema.xml
 - solrconfig.xml
 - error.txt (with the error description)

 I can confirm by now that this error is due to invalid characters for the
 XML format (ASCII 0 or 11).
 However, this problem now is taking a different direction: how to start
 using the CJK instead of the english!
 http://www.nabble.com/file/p23825881/error.txt error.txt
 http://www.nabble.com/file/p23825881/solrconfig.xml solrconfig.xml
 http://www.nabble.com/file/p23825881/schema.xml schema.xml


 Grant Ingersoll-6 wrote:

 Can you provide details on the errors?  I don't think we have a
 specific how to, but I wouldn't think it would be much different from
 1.2

 -Grant
 On May 31, 2009, at 10:31 PM, Fer-Bj wrote:


 Hello,

 is there any how to already created to get me up using SOLR 1.3
 running
 for a chinese based website?
 Currently our site is using SOLR 1.2, and we tried to move into 1.3
 but we
 couldn't complete our reindex as it seems like 1.3 is more strict
 when it
 comes to special chars.

 I would appreciate any help anyone may provide on this.

 Thanks!!
 --
 View this message in context:
 http://www.nabble.com/Using-Chinese---How-to---tp23810129p23810129.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search




 --
 View this message in context: 
 http://www.nabble.com/Using-Chinese---How-to---tp23810129p23825881.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
regards
j.L ( I live in Shanghai, China)


Re: Solr multiple keyword search as google

2009-06-02 Thread James liu
You can find the answer in the tutorial or the example setup.

On Tuesday, June 2, 2009, The Spider maheshmura...@rediffmail.com wrote:

 Hi,
    I am using a Solr nightly build for my search.
 I have to search in the location field of the table which is not my default
 search field.
 I will briefly explain my requirement below:
 I want to get the same/similar result when I give location multiple
 keywords, say  San jose ca USA
 or USA ca san jose or CA San jose USA (like that of google search). That
 means even if I rearranged the keywords of location I want to get proper
 results. Is there any way to do that?
 Thanks in advance
 --
 View this message in context: 
 http://www.nabble.com/Solr-multiple-keyword-search-as-google-tp23826278p23826278.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
regards
j.L ( I live in Shanghai, China)


RE: Creating a distributed search in a searchComponent

2009-05-21 Thread siping liu

I was looking for an answer to the same question, and have a similar concern. It looks
like any serious customization work requires developing a custom SearchComponent,
but it's not clear to me how the Solr designers intended this to be done. I would feel
more confident either doing it at the Lucene level, or staying on the client side and
using something like multi-core (as discussed here
http://wiki.apache.org/solr/MultipleIndexes).
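For what it's worth, a minimal sketch of what Nick describes below -- building the shards parameter inside a component's prepare() method -- could look like the following (hypothetical class name and shard hosts; the exact set of SolrInfoMBean methods to implement depends on the Solr version):

import java.io.IOException;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ShardListComponent extends SearchComponent {

  public void prepare(ResponseBuilder rb) throws IOException {
    // copy the incoming params and add the shard list before the handler
    // decides whether this request is distributed
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    params.set("shards", "host1:8983/solr,host2:8983/solr");
    rb.req.setParams(params);
  }

  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do at process time
  }

  public String getDescription() { return "builds the shards parameter"; }
  public String getSource()      { return "$URL$"; }
  public String getSourceId()    { return "$Id$"; }
  public String getVersion()     { return "1.0"; }
}

Registered as a first-component on the StandardRequestHandler it runs before the shards check, which is what makes the approach work -- with the caveat Nick mentions that this relies on Solr internals that may change.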


 
 Date: Wed, 20 May 2009 13:47:20 -0400
 Subject: RE: Creating a distributed search in a searchComponent
 From: nicholas.bai...@rackspace.com
 To: solr-user@lucene.apache.org
 
 It seems I sent this out a bit too soon. After looking at the source it seems
 there are two separate paths for distributed and regular queries; however, the
 prepare method for all components is run before the shards parameter is
 checked. So I can build the shards portion using the prepare method of my own
 search component.
 
 However I'm not sure if this is the greatest idea in case solr changes at 
 some point.
 
 -Nick
 
 -Original Message-
 From: Nick Bailey nicholas.bai...@rackspace.com
 Sent: Wednesday, May 20, 2009 1:29pm
 To: solr-user@lucene.apache.org
 Subject: Creating a distributed search in a searchComponent
 
 Hi,
 
 I am wondering if it is possible to basically add the distributed portion of 
 a search query inside of a searchComponent.
 
 I am hoping to build my own component and add it as a first-component to the 
 StandardRequestHandler. Then hopefully I will be able to use this component 
 to build the shards parameter of the query and have the Handler then treat 
 the query as a distributed search. Anyone have any experience or know if this 
 is possible?
 
 Thanks,
 Nick
 
 
 

_
Hotmail® has ever-growing storage! Don’t worry about storage limits.
http://windowslive.com/Tutorial/Hotmail/Storage?ocid=TXT_TAGLM_WL_HM_Tutorial_Storage1_052009
