Re: Shard update error when using DIH

2013-01-22 Thread Jun Wang
You should look at the log of solr-shard-4. It seems that some error occurred in
that shard.
-- 
from Jun Wang


Re: core.SolrCore - java.io.FileNotFoundException

2013-01-14 Thread Jun Wang
):C2836, _1oz5(4.0.0.2):C8231,
_1oyy(4.0.0.2):C29, _1oz4(4.0.0.2):C2988, _1oz8(4.0.0.2):C1,
_1ozb(4.0.0.2):C1] packetCount=4599
1491308 IW 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-2]: hit
exception updating document
.

It seems Lucene used a segment that had already been deleted.



2012/10/15 Jun Wang wangjun...@gmail.com

 Hi, Erick
 Thanks for your advice. My mergeFactor is set to 10, so it should be
 impossible to have so many segments, especially since some .fdx and .fdt
 files are just empty. And sometimes indexing works fine, ending with 200+
 files in the data dir. My deployment has two cores, with two shards for
 every core, using autocommit; DIH is used to pull data from the DB, and the
 merge policy is TieredMergePolicy.
 There is nothing customized.

 I am wondering how an empty .fdx file could be generated. Maybe some config
 in indexConfig is wrong. My final index is about 20G, with 40m+ docs.
 Here is part of my solrconfig.xml:
 -
 <ramBufferSizeMB>32</ramBufferSizeMB>
 <maxBufferedDocs>100</maxBufferedDocs>

 <mergeFactor>10</mergeFactor>

 <updateHandler class="solr.DirectUpdateHandler2">
   <autoCommit>
     <maxTime>15000</maxTime>
     <openSearcher>false</openSearcher>
   </autoCommit>
 </updateHandler>
 -

 PS: I found another kind of log entry, but I am not sure whether it is the
 cause or the consequence. I am planning to enable debug logging tomorrow to
 gather more information.


 2012-10-14 10:13:19,854 ERROR update.CommitTracker - auto commit
 error...:java.io.FileNotFoundException: _cwj.fdt
 at
 org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
 at
 org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
 at
 org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:103)
 at
 org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2126)
 at
 org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:495)
 at
 org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:474)
 at
 org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
 at
 org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
 at
 org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
 at
 org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:435)
 at
 org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:551)
 at
 org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2657)
 at
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
 at
 org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
 at
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
 at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)








 2012/10/15 Erick Erickson erickerick...@gmail.com

 I have no idea how you managed to get so many files in
 your index directory, but that's definitely weird. How it
 relates to your file not found, I'm not quite sure, but it
 could be something as simple as you've run out of file
 handles.

 So you could try upping the number of
 file handles as a _temporary_ fix just to see if that's
 the problem. See your op-system's manuals for
 how.

 If it does work, then I'd run an optimize
 down to one segment and remove all the segment
 files _other_ than that one segment. NOTE: this
 means things like .fdt, .fdx, .tii files etc. NOT things
 like segments.gen and segments_1. Make a
 backup of course before you try this.

 But I think that's secondary. To generate this many
 files I suspect you've started a lot of indexing
 jobs that you then abort (hard kill?). To get this
 many files I'd guess it's something programmatic,
 but that's a guess.

 How are you committing? Autocommit? From a SolrJ
 (or equivalent) program? Have you implemented any
 custom merge policies?

 But to your immediate problem. You can try running
 CheckIndex (here's a tutorial from 2.9, but I think
 it's still

Re: Solr 4.0 segment flush times have a big difference between two machines

2012-10-19 Thread Jun Wang
I have found that segment flushing is controlled by
DocumentsWriterFlushControl, and indexing is implemented by
DocumentsWriterPerThread. DocumentsWriterFlushControl has information about
the number of docs and the size of the RAM buffer, but this seems to be shared
by all DocumentsWriterPerThread instances. Is the RAM limit the sum of the buffers
of all DocumentsWriterPerThread instances?
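
For what it's worth, here is a minimal indexConfig sketch (element names assume
the stock Solr 4.x solrconfig.xml; the values are only illustrative). As far as
I understand it, ramBufferSizeMB is a single budget shared across all
DocumentsWriterPerThread instances, so the per-thread flush size depends on
document size and on how many threads are indexing concurrently:

<indexConfig>
  <!-- Shared RAM budget for all in-memory segments across indexing threads
       (assumption: a global limit, not a per-DocumentsWriterPerThread limit) -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
  <!-- Cap on concurrent indexing threads inside the IndexWriter; available in
       Solr 4.x, the value shown here is illustrative -->
  <maxIndexingThreads>8</maxIndexingThreads>
</indexConfig>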

2012/10/19 Jun Wang wangjun...@gmail.com

 Hi

 I have 2 machines for a collection, and DIH is used to import data. DIH
 is triggered via a URL request on one machine, let's call it A, and A will
 forward some of the index to machine B. Recently I have found that segment
 flushes happen more often on machine B. Here is part of INFOSTREAM.txt.

 Machine A:
 
 DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as
 segment _4r3 numDocs=71616
 DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0
 deleted docs
 DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no
 vectors; no norms; no docValues; prox; freqs
 DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]:
 flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm,
 _4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq]
 DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40
 D

 Machine B
 --
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings
 as segment _zi0 numDocs=4302
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment
 has 0 deleted docs
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment
 has no vectors; no norms; no docValues; prox; freqs
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]:
 flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt,
 _zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip]
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed
 codec=Lucene40
 D

 I have found that a flush occurs when the number of docs in RAM reaches
 7000~9000 on machine A, but the number on machine B is very different,
 almost always around 4000. It seems that every doc in the buffer uses more
 RAM on machine B than on machine A, which results in more flushes. Does
 anyone know why this happens?

 My conf is here.

 <ramBufferSizeMB>64</ramBufferSizeMB>
 <maxBufferedDocs>10</maxBufferedDocs>




 --
 from Jun Wang





-- 
from Jun Wang


What is the _version_ field used for?

2012-10-17 Thread Jun Wang
I am moving to Solr 4.0 from the beta version. An exception was thrown:

Caused by: org.apache.solr.common.SolrException: _version_field must exist
in schema, using indexed=true stored=true and multiValued=false
(_version_ does not exist)
at
org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
at org.apache.solr.core.SolrCore.init(SolrCore.java:606)
... 26 more
2

It seems that a field like
 <field name="_version_" type="long" indexed="true" stored="true"/>
is needed in schema.xml. I am wondering what this is used for.
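
For reference, a minimal sketch of the two pieces that usually go together in
the Solr 4.0 example configuration (if your config differs, treat the element
names below as assumptions):

schema.xml:
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>

solrconfig.xml (the transaction log that _version_ supports):
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>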
-- 
from Jun Wang


Re: What is the _version_ field used for?

2012-10-17 Thread Jun Wang
Does that mean we just need to add this field, and there is no more work to do?

2012/10/17 Rafał Kuć r@solr.pl

 Hello!

 It is used internally by Solr, for example by features like partial
 update functionality and update log.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  I am moving to Solr 4.0 from the beta version. An exception was
 thrown:

  Caused by: org.apache.solr.common.SolrException: _version_field must
 exist
  in schema, using indexed=true stored=true and multiValued=false
  (_version_ does not exist)
  at
 
 org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:606)
  ... 26 more
  2

  It seems that a field like
   <field name="_version_" type="long" indexed="true" stored="true"/>
  is needed in schema.xml. I am wondering what this is used for.




-- 
from Jun Wang


Re: What is the _version_ field used for?

2012-10-17 Thread Jun Wang
Ok, I got it, thanks

2012/10/17 Alexandre Rafalovitch arafa...@gmail.com

 Yes, just make sure you have it in the schema. Solr handles the rest.

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, Oct 17, 2012 at 12:57 PM, Jun Wang wangjun...@gmail.com wrote:
   Does that mean we just need to add this field, and there is no more work to do?
 
  2012/10/17 Rafał Kuć r@solr.pl
 
  Hello!
 
  It is used internally by Solr, for example by features like partial
  update functionality and update log.
 
  --
  Regards,
   Rafał Kuć
   Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
 ElasticSearch
 
    I am moving to Solr 4.0 from the beta version. An exception was
   thrown:
 
   Caused by: org.apache.solr.common.SolrException: _version_field must
  exist
   in schema, using indexed=true stored=true and multiValued=false
   (_version_ does not exist)
   at
  
 
 org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:606)
   ... 26 more
   2
 
    It seems that a field like
     <field name="_version_" type="long" indexed="true" stored="true"/>
    is needed in schema.xml. I am wondering what this is used for.
 
 
 
 
  --
  from Jun Wang




-- 
from Jun Wang


Re: core.SolrCore - java.io.FileNotFoundException

2012-10-15 Thread Jun Wang
Hi, Erick
Thanks for your advice. My mergeFactor is set to 10, so it should be
impossible to have so many segments, especially since some .fdx and .fdt
files are just empty. And sometimes indexing works fine, ending with 200+
files in the data dir. My deployment has two cores, with two shards for
every core, using autocommit; DIH is used to pull data from the DB, and the
merge policy is TieredMergePolicy.
There is nothing customized.

I am wondering how an empty .fdx file could be generated. Maybe some config
in indexConfig is wrong. My final index is about 20G, with 40m+ docs.
Here is part of my solrconfig.xml:
-
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxBufferedDocs>100</maxBufferedDocs>

<mergeFactor>10</mergeFactor>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
-
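
For completeness, a sketch of how the merge policy could be made explicit
inside indexConfig (assuming Solr 4.x syntax; the values shown are the usual
defaults, not necessarily what this deployment uses):

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- Rough counterparts of mergeFactor=10 for TieredMergePolicy -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>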

PS: I found another kind of log entry, but I am not sure whether it is the
cause or the consequence. I am planning to enable debug logging tomorrow to
gather more information.


2012-10-14 10:13:19,854 ERROR update.CommitTracker - auto commit
error...:java.io.FileNotFoundException: _cwj.fdt
at
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
at
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
at
org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:103)
at
org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2126)
at
org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:495)
at
org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:474)
at
org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
at
org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
at
org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
at
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:435)
at
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:551)
at
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2657)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)








2012/10/15 Erick Erickson erickerick...@gmail.com

 I have no idea how you managed to get so many files in
 your index directory, but that's definitely weird. How it
 relates to your file not found, I'm not quite sure, but it
 could be something as simple as you've run out of file
 handles.

 So you could try upping the number of
 file handles as a _temporary_ fix just to see if that's
 the problem. See your op-system's manuals for
 how.

 If it does work, then I'd run an optimize
 down to one segment and remove all the segment
 files _other_ than that one segment. NOTE: this
 means things like .fdt, .fdx, .tii files etc. NOT things
 like segments.gen and segments_1. Make a
 backup of course before you try this.
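
 If you go the optimize route, a minimal sketch of the update command to send
 to the core's /update handler (standard Solr XML update format; the
 attributes are optional and shown only for illustration):

 <optimize maxSegments="1" waitSearcher="true"/>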

 But I think that's secondary. To generate this many
 files I suspect you've started a lot of indexing
 jobs that you then abort (hard kill?). To get this
 many files I'd guess it's something programmatic,
 but that's a guess.

 How are you committing? Autocommit? From a SolrJ
 (or equivalent) program? Have you implemented any
 custom merge policies?

 But to your immediate problem. You can try running
 CheckIndex (here's a tutorial from 2.9, but I think
 it's still good):
 http://java.dzone.com/news/lucene-and-solrs-checkindex

 If that doesn't help (and you can run it in diagnostic mode,
 without the --fix flag to see what it _would_ do) then I'm
 afraid you'll probably have to re-index.

 And you've got to get to the root of why you have so
 many segment files. That number is just crazy

 Best
 Erick

 On Sun, Oct 14, 2012 at 11:20 PM, Jun Wang wangjun...@gmail.com wrote

Re: core.SolrCore - java.io.FileNotFoundException

2012-10-14 Thread Jun Wang
PS: I have found that there are lots of segments in the index directory, and
most of them are empty, like the listing below. The total file count is 35314
in the index directory.
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3n.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdx




2012/10/15 Jun Wang wangjun...@gmail.com

 I have encountered a FileNotFoundException occasionally when
 indexing; it does not occur every time. Does anyone have a clue? Here is
 the traceback:

 2012-10-14 11:37:28,105 ERROR core.SolrCore -
 java.io.FileNotFoundException:
 /home/admin/run/deploy/solr/core_p_shard2/data/index/_cwo.fnm (No such file
 or directory)
 at java.io.RandomAccessFile.open(Native Method)
 at java.io.RandomAccessFile.init(RandomAccessFile.java:216)
 at
 org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:218)
 at
 org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
 at
 org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
 at
 org.apache.lucene.index.SegmentCoreReaders.init(SegmentCoreReaders.java:101)
 at
 org.apache.lucene.index.SegmentReader.init(SegmentReader.java:55)
 at
 org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:120)
 at
 org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:267)
 at
 org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
 at
 org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:180)
 at
 org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:310)
 at
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:386)
 at
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1430)
 at
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
 at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
 at
 org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:432)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:315)
 at
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:230)
 at
 org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
 at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173

Is there any way to specify a config name per core in solr.xml?

2012-10-12 Thread jun Wang
Hi, all

I have two collections and two machines. So my deployment is like:
|machine a  |machine b  |
|core a1 | core a2 | core b1 | core b2|

core a1 is for collection 1 shard 1, and core a2 is for collection 1 shard 2;
the config for collection 1 is config 1.
core b1 is for collection 2 shard 1, and core b2 is for collection 2
shard 2; the config for collection 2 is config 2.

Is there any way to specify the core config in solr.xml so that the two shards
start up on every machine with the correct config name?
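
A sketch of what this could look like in a legacy (non-SolrCloud) solr.xml,
where each core can point at its own config and schema file (core names and
file names below are hypothetical; under SolrCloud the config set normally
comes from ZooKeeper instead):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- collection 1 shard hosted on this machine, using config 1 -->
    <core name="core_a1" instanceDir="core_a1"
          config="solrconfig-1.xml" schema="schema-1.xml"/>
    <!-- collection 2 shard hosted on this machine, using config 2 -->
    <core name="core_b1" instanceDir="core_b1"
          config="solrconfig-2.xml" schema="schema-2.xml"/>
  </cores>
</solr>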

-- 
from Jun Wang


Re: Solrcloud dataimport failed at first time after restart

2012-10-10 Thread jun Wang
I have found the reason. I am using a JBoss JNDI datasource, and the Oracle
driver was placed in WEB-INF/lib. This is a very common error; the driver
should be placed in %JBOSS_HOME%\server\default\lib.
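
For context, a sketch of the JNDI-based dataSource declaration this refers to
in data-config.xml (the JNDI name below is hypothetical):

<dataConfig>
  <!-- JdbcDataSource can look up a container-managed connection via JNDI;
       the JDBC driver jar must then be visible to the container itself,
       not only to the webapp's WEB-INF/lib -->
  <dataSource type="JdbcDataSource" jndiName="java:comp/env/jdbc/solrDS"/>
  <!-- document/entity definitions omitted -->
</dataConfig>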

2012/10/10 jun Wang wangjun...@gmail.com

 Hi, all
 I found that dataimport fails the first time after a restart, and the
 log is below. It seems like a bug.

 2012-10-09 20:00:08,848 ERROR dataimport.DataImporter - Full Import
 failed:java.lang.RuntimeException: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query: select a.id, a.subject, a.keywords, a.category_id,
 to_number((a.gmt_modified - to_date('1970-01-01','-mm-dd'))*24*60*60)
 as gmt_modified,a.member_seq,b.standard_attr_desc,
 b.custom_attr_desc, decode(a.product_min_price, null, 0,
 a.product_min_price)/100 as min_price, sign(a.ws_offline_date - sysdate) +
 1 as is_offlinefrom ws_product_draft a,
 ws_product_attribute_draft bwhere a.id =
 b.product_id(+) Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
 at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
 at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
 at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
 Caused by: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query: select a.id, a.subject, a.keywords, a.category_id,
 to_number((a.gmt_modified - to_date('1970-01-01','-mm-dd'))*24*60*60)
 as gmt_modified,a.member_seq,b.standard_attr_desc,
 b.custom_attr_desc, decode(a.product_min_price, null, 0,
 a.product_min_price)/100 as min_price, sign(a.ws_offline_date - sysdate) +
 1 as is_offlinefrom ws_product_draft a,
 ws_product_attribute_draft bwhere a.id =
 b.product_id(+) Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
 at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
 ... 3 more
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
 Unable to execute query: select a.id, a.subject, a.keywords,
 a.category_id, to_number((a.gmt_modified -
 to_date('1970-01-01','-mm-dd'))*24*60*60) as gmt_modified,a.member_seq,
b.standard_attr_desc, b.custom_attr_desc,
 decode(a.product_min_price, null, 0, a.product_min_price)/100 as min_price,
 sign(a.ws_offline_date - sysdate) + 1 as is_offline
  from ws_product_draft a, ws_product_attribute_draft b
where a.id = b.product_id(+) Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:252)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
 ... 5 more
 Caused by: java.lang.ClassNotFoundException: Unable to load null or
 org.apache.solr.handler.dataimport.null
 at
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:159)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:362)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:239)
 ... 12 more
 Caused by: java.lang.NullPointerException
 at
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
 at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:387)
 at
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:889)
 ... 17 more



 --
 from Jun

Re: segment number during optimize of index

2012-10-10 Thread jun Wang
I have another question: does the number of segments affect the speed of
index updates?
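
For reference on the alternative discussed in the quoted thread below,
expunging deleted documents without forcing a single segment is done with a
commit flag in the standard Solr XML update format (a sketch, not a
recommendation for this particular index):

<commit expungeDeletes="true"/>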

2012/10/10 jame vaalet jamevaa...@gmail.com

 Guys,
 thanks for all the inputs, I was continuing my research to know more about
 segments in Lucene. Below are my conclusion, please correct me if am wrong.

   1. Segments are independent sub-indexes in separate files. While indexing,
   it is better to create a new segment, as it doesn't have to modify an
   existing file; whereas while searching, the smaller the number of segments
   the better, since you open x (not exactly x, but a value proportional to x)
   physical files to search if you have got x segments in the index.
   2. Since Lucene has a memory-map concept, for each file/segment in the
   index a new m-mapped region is created and mapped to the physical file on
   disk. Can someone explain or correct this in detail? I am sure there are
   lots of people wondering how m-map works while you merge or optimize index
   segments.



 On 6 October 2012 07:41, Otis Gospodnetic otis.gospodne...@gmail.com
 wrote:

  If I were you and not knowing all your details...
 
  I would optimize indices that are static (not being modified) and
  would optimize down to 1 segment.
  I would do it when search traffic is low.
 
  Otis
  --
  Search Analytics - http://sematext.com/search-analytics/index.html
  Performance Monitoring - http://sematext.com/spm/index.html
 
 
  On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet jamevaa...@gmail.com
 wrote:
   Hi Eric,
   I  am in a major dilemma with my index now. I have got 8 cores each
  around
   300 GB in size and half of them are deleted documents in it and above
  that
   each has got around 100 segments as well. Do i issue a expungeDelete
 and
   allow the merge policy to take care of the segments or optimize them
 into
   single segment. Search performance is not at par compared to usual solr
   speed.
   If i have to optimize what segment number should i choose? my RAM size
    around 120 GB and JVM heap is around 45 GB (oldGen being 30 GB). Please
    advise!
  
   thanks.
  
  
   On 6 October 2012 00:00, Erick Erickson erickerick...@gmail.com
 wrote:
  
   because eventually you'd run out of file handles. Imagine a
   long-running server with 100,000 segments. Totally
   unmanageable.
  
   I think shawn was emphasizing that RAM requirements don't
   depend on the number of segments. There are other
   resources that file consume however.
  
   Best
   Erick
  
   On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet jamevaa...@gmail.com
  wrote:
hi Shawn,
thanks for the detailed explanation.
 I have got one doubt: you said it doesn't matter how many segments the
 index has, but then why does Solr have this merge policy which merges
 segments frequently? Why can't it leave the segments as they are rather
 than merging smaller ones into bigger ones?
   
thanks
.
   
On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org wrote:
   
On 10/4/2012 3:22 PM, jame vaalet wrote:
   
so imagine i have merged the 150 Gb index into single segment,
 this
   would
make a single segment of 150 GB in memory. When new docs are
  indexed it
wouldn't alter this 150 Gb index unless i update or delete the
 older
   docs,
right? will 150 Gb single segment have problem with memory
 swapping
  at
   OS
level?
   
   
Supplement to my previous reply:  the real memory mentioned in the
  last
paragraph does not include the memory that the OS uses to cache
 disk
access.  If more memory is needed and all the free memory is being
  used
   by
the disk cache, the OS will throw away part of the disk cache (a
near-instantaneous operation that should never involve disk I/O)
 and
   give
that memory to the application that requests it.
   
Here's a very good breakdown of how memory gets used with
  MMapDirectory
   in
Solr.  It's applicable to any program that uses memory mapping, not
  just
Solr:
   
   
   http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
   
Thanks,
Shawn
   
   
   
   
--
   
-JAME
  
  
  
  
   --
  
   -JAME
 



 --

 -JAME




-- 
from Jun Wang


deletedPkQuery not working in Solr 3.3

2012-09-05 Thread jun Wang
I have a data-config.xml with 2 entities, like:

<entity name="full" PK="ID" ...>
...
</entity>

and

<entity name="delta_build" PK="ID" ...>
...
</entity>

Entity delta_build is for delta import; the query is

?command=full-import&entity=delta_build&clean=false

and I want to use deletedPkQuery to delete from the index. So I have added
these to entity delta_build:

deltaQuery="select -1 as ID from dual"

deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}'"

deletedPKQuery="select product_id as ID from modified_product where
gmt_create &gt; to_date('${dataimporter.last_index_time}','yyyy-mm-dd
hh24:mi:ss') and modification = 'deleted'"

deltaQuery and deltaImportQuery are there simply so that the delta import does
not import any records, because delta import has already been implemented via
full import; I just want to use delta-import for deleting from the index.

But when I hit the query

?command=delta-import

deltaQuery and deltaImportQuery can be found in the log, but deletedPKQuery is
not there. Is there anything wrong in the config file?
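
One thing worth checking (a guess, not a confirmed fix): the DIH attribute is
usually documented as deletedPkQuery, with a lowercase k, and if attribute
names are case-sensitive in this Solr version the capital-K spelling would
simply be ignored. A sketch of the entity with the commonly documented
spelling (the query attribute below is only a placeholder):

<entity name="delta_build" pk="ID"
        query="select * from product"
        deltaQuery="select -1 as ID from dual"
        deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}'"
        deletedPkQuery="select product_id as ID from modified_product
                        where gmt_create &gt; to_date('${dataimporter.last_index_time}','yyyy-mm-dd hh24:mi:ss')
                        and modification = 'deleted'"/>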

-- 
from Jun Wang