Re: Shard update error when using DIH
You should look at the log of solr-shard-4. It seems that some error occurred in this shard. -- from Jun Wang
Re: core.SolrCore - java.io.FileNotFoundException
):C2836, _1oz5(4.0.0.2):C8231, _1oyy(4.0.0.2):C29, _1oz4(4.0.0.2):C2988, _1oz8(4.0.0.2):C1, _1ozb(4.0.0.2):C1] packetCount=4599
1491308 IW 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-2]: hit exception updating document

It seems that Lucene used a segment that had already been deleted.

2012/10/15 Jun Wang wangjun...@gmail.com

Hi, Erick,

Thanks for your advice. My mergeFactor is set to 10, so it should be impossible to have so many segments, especially since some .fdx and .fdt files are simply empty. Sometimes indexing works fine and ends with 200+ files in the data dir. My deployment has two cores with two shards for every core, uses autocommit, and uses DIH to pull data from the DB; the merge policy is TieredMergePolicy, and nothing is customized. I am wondering how an empty .fdx file could be generated; maybe some setting in indexConfig is wrong. My final index is about 20G with 40m+ docs. Here is part of my solrconfig.xml:

<ramBufferSizeMB>32</ramBufferSizeMB>
<maxBufferedDocs>100</maxBufferedDocs>
<mergeFactor>10</mergeFactor>
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

PS: I found another kind of log entry, but I am not sure whether it is the cause or the consequence. I am planning to enable debug logging to gather more information tomorrow.

2012-10-14 10:13:19,854 ERROR update.CommitTracker - auto commit error...:java.io.FileNotFoundException: _cwj.fdt
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
    at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
    at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:103)
    at org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2126)
    at org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:495)
    at org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:474)
    at org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
    at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
    at org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:435)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:551)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2657)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

2012/10/15 Erick Erickson erickerick...@gmail.com

I have no idea how you managed to get so many files in your index directory, but that's definitely weird. How it relates to your file-not-found I'm not quite sure, but it could be something as simple as running out of file handles. So you could try upping the number of file handles as a _temporary_ fix just to see if that's the problem; see your operating system's manuals for how. If it does work, then I'd run an optimize down to one segment and remove all the segment files _other_ than that one segment. NOTE: this means things like .fdt, .fdx, .tii files etc., NOT things like segments.gen and segments_1. Make a backup, of course, before you try this. But I think that's secondary. To generate this many files I suspect you've started a lot of indexing jobs that you then abort (hard kill?). To get this many files I'd guess it's something programmatic, but that's a guess. How are you committing? Autocommit? From a SolrJ (or equivalent) program? Have you implemented any custom merge policies? But to your immediate problem: you can try running CheckIndex (here's a tutorial from 2.9, but I think it's still
Re: Solr 4.0 segment flush counts show a big difference between two machines
I have found that segment flushing is controlled by DocumentsWriterFlushControl, and indexing is implemented by DocumentsWriterPerThread. DocumentsWriterFlushControl has information about the number of docs and the size of the RAM buffer, but this seems to be shared by all DocumentsWriterPerThreads. Is the RAM limit the sum of all the DocumentsWriterPerThread buffers?

2012/10/19 Jun Wang wangjun...@gmail.com

Hi,

I have 2 machines for a collection, and I am using DIH to import data. DIH is triggered via a URL request on one machine, let's call it A, and A forwards some of the index to machine B. Recently I have found that segment flushes happen more often on machine B. Here is part of INFOSTREAM.txt.

Machine A:
DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as segment _4r3 numDocs=71616
DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0 deleted docs
DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no vectors; no norms; no docValues; prox; freqs
DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm, _4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq]
DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40 D

Machine B:
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings as segment _zi0 numDocs=4302
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has 0 deleted docs
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has no vectors; no norms; no docValues; prox; freqs
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt, _zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip]
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed codec=Lucene40 D

I have found that a flush occurs when the number of docs in RAM reaches 7000~9000 on machine A, but the number on machine B is very different, almost always around 4000. It seems that every doc in the buffer uses more RAM on machine B than on machine A, which results in more flushes. Does anyone know why this happens? My conf is here:

<ramBufferSizeMB>64</ramBufferSizeMB>
<maxBufferedDocs>10</maxBufferedDocs>

-- from Jun Wang
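[Editorial note] My understanding, offered tentatively, is that in Lucene 4.x the RAM limit set by ramBufferSizeMB is accounted for across all DocumentsWriterPerThreads together rather than per thread, and a flush is triggered when either the RAM limit or the maxBufferedDocs limit is hit first. A minimal indexConfig sketch of the two knobs discussed above; the element names are standard solrconfig.xml, but the values here are only illustrative, not recommendations:

<indexConfig>
  <!-- flush in-memory documents to a new segment once either limit is reached,
       whichever comes first (illustrative values) -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
  <maxBufferedDocs>10000</maxBufferedDocs>
</indexConfig>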
What is the _version_ field used for?
I am moving to Solr 4.0 from the beta version. An exception was thrown:

Caused by: org.apache.solr.common.SolrException: _version_ field must exist in schema, using indexed=true stored=true and multiValued=false (_version_ does not exist)
    at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:606)
    ... 26 more

It seems that a field like <field name="_version_" type="long" indexed="true" stored="true"/> is needed in schema.xml. I wonder what this is used for? -- from Jun Wang
Re: What is the _version_ field used for?
Does that mean we just need to add this field, and there is no more work to do?

2012/10/17 Rafał Kuć r@solr.pl

Hello! It is used internally by Solr, for example by features like the partial update functionality and the update log. -- Regards, Rafał Kuć, Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

I am moving to Solr 4.0 from the beta version. An exception was thrown:

Caused by: org.apache.solr.common.SolrException: _version_ field must exist in schema, using indexed=true stored=true and multiValued=false (_version_ does not exist)
    at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:606)
    ... 26 more

It seems that a field like <field name="_version_" type="long" indexed="true" stored="true"/> is needed in schema.xml. I wonder what this is used for? -- from Jun Wang
Re: What is the _version_ field used for?
OK, I got it, thanks.

2012/10/17 Alexandre Rafalovitch arafa...@gmail.com

Yes, just make sure you have it in the schema. Solr handles the rest. Regards, Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Oct 17, 2012 at 12:57 PM, Jun Wang wangjun...@gmail.com wrote:

Does that mean we just need to add this field, and there is no more work to do?

2012/10/17 Rafał Kuć r@solr.pl

Hello! It is used internally by Solr, for example by features like the partial update functionality and the update log. -- Regards, Rafał Kuć, Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

I am moving to Solr 4.0 from the beta version. An exception was thrown:

Caused by: org.apache.solr.common.SolrException: _version_ field must exist in schema, using indexed=true stored=true and multiValued=false (_version_ does not exist)
    at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:606)
    ... 26 more

It seems that a field like <field name="_version_" type="long" indexed="true" stored="true"/> is needed in schema.xml. I wonder what this is used for?

-- from Jun Wang
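[Editorial note] For reference, a minimal sketch of the declarations involved: the field goes in schema.xml, and since _version_ backs the update log, the transaction log is normally enabled in solrconfig.xml as well. The ${solr.ulog.dir:} value below is an assumption taken from the stock 4.x example config, not from this thread:

<!-- schema.xml: field required by Solr 4.x for the update log / partial updates -->
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>

<!-- solrconfig.xml: update log inside the update handler (dir value is an assumption) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>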
Re: core.SolrCore - java.io.FileNotFoundException
Hi, Erick,

Thanks for your advice. My mergeFactor is set to 10, so it should be impossible to have so many segments, especially since some .fdx and .fdt files are simply empty. Sometimes indexing works fine and ends with 200+ files in the data dir. My deployment has two cores with two shards for every core, uses autocommit, and uses DIH to pull data from the DB; the merge policy is TieredMergePolicy, and nothing is customized. I am wondering how an empty .fdx file could be generated; maybe some setting in indexConfig is wrong. My final index is about 20G with 40m+ docs. Here is part of my solrconfig.xml:

<ramBufferSizeMB>32</ramBufferSizeMB>
<maxBufferedDocs>100</maxBufferedDocs>
<mergeFactor>10</mergeFactor>
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

PS: I found another kind of log entry, but I am not sure whether it is the cause or the consequence. I am planning to enable debug logging to gather more information tomorrow.

2012-10-14 10:13:19,854 ERROR update.CommitTracker - auto commit error...:java.io.FileNotFoundException: _cwj.fdt
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
    at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
    at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:103)
    at org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2126)
    at org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:495)
    at org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:474)
    at org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
    at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
    at org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:435)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:551)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2657)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

2012/10/15 Erick Erickson erickerick...@gmail.com

I have no idea how you managed to get so many files in your index directory, but that's definitely weird. How it relates to your file-not-found I'm not quite sure, but it could be something as simple as running out of file handles. So you could try upping the number of file handles as a _temporary_ fix just to see if that's the problem; see your operating system's manuals for how. If it does work, then I'd run an optimize down to one segment and remove all the segment files _other_ than that one segment. NOTE: this means things like .fdt, .fdx, .tii files etc., NOT things like segments.gen and segments_1. Make a backup, of course, before you try this. But I think that's secondary. To generate this many files I suspect you've started a lot of indexing jobs that you then abort (hard kill?). To get this many files I'd guess it's something programmatic, but that's a guess. How are you committing? Autocommit? From a SolrJ (or equivalent) program? Have you implemented any custom merge policies? But to your immediate problem: you can try running CheckIndex (here's a tutorial from 2.9, but I think it's still good): http://java.dzone.com/news/lucene-and-solrs-checkindex If that doesn't help (and you can run it in diagnostic mode, without the --fix flag, to see what it _would_ do) then I'm afraid you'll probably have to re-index. And you've got to get to the root of why you have so many segment files. That number is just crazy. Best, Erick

On Sun, Oct 14, 2012 at 11:20 PM, Jun Wang wangjun...@gmail.com wrote
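[Editorial note] Since the thread touches on merge policy configuration, here is a minimal sketch of how TieredMergePolicy is typically declared in a Solr 4.x indexConfig. The numeric values are illustrative assumptions only, not settings taken from this thread:

<indexConfig>
  <!-- TieredMergePolicy is the Solr 4.x default; shown explicitly for illustration -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>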
Re: core.SolrCore - java.io.FileNotFoundException
PS: I have found that there are lots of segment files in the index directory, and most of them are empty, like the listing below. The total file count is 35314 in the index directory.

-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3n.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdx

2012/10/15 Jun Wang wangjun...@gmail.com

I have encountered a FileNotFoundException occasionally when indexing; it does not occur every time. Does anyone have a clue?

Here is the traceback:

2012-10-14 11:37:28,105 ERROR core.SolrCore - java.io.FileNotFoundException: /home/admin/run/deploy/solr/core_p_shard2/data/index/_cwo.fnm (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.init(RandomAccessFile.java:216)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:218)
    at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
    at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
    at org.apache.lucene.index.SegmentCoreReaders.init(SegmentCoreReaders.java:101)
    at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:55)
    at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:120)
    at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:267)
    at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
    at org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:180)
    at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:310)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:386)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1430)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:432)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:315)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:230)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173
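[Editorial note] The trace shows NRTCachingDirectory wrapping MMapDirectory, which in Solr 4.x typically comes from the NRT caching directory factory. A minimal sketch of that solrconfig.xml declaration follows; this is an assumption about the poster's setup, not a line quoted from the thread:

<!-- solrconfig.xml: assumed directory factory producing the
     NRTCachingDirectory-over-MMapDirectory combination seen in the trace -->
<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory"/>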
Is there any way to specify a config name for a core in solr.xml?
Hi, all,

I have two collections and two machines, so my deployment looks like this:

| machine a | machine b |
| core a1   | core a2   |
| core b1   | core b2   |

core a1 is collection 1 shard1, and core a2 is collection 1 shard2; the config for collection 1 is config 1. core b1 is collection 2 shard1, and core b2 is collection 2 shard2; the config for collection 2 is config 2. Is there any way to specify the core config in solr.xml so that the two shards on each machine start up with the correct config name? -- from Jun Wang
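[Editorial note] A minimal sketch of one possible approach, assuming the legacy (pre-5.0) solr.xml format: each <core> entry points at its own instanceDir (and, in SolrCloud, its collection and shard), so the two cores on one machine can pick up different configs. The directory and collection names below are assumptions for illustration only:

<!-- solr.xml (legacy format) on machine a; names and paths are hypothetical -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- each instanceDir contains its own conf/solrconfig.xml and conf/schema.xml -->
    <core name="core_a1" instanceDir="collection1_conf" collection="collection1" shard="shard1"/>
    <core name="core_b1" instanceDir="collection2_conf" collection="collection2" shard="shard1"/>
  </cores>
</solr>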
Re: SolrCloud dataimport fails the first time after restart
I have found the reason. I am using a JBoss JNDI datasource, and the Oracle driver was placed in WEB-INF/lib. This is a very common mistake; the driver should be placed in %JBOSS_HOME%\server\default\lib.

2012/10/10 jun Wang wangjun...@gmail.com

Hi, all,

I found that dataimport fails the first time after a restart. The log is below; it seems like a bug.

2012-10-09 20:00:08,848 ERROR dataimport.DataImporter - Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select a.id, a.subject, a.keywords, a.category_id, to_number((a.gmt_modified - to_date('1970-01-01','yyyy-mm-dd'))*24*60*60) as gmt_modified, a.member_seq, b.standard_attr_desc, b.custom_attr_desc, decode(a.product_min_price, null, 0, a.product_min_price)/100 as min_price, sign(a.ws_offline_date - sysdate) + 1 as is_offline from ws_product_draft a, ws_product_attribute_draft b where a.id = b.product_id(+) Processing Document # 1
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select a.id, a.subject, a.keywords, a.category_id, to_number((a.gmt_modified - to_date('1970-01-01','yyyy-mm-dd'))*24*60*60) as gmt_modified, a.member_seq, b.standard_attr_desc, b.custom_attr_desc, decode(a.product_min_price, null, 0, a.product_min_price)/100 as min_price, sign(a.ws_offline_date - sysdate) + 1 as is_offline from ws_product_draft a, ws_product_attribute_draft b where a.id = b.product_id(+) Processing Document # 1
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
    ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select a.id, a.subject, a.keywords, a.category_id, to_number((a.gmt_modified - to_date('1970-01-01','yyyy-mm-dd'))*24*60*60) as gmt_modified, a.member_seq, b.standard_attr_desc, b.custom_attr_desc, decode(a.product_min_price, null, 0, a.product_min_price)/100 as min_price, sign(a.ws_offline_date - sysdate) + 1 as is_offline from ws_product_draft a, ws_product_attribute_draft b where a.id = b.product_id(+) Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:252)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
    ... 5 more
Caused by: java.lang.ClassNotFoundException: Unable to load null or org.apache.solr.handler.dataimport.null
    at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)
    at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:159)
    at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:362)
    at org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:239)
    ... 12 more
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:387)
    at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:889)
    ... 17 more

-- from Jun
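[Editorial note] The root cause above is the driver class not being visible to DIH's JdbcDataSource. A minimal sketch of the two common data-config.xml dataSource declarations; the JNDI name, URL, and credentials are hypothetical placeholders, not values from this thread:

<!-- data-config.xml: JNDI datasource looked up from the container (JNDI name is hypothetical) -->
<dataSource type="JdbcDataSource" jndiName="java:comp/env/jdbc/productDS"/>

<!-- alternative: direct JDBC connection; the driver jar must still be visible to Solr -->
<dataSource type="JdbcDataSource"
            driver="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@dbhost:1521:ORCL"
            user="solr_user" password="secret"/>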
Re: segment number during optimize of index
I have another question: does the number of segments affect the speed of index updates?

2012/10/10 jame vaalet jamevaa...@gmail.com

Guys, thanks for all the inputs. I was continuing my research to learn more about segments in Lucene. Below are my conclusions, please correct me if I am wrong.

1. Segments are independent sub-indexes in separate files. While indexing, it is better to create a new segment since an existing file does not have to be modified, whereas while searching, the fewer the segments the better, since you open roughly x physical files (not exactly x, but a value proportional to x) to search if you have x segments in the index.
2. Since Lucene uses memory mapping, for each file/segment in the index a new memory-mapped region is created and mapped to the physical file on disk. Can someone explain or correct this in detail? I am sure there are a lot of people wondering how mmap works when you merge or optimize index segments.

On 6 October 2012 07:41, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

If I were you, and not knowing all your details... I would optimize indices that are static (not being modified) and would optimize down to 1 segment. I would do it when search traffic is low. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html

On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet jamevaa...@gmail.com wrote:

Hi Eric, I am in a major dilemma with my index now. I have got 8 cores, each around 300 GB in size, and about half of the documents in each are deleted documents; on top of that, each has got around 100 segments as well. Do I issue an expungeDeletes and allow the merge policy to take care of the segments, or optimize them into a single segment? Search performance is not on par with the usual Solr speed. If I have to optimize, what segment number should I choose? My RAM size is around 120 GB and the JVM heap is around 45 GB (oldGen being 30 GB). Please advise! Thanks.

On 6 October 2012 00:00, Erick Erickson erickerick...@gmail.com wrote:

Because eventually you'd run out of file handles. Imagine a long-running server with 100,000 segments. Totally unmanageable. I think Shawn was emphasizing that RAM requirements don't depend on the number of segments. There are other resources that files consume, however. Best, Erick

On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet jamevaa...@gmail.com wrote:

Hi Shawn, thanks for the detailed explanation. I have got one doubt: you said it doesn't matter how many segments the index has, but then why does Solr have this merge policy which merges segments frequently? Why can't it leave the segments as they are rather than merging the smaller ones into bigger ones? Thanks.

On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org wrote:

On 10/4/2012 3:22 PM, jame vaalet wrote: So imagine I have merged the 150 GB index into a single segment; this would make a single segment of 150 GB in memory. When new docs are indexed it wouldn't alter this 150 GB index unless I update or delete the older docs, right? Will a 150 GB single segment have a problem with memory swapping at the OS level?

Supplement to my previous reply: the real memory mentioned in the last paragraph does not include the memory that the OS uses to cache disk access. If more memory is needed and all the free memory is being used by the disk cache, the OS will throw away part of the disk cache (a near-instantaneous operation that should never involve disk I/O) and give that memory to the application that requests it.

Here's a very good breakdown of how memory gets used with MMapDirectory in Solr. It's applicable to any program that uses memory mapping, not just Solr: http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory

Thanks, Shawn

-- -JAME

-- from Jun Wang
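[Editorial note] For reference, the two operations debated above can be issued as Solr XML update messages. A minimal sketch; the maxSegments value is illustrative and waitSearcher is optional:

<!-- reclaim deleted documents and let the merge policy decide segment counts -->
<commit expungeDeletes="true"/>

<!-- force-merge the index down to a target number of segments (value is illustrative) -->
<optimize maxSegments="1" waitSearcher="false"/>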
deletedPkQuery not working in Solr 3.3
I have a data-config.xml with 2 entities, like:

<entity name="full" PK="ID" ...>
  ...
</entity>

and

<entity name="delta_build" PK="ID" ...>
  ...
</entity>

The entity delta_build is for delta import; the query is ?command=full-import&entity=delta_build&clean=false, and I want to use deletedPkQuery to delete from the index. So I have added these to the entity delta_build:

deltaQuery="select -1 as ID from dual"
deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}'"
deletedPKQuery="select product_id as ID from modified_product where gmt_create > to_date('${dataimporter.last_index_time}','yyyy-mm-dd hh24:mi:ss') and modification = 'deleted'"

deltaQuery and deltaImportQuery are simply there to prevent the delta import from importing any records, because importing has already been implemented by the full import; I just want to use the delta import for deleting from the index. But when I hit the query ?command=delta-import, deltaQuery and deltaImportQuery can be found in the log, but deletedPKQuery is not. Is there anything wrong in the config file? -- from Jun Wang
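[Editorial note] One thing worth checking, offered as a hedged suggestion rather than a confirmed diagnosis: DIH attribute names are case-sensitive, and the documented spelling is deletedPkQuery (as in this thread's subject line), whereas the config above uses deletedPKQuery. A minimal sketch of a delta entity using the documented spelling; the table, column, and date-format values are carried over from the message for illustration only, and the query attribute is a placeholder assumption:

<!-- data-config.xml sketch: note the documented attribute name deletedPkQuery -->
<entity name="delta_build" pk="ID"
        query="select * from product"
        deltaQuery="select -1 as ID from dual"
        deltaImportQuery="select * from product where id='${dataimporter.delta.ID}'"
        deletedPkQuery="select product_id as ID from modified_product
                        where gmt_create &gt; to_date('${dataimporter.last_index_time}','yyyy-mm-dd hh24:mi:ss')
                        and modification = 'deleted'">
  ...
</entity>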