Re: SOLR 4.2 SolrQuery exception
Manually delete the lock file /data/solr1/example/solr/collection1/./data/index/write.lock and restart Solr.

On Sun, Mar 24, 2013 at 9:32 PM, Sandeep Kumar Anumalla sanuma...@etisalat.ae wrote:

Hi, I managed to resolve this issue and I am getting the results also. But this time I am getting a different exception while loading the Solr container. Here is the code:

String SOLR_HOME = "/data/solr1/example/solr/collection1";
CoreContainer coreContainer = new CoreContainer(SOLR_HOME);
CoreDescriptor discriptor = new CoreDescriptor(coreContainer, "collection1", new File(SOLR_HOME).getAbsolutePath());
SolrCore solrCore = coreContainer.create(discriptor);
coreContainer.register(solrCore, false);
File home = new File(SOLR_HOME);
File f = new File(home, "solr.xml");
coreContainer.load(SOLR_HOME, f);
server = new EmbeddedSolrServer(coreContainer, "collection1");
SolrQuery q = new SolrQuery();

Parameters inside solrconfig.xml:

<!-- <writeLockTimeout>1000</writeLockTimeout> -->
<lockType>simple</lockType>
<unlockOnStartup>true</unlockOnStartup>

WARNING: Unable to get IndexCommit on startup
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/data/solr1/example/solr/collection1/./data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:636)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:192)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:106)
    at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:904)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:592)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:801)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
    at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
    at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)

From: Sandeep Kumar Anumalla
Sent: 24 March, 2013 03:44 PM
To: solr-user@lucene.apache.org
Subject: SOLR 4.2 SolrQuery exception

I am using the below code and getting the exception while using SolrQuery.

Mar 24, 2013 3:08:07 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@795e0c2b main{StandardDirectoryReader(segments_49:524 _4v(4.2):C299313 _4x(4.2):C2953/1396 _4y(4.2):C2866/1470 _4z(4.2):C4263/2793 _50(4.2):C3554/761 _51(4.2):C1126/365 _52(4.2):C650/285 _53(4.2):C500/215 _54(4.2):C1808/1593 _55(4.2):C1593)}
Mar 24, 2013 3:08:07 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
    at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1586)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Mar 24, 2013 3:08:07 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=null path=null params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false} status=500 QTime=4
Mar 24, 2013 3:08:07 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 24, 2013 3:08:07 PM org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener newSearcher
INFO: Loading spell
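The lock-file workaround suggested at the top of this thread can be sketched as a couple of shell commands. The paths below are placeholders (on the poster's machine the index lives under /data/solr1/example/solr/collection1/data/index), and this is only safe when no JVM still has the index open:

```shell
# Sketch: clear a stale Lucene write.lock before restarting Solr.
# INDEX_DIR is a stand-in for the real index directory.
INDEX_DIR="${INDEX_DIR:-/tmp/solr-demo-index}"
mkdir -p "$INDEX_DIR"
touch "$INDEX_DIR/write.lock"   # simulate a lock left behind by a crashed JVM
# Only remove the lock when no Solr process still holds this index:
rm -f "$INDEX_DIR/write.lock"
```

It is also worth checking whether the posted code opens the core twice (a create()/register() followed by a load() of solr.xml that instantiates collection1 again), since two IndexWriters on the same directory would produce exactly this LockObtainFailedException.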
Re: OutOfMemoryError
I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as parameters. I also added -XX:+UseG1GC to the java process. But now the whole machine crashes! Any idea why?

Mar 22 20:30:01 solr01-gs kernel: [716098.077809] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Mar 22 20:30:01 solr01-gs kernel: [716098.077962] java cpuset=/ mems_allowed=0
Mar 22 20:30:01 solr01-gs kernel: [716098.078019] Pid: 29339, comm: java Not tainted 2.6.32-5-amd64 #1
Mar 22 20:30:01 solr01-gs kernel: [716098.078095] Call Trace:
Mar 22 20:30:01 solr01-gs kernel: [716098.078155] [810b6324] ? oom_kill_process+0x7f/0x23f
Mar 22 20:30:01 solr01-gs kernel: [716098.078233] [810b6848] ? __out_of_memory+0x12a/0x141
Mar 22 20:30:01 solr01-gs kernel: [716098.078309] [810b699f] ? out_of_memory+0x140/0x172
Mar 22 20:30:01 solr01-gs kernel: [716098.078385] [810ba704] ? __alloc_pages_nodemask+0x4ec/0x5fc
Mar 22 20:30:01 solr01-gs kernel: [716098.078469] [812fb47a] ? io_schedule+0x93/0xb7
Mar 22 20:30:01 solr01-gs kernel: [716098.078541] [810bbc69] ? __do_page_cache_readahead+0x9b/0x1b4
Mar 22 20:30:01 solr01-gs kernel: [716098.078626] [81064fc0] ? wake_bit_function+0x0/0x23
Mar 22 20:30:01 solr01-gs kernel: [716098.078702] [810bbd9e] ? ra_submit+0x1c/0x20
Mar 22 20:30:01 solr01-gs kernel: [716098.078773] [810b4a72] ? filemap_fault+0x17d/0x2f6
Mar 22 20:30:01 solr01-gs kernel: [716098.078849] [810ca9e2] ? __do_fault+0x54/0x3c3
Mar 22 20:30:01 solr01-gs kernel: [716098.078921] [810ccd36] ? handle_mm_fault+0x3b8/0x80f
Mar 22 20:30:01 solr01-gs kernel: [716098.078999] [8101166e] ? apic_timer_interrupt+0xe/0x20
Mar 22 20:30:01 solr01-gs kernel: [716098.079078] [812febf6] ? do_page_fault+0x2e0/0x2fc
Mar 22 20:30:01 solr01-gs kernel: [716098.079153] [812fca95] ? page_fault+0x25/0x30
Mar 22 20:30:01 solr01-gs kernel: [716098.079222] Mem-Info:
Mar 22 20:30:01 solr01-gs kernel: [716098.079261] Node 0 DMA per-cpu:
Mar 22 20:30:01 solr01-gs kernel: [716098.079310] CPU0: hi: 0, btch: 1 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079374] CPU1: hi: 0, btch: 1 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079439] CPU2: hi: 0, btch: 1 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079527] CPU3: hi: 0, btch: 1 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079591] Node 0 DMA32 per-cpu:
Mar 22 20:30:01 solr01-gs kernel: [716098.079642] CPU0: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079706] CPU1: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079770] CPU2: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079834] CPU3: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079899] Node 0 Normal per-cpu:
Mar 22 20:30:01 solr01-gs kernel: [716098.079951] CPU0: hi: 186, btch: 31 usd: 17
Mar 22 20:30:01 solr01-gs kernel: [716098.080015] CPU1: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.080079] CPU2: hi: 186, btch: 31 usd: 2
Mar 22 20:30:01 solr01-gs kernel: [716098.080142] CPU3: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.080209] active_anon:2638016 inactive_anon:388557 isolated_anon:0
Mar 22 20:30:01 solr01-gs kernel: [716098.080209] active_file:68 inactive_file:236 isolated_file:0
Mar 22 20:30:01 solr01-gs kernel: [716098.080210] unevictable:0 dirty:5 writeback:5 unstable:0
Mar 22 20:30:01 solr01-gs kernel: [716098.080211] free:16573 slab_reclaimable:2398 slab_unreclaimable:2335
Mar 22 20:30:01 solr01-gs kernel: [716098.080212] mapped:36 shmem:0 pagetables:24750 bounce:0
Mar 22 20:30:01 solr01-gs kernel: [716098.080575] Node 0 DMA free:15796kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 22 20:30:01 solr01-gs kernel: [716098.081041] lowmem_reserve[]: 0 3000 12090 12090
Mar 22 20:30:01 solr01-gs kernel: [716098.081110] Node 0 DMA32 free:39824kB min:3488kB low:4360kB high:5232kB active_anon:2285240kB inactive_anon:520624kB active_file:0kB inactive_file:188kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072096kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4152kB slab_unreclaimable:1640kB kernel_stack:1104kB pagetables:31100kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:89 all_unreclaimable? no
Mar 22 20:30:01 solr01-gs kernel: [716098.081600] lowmem_reserve[]: 0 0 9090 9090
Mar 22 20:30:01 solr01-gs kernel: [716098.081664] Node 0 Normal free:10672kB min:10572kB
SOLR - Unable to execute query error - DIH
Hello All,

I am trying to index data from a SQL Server view into Solr using the DIH with the full-import command. The view has 750K rows and 427 columns. During the first execution I indexed only the first 50 rows of the view, and the data got indexed in 10 min. But when I executed the same scenario to index the complete set of 750K rows, the execution continued for 2 days and then rolled back, giving me the following error: Unable to execute the query: select * from. Following is my DIH configuration file:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://server1\sql2012;databaseName=DBName" user="x" password="x" />
  <document name="Search" batchsize="1">
    <entity name="Search" query="select top 500 * from view">
      <field column="ID" name="Id" />
    </entity>
  </document>
</dataConfig>

As suggested in some of the posts, I did try with batchsize=-1, but it didn't work out. Please suggest whether this is the correct approach or whether any parameter needs to be modified for tuning. Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028.html
Sent from the Solr - User mailing list archive at Nabble.com.
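One thing to check: by default the Microsoft JDBC driver reads an entire result set into memory, which for a 750K-row, 427-column view can by itself account for the stall and the eventual rollback. A hedged sketch of a dataSource tuned for streaming - the connection properties come from the SQL Server JDBC driver documentation, not from the original message, so verify them against your driver version:

```xml
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://server1\sql2012;databaseName=DBName;selectMethod=cursor;responseBuffering=adaptive"
            user="x" password="x"
            batchSize="500"/>
```

Note that batchSize belongs on the dataSource element (it maps to the JDBC fetch size), not on document; selecting only the columns you actually index instead of select * should also help.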
Re: SOLR - Unable to execute query error - DIH
In the context of the above scenario, when I try to index a set of 500 rows, it fetches and indexes around 400-odd rows, then shows no progress and keeps on executing. What can be the possible cause of this issue? If possible, please do share whether you have gone through such a scenario, with the respective details.

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051034.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: [ANNOUNCE] Solr wiki editing change
On 3/25/13 4:18 AM, Steve Rowe wrote:
The wiki at http://wiki.apache.org/solr/ has come under attack by spammers more frequently of late, so the PMC has decided to lock it down in an attempt to reduce the work involved in tracking and removing spam. From now on, only people who appear on http://wiki.apache.org/solr/ContributorsGroup will be able to create/modify/delete wiki pages. Please request either on the solr-user@lucene.apache.org or on d...@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step.

Please add AndrzejBialecki to this group. Thank you!

--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
Information Retrieval, System Integration
Contact: info at sigram dot com
storing key value pair in multivalued field solr4.0
Hi,

I am using Solr 4.0. I want to store key-value pairs of attributes in a multivalued field. For example, I have some documents (products) which have attributes as one field, and I indexed the attributes as separate documents to power auto-suggest. Now in some auto-suggest cases I also have to show the facet count of products. For this I am using Solr 4.0 joins and faceting on attributes, and here I want to get the name and id of the attributes. How can I achieve this? The query looks like this:

localhost:8980/solr/searchapp/select?q=%7B!join+from=attr_id+to=prod_attr_id%7Dterms:red&wt=json&indent=true&facet.field=prod_attr_id&facet=true&rows=1000&fl=product_name,product_id

Thanks in advance!
Re: Very slow query when boosting involve with EnternalFileField
Floyd, I think you need to provide a stack trace or a profiler sampling.

On Fri, Mar 22, 2013 at 6:23 AM, Floyd Wu floyd...@gmail.com wrote:
Anybody can point me in a direction? Many thanks.

2013/3/20 Floyd Wu floyd...@gmail.com
Hi everyone, I have a problem and have had no luck figuring it out. I issue these queries:

Query 1
http://localhost:8983/solr/select?q={!boost+b=recip(ms(NOW/HOUR,last_modified_datetime),3.16e-11,1,1)}all:java&start=0&rows=10&fl=score,author&sort=score+desc

Query 2
http://localhost:8983/solr/select?q={!boost+b=sum(ranking,recip(ms(NOW/HOUR,last_modified_datetime)),3.16e-11,1,1)}all:java&start=0&rows=10&fl=score,author&sort=score+desc

The difference between the two queries is the boost. The boost function of Query 2 uses a field named ranking, and this field is an ExternalFileField. The external file holds key=value pairs, about 1 lines.

Execution time: Query 1 -- 100ms, Query 2 -- 2300ms.

I tried to issue Query 3, changing ranking to a constant 1:

Query 3
http://localhost:8983/solr/select?q={!boost+b=sum(1,recip(ms(NOW/HOUR,last_modified_datetime)),3.16e-11,1,1)}all:java&start=0&rows=10&fl=score,author&sort=score+desc

Execution time: Query 3 -- 110ms.

One thing I can be sure of is that involving an ExternalFileField slows down query execution time significantly. But I have no idea how to solve this problem, as my boost function must use the value of the ranking field. Please help on this.

PS: I'm using Solr 4.1.

Floyd

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
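One mitigation that may apply here: the values of an ExternalFileField are parsed into an in-memory cache per searcher, and if nothing warms that cache, the first queries pay the full cost of loading the file. Solr ships an ExternalFileFieldReloader event listener for this; a sketch for solrconfig.xml (the class name is believed correct for Solr 4.1, but verify it against your javadocs before relying on it):

```xml
<!-- Reload external file field caches whenever a searcher is opened,
     so queries do not pay the load cost themselves. -->
<listener event="newSearcher"
          class="org.apache.solr.schema.ExternalFileFieldReloader"/>
<listener event="firstSearcher"
          class="org.apache.solr.schema.ExternalFileFieldReloader"/>
```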
Undefined field problem.
Hi,

I recently added a new field (toptipp) to an existing Solr schema.xml and it worked just fine. Subsequently I added two more fields (active_cruises and non_grata) to the schema and now I get this error:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">6</int></lst><lst name="error"><str name="msg">undefined field: active_cruise</str><int name="code">400</int></lst>
</response>

My Solr db is populated via a program that creates and uploads a csv file. When I view the csv file, the field active_cruises (given as undefined above) is populated correctly. As far as I can tell, when I added the final fields to the schema, I did exactly the same as when I added toptipp. I updated schema.xml and restarted Solr (java -jar start.jar). I am really at a loss here. Can someone please help with the answer or by pointing me in the right direction? Naturally I'd be happy to provide further info if needed.

Thanks
MK
Re: Undefined field problem.
Further to the prev msg, here's an extract from my current schema.xml:

<field name="show_en" type="boolean" indexed="true" stored="false" required="true" />
<field name="active_cruise" type="boolean" indexed="true" stored="true"/>
<field name="non_grata" type="boolean" indexed="true" stored="true"/>
<field name="toptipp" type="int" indexed="true" stored="true"/>

The original schema.xml had the last 3 fields in the order toptipp, active_cruise and non_grata. active_cruise and non_grata were also defined as type="int". I changed the order and field types in my attempts to fix the error.

On 25 March 2013 11:21, Mid Night mid...@gmail.com wrote:
> [quoted message trimmed]
Re: OutOfMemoryError
Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any extra options needed?

Thanks...

On 03/25/2013 08:34 AM, Arkadi Colson wrote:
> I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as parameters. I also added -XX:+UseG1GC to the java process. But now the whole machine crashes! Any idea why?
> [quoted oom-killer kernel log trimmed]
Re: OutOfMemoryError
The use of UseG1GC, yes, but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM (1.7.0_07).

os.arch: amd64
os.name: Linux
os.version: 2.6.32.13-0.5-xen

Only args are -XX:+UseG1GC -Xms16g -Xmx16g. Monitoring shows that 16g is a bit high; I might reduce it to 10g or 12g for the slaves. Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g. Single index, 130GByte, 43.5 mio. documents.

Regards, Bernd

Am 25.03.2013 11:55, schrieb Arkadi Colson:
> Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any extra options needed? Thanks...
> [earlier quoted messages and oom-killer kernel log trimmed]
Re: Tlog File not removed after hard commit
My understanding is that logs stick around for a while just in case they can be used to catch up a shard that rejoins the cluster.

On Mar 24, 2013 12:03 PM, Niran Fajemisin afa...@yahoo.com wrote:
Hi all, We import about 1.5 million documents on a nightly basis using DIH. During this time, we need to ensure that all documents make it into the index, otherwise we roll back on any errors; DIH takes care of this for us. We also disable autoCommit in DIH but instruct it to commit at the very end of the import. This is all done through configuration of the DIH config XML file and the command issued to the request handler.

We have noticed that the tlog file appears to linger around even after DIH has issued the hard commit. My expectation would be that after the hard commit has occurred, the tlog file would be removed. I'm obviously misunderstanding how this all works. Can someone please help me understand how this is meant to function? Thanks!

-Niran
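For context: the transaction log is rotated at hard-commit time, and Solr keeps a few recent tlogs around (enough to replay the most recent updates for peer sync), deleting older ones as new ones are opened - so a file lingering after the final DIH commit is expected behavior. If the concern is unbounded tlog growth during the 1.5M-document import itself, a periodic hard commit that does not open a searcher keeps each file small. A sketch for solrconfig.xml (the 60-second interval is an arbitrary example, not from this thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit every 60s -->
    <openSearcher>false</openSearcher> <!-- don't expose partial imports -->
  </autoCommit>
</updateHandler>
```

Note the trade-off: intermediate hard commits make the already-committed documents durable, so they give up DIH's ability to roll the whole import back on error.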
Retriving results based on SOLR query data.
Hi Team, I want to overcome a sort issue here. The sort feature works fine. I have indexed a few documents in Solr, which have a unique document ID. Now when I retrieve results from Solr, the results come back automatically sorted. However, I would like to fetch results in the sequence I mention in my Solr query:

http://hostname:8080/SOLR/browse?q=documentID:D12133 OR documentID:D14423 OR documentID:D912

I want results in the same order:

D12133
D14423
D912

Regards, Atul

--
View this message in context: http://lucene.472066.n3.nabble.com/Retriving-results-based-on-SOLR-query-data-tp4051076.html
Sent from the Solr - User mailing list archive at Nabble.com.
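Solr returns rows by score (or an explicit sort), not by the order of OR clauses, so a common workaround is to boost each ID by its position in the requested list and rely on the default score-descending sort. A sketch of assembling such a query string - the documentID field name is taken from the message above, but the boost scheme itself is an assumption, not a built-in Solr feature for preserving clause order:

```shell
# Build "documentID:D12133^3 OR documentID:D14423^2 OR documentID:D912^1",
# giving the first requested ID the largest boost.
ids="D12133 D14423 D912"
n=3     # number of ids in the list
q=""
for id in $ids; do
  clause="documentID:${id}^${n}"
  if [ -z "$q" ]; then q="$clause"; else q="$q OR $clause"; fi
  n=$((n - 1))
done
echo "$q"
```

The alternative is simpler still: fetch the documents in any order and reorder them client-side against the requested ID list.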
Re: [ANNOUNCE] Solr wiki editing change
On Mar 25, 2013, at 3:30 AM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:
> Can you add me too? We have a few pages which we maintain (search results clustering related). My wiki user is DawidWeiss

Added to AdminGroup.

On Mar 25, 2013, at 5:11 AM, Andrzej Bialecki a...@getopt.org wrote:
> Please add AndrzejBialecki to this group. Thank you!

Added to AdminGroup.

On Mar 25, 2013, at 5:48 AM, xie kidd xiezh...@gmail.com wrote:
> Please add adderllyer to this group. Thank you!

Added to ContributorsGroup.
Re: Timeout occured while waiting response from server
A timeout like this _probably_ means your docs were indexed just fine. I'm curious why adding the docs takes so long - how many docs are you sending at a time?

Best
Erick

On Thu, Mar 21, 2013 at 1:31 PM, Benjamin, Roy rbenja...@ebay.com wrote:
I'm calling:

m_server.add(docs, 12);

Wondering if the timeout that expired was the one set when the server was created?

m_server = new HttpSolrServer(serverUrl);
m_server.setRequestWriter(new BinaryRequestWriter());
m_server.setConnectionTimeout(3);
m_server.setSoTimeout(1);

Also, does the exception always mean the docs were not added?

Thanks
Roy

Solr 3.6

2013-03-21 10:21:32,487 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2078: Caught error from UDF: checkout.regexudf.SolrAccumulator [org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://10.94.238.86:8080/solr]
Re: Solr 4.2.0 results links
Solr doesn't do anything with links natively, it just echoes back what you put in. So you're sending file-based links to Solr...

Best
Erick

On Thu, Mar 21, 2013 at 1:40 PM, zeroeffect g.paul.r...@gmail.com wrote:
While I am still in the beginning phase of Solr, I have been able to index a directory of HTML files. I can search keywords and get results. The problem I am having is that the link to each HTML document is file-based, not http-based. I get the link but it points to file:\\ and not http:\\. I have been looking for where to set this information. My setup is to export database information to individual HTML files, then FTP them to the Solr server to be indexed and accessed on our intranet. Thank you for your guidance.

ZeroEffect

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-0-results-links-tp4049788.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: OutOfMemoryError
Thanks for the info! I just upgraded java from 6 to 7... How exactly do you monitor the memory usage and the effect of the garbage collector?

On 03/25/2013 01:18 PM, Bernd Fehling wrote:
> The use of UseG1GC, yes, but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM (1.7.0_07). os.arch: amd64 os.name: Linux os.version: 2.6.32.13-0.5-xen
> Only args are -XX:+UseG1GC -Xms16g -Xmx16g. Monitoring shows that 16g is a bit high, I might reduce it to 10g or 12g for the slaves. Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g. Single index, 130GByte, 43.5 mio. documents.
> Regards, Bernd
> [earlier quoted messages and oom-killer kernel log trimmed]
solr01-gs kernel: [716098.080209] active_file:68 inactive_file:236 isolated_file:0 Mar 22 20:30:01 solr01-gs kernel: [716098.080210] unevictable:0 dirty:5 writeback:5 unstable:0 Mar 22 20:30:01 solr01-gs kernel: [716098.080211] free:16573 slab_reclaimable:2398 slab_unreclaimable:2335 Mar 22 20:30:01 solr01-gs kernel: [716098.080212] mapped:36 shmem:0 pagetables:24750 bounce:0 Mar 22 20:30:01 solr01-gs kernel: [716098.080575] Node 0 DMA free:15796kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
Re: How can I compile and debug Solr from source code?
Furkan: Stop. Back up, you're making it too complicated. Follow Erik's instructions. The ant example just compiles all of Solr, just like the distribution. Then you can go into the example directory and change it to look just like whatever you want: change the schema, change the solrconfig, add custom components, etc. There's no difference between that and the distro. It _is_ the distro, just in a convenient form for running in Jetty. So you create some custom code (say a filter or whatever). You put the path to it in your solrconfig in a <lib .../> directive. In fact I usually path the lib directive out to wherever the code gets built by my IDE for debugging purposes, then I don't have to copy the jar around. I can then set breakpoints in my custom code. I can debug Solr as well. It's just way cool. About the only thing I'd add to Hatcher's instructions is the possibility of specifying suspend=y rather than suspend=n, and that's just if I want to debug Solr startup code. BTW, IntelliJ has, under the Edit Configurations section, a Remote option that guides you through the flags etc. that Erik pointed out. Eclipse has similar but I use IntelliJ. Best Erick On Thu, Mar 21, 2013 at 8:00 PM, Furkan KAMACI furkankam...@gmail.com wrote: Ok, I ran that and see that there is a .war file at /lucene-solr/solr/dist. Do you know how I can run that ant phase from IntelliJ without the command line (there are many phases under the Ant build window)? On the other hand, within IntelliJ IDEA, how can I auto-deploy it into Tomcat? All in all, I will edit configurations and it will run that ant command and deploy it to Tomcat itself? 2013/3/22 Steve Rowe sar...@gmail.com Perhaps you didn't see what I wrote earlier?: Sounds like you want 'ant dist', which will create the .war and put it into the solr/dist/ directory: PROMPT$ ant dist Steve On Mar 21, 2013, at 7:38 PM, Furkan KAMACI furkankam...@gmail.com wrote: I mean I need that: There is a .war file shipped with the Solr source code.
How can I regenerate it (build my code and generate a .war file) like that? I will then deploy it to Tomcat. 2013/3/22 Furkan KAMACI furkankam...@gmail.com Is your mentioned suggestion only for the example application? Can I apply it to just pure Solr (I don't want to generate the example application, because my aim is not just debugging Solr; I want to extend it and I will debug that extended code)? 2013/3/22 Alexandre Rafalovitch arafa...@gmail.com That's nice. Can we put that on a Wiki? Or as a quick screencast? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Mar 21, 2013 at 5:42 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Here's my development/debug workflow: - ant idea at the top level to generate the IntelliJ project - cd solr; ant example - to build the full example - cd example; java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005 -jar start.jar - to launch Jetty+Solr in debug mode - set breakpoints in IntelliJ, set up a Remote run option (localhost:5005) in IntelliJ, and debug pleasantly. All the unit tests in Solr run very nicely in IntelliJ too, and for tight development loops I spend my time doing that instead of running full-on Solr. Erik On Mar 21, 2013, at 05:56, Furkan KAMACI wrote: I use IntelliJ IDEA 12 and Solr 4.1 on a CentOS 6.4 64-bit computer. I have opened the Solr source code in IntelliJ IDEA as explained in the documentation. I want to deploy Solr into Tomcat 7. When I open the project there are configurations set previously (I used the ant idea command before I opened the project). However, they are all test configurations and some of them do not pass (this is another issue, no need to go into detail in this e-mail).
I have added a Tomcat Local configuration into the configurations, but I don't know which one is the main method of Solr, and is there any documentation that explains the code? I.e., I want to debug a point: what does Solr receive when I say -index from Nutch, and what does Solr do? I tried something to run the code (I don't think I could generate a .war or an exploded folder) and this is the error that I get (I didn't point any artifact for edit configurations): Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: me.local: me.local: Name or service not known (me.local is the name I set when I installed CentOS 6.4 on my computer). Any ideas on how to run the source code would be nice for me.
Re: Continue to the next record
This has been a long-standing issue with updates; several attempts have been started to change the behavior, but they haven't gotten off the ground. Your options are to send one record at a time, or have error-handling logic that, say, transmits the docs one at a time whenever a packet fails. Best Erick On Thu, Mar 21, 2013 at 9:21 PM, randolf.julian randolf.jul...@dominionenterprises.com wrote: I have an XML file that has several documents in it. For example:

<add>
  <doc>
    <field name="id">1</field>
    <field name="name" update="set">MyName1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name" update="set">MyName2</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="name" update="set">MyName3</field>
  </doc>
</add>

I upload the data using Solr's post.sh script. For some reason document 2 failed, and that caused the post.sh script to stop. How can I make it continue to the next document (3) even if it fails on 2? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Continue-to-the-next-record-tp4049920.html
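The fallback Erick describes can be sketched roughly like this. `Indexer` is a stand-in for whatever client call actually sends the documents (post.sh, SolrJ, etc.), so treat this as pseudologic for the retry strategy rather than a real Solr API:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchFallback {
    // Stand-in for the real indexing call. Throws on any bad document in
    // the batch, mimicking how one failing <doc> aborts the whole <add>.
    interface Indexer {
        void index(List<String> docs) throws Exception;
    }

    // Try the whole batch first; if it fails, retry one document at a
    // time so a single bad record does not block the rest.
    public static List<String> indexWithFallback(Indexer indexer, List<String> docs) {
        List<String> failed = new ArrayList<>();
        try {
            indexer.index(docs);
        } catch (Exception batchError) {
            for (String doc : docs) {
                try {
                    indexer.index(List.of(doc));
                } catch (Exception docError) {
                    failed.add(doc); // record the bad doc and keep going
                }
            }
        }
        return failed;
    }
}
```

The returned list of failed documents can then be logged or written to a dead-letter file for inspection.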
Re: Solr using a ridiculous amount of memory
I apologize for the slow reply. Today has been killer. I will reply to everyone as soon as I get the time. I am having difficulties understanding how docValues work. Should I only add docValues to the fields that I actually use for sorting and faceting, or to all fields? Will the docValues magic apply to the fields I activate docValues on, or to the entire document, when sorting/faceting on a field that has docValues activated? I'm not even sure which question to ask. I am struggling to understand this on a conceptual level. On Sun, Mar 24, 2013 at 7:11 PM, Robert Muir rcm...@gmail.com wrote: On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen j...@mcb.dk wrote: Schema with DocValues attempt at solving problem: http://pastebin.com/Ne23NnW4 Config: http://pastebin.com/x1qykyXW This schema isn't using docvalues, due to a typo in your config. It should not be DocValues=true but docValues=true. Are you not getting an error? Solr needs to throw an exception if you provide invalid attributes to the field. Nothing is more frustrating than having a typo or something in your configuration and Solr just ignores this, reports no error, and doesn't work the way you want. I'll look into this (I already intend to add these checks to analysis factories for the same reason). Separately, if you really want the terms data and so on to remain on disk, it is not enough to just enable docvalues for the field. The default implementation uses the heap. So if you want that, you need to set docValuesFormat="Disk" on the fieldType. This will keep the majority of the data on disk, and only some key data structures in heap memory. This might have a significant performance impact depending upon what you are doing, so you need to test that. -- Med venlig hilsen / Best regards *John Nielsen* Programmer *MCB A/S* Enghaven 15 DK-7500 Holstebro Kundeservice: +45 9610 2824 p...@mcb.dk www.mcb.dk
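On the conceptual question, a concrete illustration may help: docValues is a per-field switch, so you enable it only on the fields you actually sort or facet on; it has no effect on the rest of the document. A hedged schema.xml sketch, with field names invented for illustration (this is not John's actual schema):

```xml
<!-- docValues is per-field: only fields you sort or facet on need it. -->
<field name="price"    type="float_dv"  indexed="true" stored="true" docValues="true"/>
<field name="category" type="string_dv" indexed="true" stored="true" docValues="true"/>

<!-- Per Robert's note: to keep most of the docValues data on disk rather
     than on the heap, set the format on the fieldType and benchmark it. -->
<fieldType name="string_dv" class="solr.StrField" docValuesFormat="Disk"/>
```

Fields without docValues still sort/facet the old way (via the FieldCache on the heap), which is why enabling it only where needed is the usual advice.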
RE: SOLR - Unable to execute query error - DIH
With MS SQL Server, try adding selectMethod=cursor to your connection string and set your batch size to a reasonable amount (or possibly just omit it; DIH has a default value it will use). James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: kobe.free.wo...@gmail.com [mailto:kobe.free.wo...@gmail.com] Sent: Monday, March 25, 2013 3:25 AM To: solr-user@lucene.apache.org Subject: SOLR - Unable to execute query error - DIH Hello All, I am trying to index data from a SQL Server view into Solr using DIH with the full-import command. The view has 750K rows and 427 columns. During the first execution I indexed only the first 50 rows of the view; the data got indexed in 10 min. But when I executed the same scenario to index the complete set of 750K rows, the execution continued for 2 days and then rolled back, giving me the following error: Unable to execute the query: select * from. Following is my DIH configuration file:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://server1\sql2012;databaseName=DBName" user="x" password="x" />
  <document name="Search" batchsize="1">
    <entity name="Search" query="select top 500 * from view">
      <field column="ID" name="Id" />
    </entity>
  </document>
</dataConfig>

As suggested in some of the posts, I did try batchSize=-1, but it didn't work out. Please suggest whether this is the correct approach or whether any parameter needs to be modified for tuning. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028.html
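Applied to the config above, James's suggestion would look roughly like this. The points to note are that selectMethod=cursor goes on the JDBC URL and that batchSize belongs on the dataSource element; the values themselves are illustrative, not tuned:

```xml
<dataConfig>
  <!-- selectMethod=cursor is a connection-string property of the
       Microsoft JDBC driver; batchSize is a JdbcDataSource attribute
       (not an attribute of <document>). -->
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://server1\sql2012;databaseName=DBName;selectMethod=cursor"
              batchSize="500"
              user="x" password="x"/>
  <document name="Search">
    <entity name="Search" query="select * from view">
      <field column="ID" name="Id"/>
    </entity>
  </document>
</dataConfig>
```

Without a cursor, the driver may try to buffer the whole 750K-row, 427-column result set in memory, which matches the multi-day-then-rollback behavior described.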
Contributors Group
Hello, Can I be added to the contributors group? Username sswoboda. Thank you. Swati
Re: Contributors Group
On Mar 25, 2013, at 10:32 AM, Swati Swoboda sswob...@igloosoftware.com wrote: Can I be added to the contributors group? Username sswoboda. Added to solr ContributorsGroup.
Re: Continue to the next record
Erick, Thanks for the info. That's also what I had in mind, and that's what I did, since I can't find anything on the web regarding this issue. Randolf -- View this message in context: http://lucene.472066.n3.nabble.com/Continue-to-the-next-record-tp4049920p4051113.html
Solr 4 automatic DB updates for sync using Delta query DIH with scheduler
Hi, Please let me know how to get DB changes reflected into my Solr index. I am using Solr 4 with DIH and a delta query, with the scheduler configured in the dataimport scheduler properties. Ultimately I want my DB to be in sync with Solr. Everything is all set and working, except that every time I modify the data in a DB column, my scheduler adds a new document to Solr; I therefore get two values with different _version_. What I am looking for is for the index to get updated as and when the DB columns are updated. Kindly assist... with regards majied -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-automatic-DB-updates-for-sync-using-Delta-query-DIH-with-scheduler-tp4051114.html
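Two documents with different _version_ values usually means the delta run is adding new documents instead of overwriting existing ones, so the first thing to check is that schema.xml's uniqueKey matches the DB primary key column being imported. A hedged sketch of a delta-capable DIH entity; the table and column names ("item", "id", "last_modified") are invented for illustration:

```xml
<!-- Assumes the table has a primary key "id" (mapped to Solr's uniqueKey)
     and a "last_modified" timestamp column maintained by the DB. -->
<entity name="item"
        pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item
                          WHERE id = '${dih.delta.id}'"/>
```

When the value the deltaImportQuery maps to uniqueKey is identical to the original document's key, the delta import replaces the old document rather than creating a second one.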
Re: OutOfMemoryError
How can I see if GC is actually working? Is it written in the Tomcat logs as well, or will I only see it in the memory graphs? BR, Arkadi On 03/25/2013 03:50 PM, Bernd Fehling wrote: We use munin with the jmx plugin for monitoring all server and Solr installations. (http://munin-monitoring.org/) Only for short-time monitoring we also use jvisualvm, delivered with the Java SE JDK. Regards Bernd On 25.03.2013 14:45, Arkadi Colson wrote: Thanks for the info! I just upgraded Java from 6 to 7... How exactly do you monitor the memory usage and the effect of the garbage collector? On 03/25/2013 01:18 PM, Bernd Fehling wrote: The use of UseG1GC: yes, but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM (1.7.0_07). os.arch: amd64 os.name: Linux os.version: 2.6.32.13-0.5-xen Only args are -XX:+UseG1GC -Xms16g -Xmx16g. Monitoring shows that 16g is a bit high, I might reduce it to 10g or 12g for the slaves. Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g. Single index, 130GByte, 43.5 mio. documents. Regards, Bernd On 25.03.2013 11:55, Arkadi Colson wrote: Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any extra options needed? Thanks... On 03/25/2013 08:34 AM, Arkadi Colson wrote: I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as parameters. I also added -XX:+UseG1GC to the java process. But now the whole machine crashes! Any idea why?
Re: Slow queries for common terms
take a look here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Memory consumption can be a bit tricky to interpret with MMapDirectory. But you say "I see the CPU working very hard", which implies that your issue is just scoring 90M documents. A way to test: try q=*:*&fq=field:book. My bet is that that will be much faster, in which case scoring is your choke point and you'll need to spread that load across more servers, i.e. shard. When running the above, make sure of a couple of things: 1) you haven't run the fq query before (or you have the filterCache turned completely off); 2) you _have_ run a query or two that warms up your low-level caches. Doesn't matter what, just as long as it doesn't have an fq clause. Best Erick On Sat, Mar 23, 2013 at 3:10 AM, David Parks davidpark...@yahoo.com wrote: I see the CPU working very hard, and at the same time I see 2 MB/sec disk access for that 15 seconds. I am not running it this instant, but it seems to me that there were more CPU cycles available, so unless it's an issue of not being able to multithread it any further, I'd say it's more IO related. I'm going to set up SolrCloud and shard across the 2 servers I have available for now. It's not an optimal setup we have while we're in a private beta period, but maybe it'll improve things (I've got 2 servers with 2x 4TB disks in RAID-0, shared with the webservers). I'll work towards some improved IO performance and maybe more shards and see how things go. I'll also be able to up the RAM in just a couple of weeks. Are there any settings I should think of in terms of improving cache performance when I can give it, say, 10GB of RAM? Thanks, this has been tremendously helpful.
David -Original Message- From: Tom Burton-West [mailto:tburt...@umich.edu] Sent: Saturday, March 23, 2013 1:38 AM To: solr-user@lucene.apache.org Subject: Re: Slow queries for common terms Hi David and Jan, I wrote the blog post, and David, you are right: the problem we had was with phrase queries, because our positions lists are so huge. Boolean queries don't need to read the positions lists. I think you need to determine whether you are CPU bound or I/O bound. It is possible that you are I/O bound and reading the term frequency postings for 90 million docs is taking a long time. In that case, more memory in the machine (but not dedicated to Solr) might help, because Solr relies on OS disk caching for caching the postings lists. You would still need to do some cache warming with your most common terms. On the other hand, as Jan pointed out, you may be CPU bound because Solr doesn't have early termination and has to rank all 90 million docs in order to show the top 10 or 25. Did you try the OR search to see if your CPU is at 100%? Tom On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl jan@cominvent.com wrote: Hi, There might not be a final cure with more RAM if you are CPU bound. Scoring 90M docs is some work. Can you check what's going on during those 15 seconds? Is your CPU at 100%? Try an (foo OR bar OR baz) search which generates 100 mill hits and see if that is slow too, even if you don't use frequent words. I'm sure you can find other frequent terms in your corpus which display similar behaviour, words which are even more frequent than book. Are you using AND as the default operator? You will benefit from limiting the number of results as much as possible. The real solution is to shard across N servers, until you reach the desired performance for the desired indexing/querying load. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com
Re: Two problems (missing updates and timeouts)
For your first problem, I'd be looking at the Solr logs and verifying that 1) the update was sent, 2) no stack traces are thrown, and 3) the commit interval has passed (you probably already know all about commits, but just in case). For your second problem, I'm not quite sure where you're setting these timeouts. SolrJ? Best Erick On Sat, Mar 23, 2013 at 4:23 PM, Aaron Jensen aaronjen...@gmail.com wrote: Hi all, I'm having two problems with our Solr implementation. I don't have a lot of detail about them because we're just starting to get into diagnosing them. I'm hoping for some help with that diagnosis, ideas, tips, whatever. Our stack: Rails, Sunspot, Solr, sunspot_index_queue, two Solr servers, master and slave, all traffic currently going to master; the slave is just a replication slave/backup. The first and biggest problem is that we occasionally lose updates. Something will get added to the database, it will trigger a Solr update, but then we can't search for that thing. It's just gone. Indexing that thing again will have it show up. There are a number of moving parts in our stack and this is a relatively new problem; it was working fine for 1.5 years without a problem. We're considering adding a delayed job that will index anything newly created a second after it is created, just to be sure, but this is a giant hack. Any ideas around this would be helpful. The second problem is that we get occasional timeouts. These don't happen very often, maybe 5-7/day. Solr is serving at most like 350 requests per minute. Our timeouts are set to 2 seconds on read and 1 second on open. Average response time is around 20ms. It doesn't seem like any requests should be timing out, but they are. I have no idea how to debug it either. Any ideas? Thanks, Aaron
Re: Solr 4.2 Incremental backups
That's essentially what replication does, only backs up parts of the index that have changed. However, when segments merge, that might mean the entire index needs to be replicated. Best Erick On Sun, Mar 24, 2013 at 12:08 AM, Sandeep Kumar Anumalla sanuma...@etisalat.ae wrote: Hi, Is there any option to do Incremental backups in Solr 4.2? Thanks Regards Sandeep A Ext : 02618-2856 M : 0502493820 The content of this email together with any attachments, statements and opinions expressed herein contains information that is private and confidential are intended for the named addressee(s) only. If you are not the addressee of this email you may not copy, forward, disclose or otherwise use it or any part of it in any form whatsoever. If you have received this message in error please notify postmas...@etisalat.ae by email immediately and delete the message without making any copies.
Re: Too many fields to Sort in Solr
Certainly that will be true for the bare q=*:*; I meant with the boosting clause added. Best Erick On Sun, Mar 24, 2013 at 7:01 PM, adityab aditya_ba...@yahoo.com wrote: Thanks Erick. In this query q=*:* the Lucene score is always 1 -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4050944.html
Re: OutOfMemoryError
You can also use -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log as additional options to get a gc.log file and see what GC is doing. Regards Bernd On 25.03.2013 16:01, Arkadi Colson wrote: How can I see if GC is actually working? Is it written in the Tomcat logs as well, or will I only see it in the memory graphs? BR, Arkadi On 03/25/2013 03:50 PM, Bernd Fehling wrote: We use munin with the jmx plugin for monitoring all server and Solr installations. (http://munin-monitoring.org/) Only for short-time monitoring we also use jvisualvm, delivered with the Java SE JDK. Regards Bernd On 25.03.2013 14:45, Arkadi Colson wrote: Thanks for the info! I just upgraded Java from 6 to 7... How exactly do you monitor the memory usage and the effect of the garbage collector? On 03/25/2013 01:18 PM, Bernd Fehling wrote: The use of UseG1GC: yes, but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM (1.7.0_07). os.arch: amd64 os.name: Linux os.version: 2.6.32.13-0.5-xen Only args are -XX:+UseG1GC -Xms16g -Xmx16g. Monitoring shows that 16g is a bit high, I might reduce it to 10g or 12g for the slaves. Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g. Single index, 130GByte, 43.5 mio. documents. Regards, Bernd On 25.03.2013 11:55, Arkadi Colson wrote: Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any extra options needed? Thanks... On 03/25/2013 08:34 AM, Arkadi Colson wrote: I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as parameters. I also added -XX:+UseG1GC to the java process. But now the whole machine crashes! Any idea why?
Re: Undefined field problem.
unless you're manually typing things and made a typo, your problem is that your CSV file defines active_cruises and your schema has active_cruise. Note the lack of an 's'... Best Erick On Mon, Mar 25, 2013 at 6:30 AM, Mid Night mid...@gmail.com wrote: Further to the prev msg: Here's an extract from my current schema.xml:

<field name="show_en" type="boolean" indexed="true" stored="false" required="true" />
<field name="active_cruise" type="boolean" indexed="true" stored="true"/>
<field name="non_grata" type="boolean" indexed="true" stored="true"/>
<field name="toptipp" type="int" indexed="true" stored="true"/>

The original schema.xml had the last 3 fields in the order toptipp, active_cruise and non_grata. active_cruise and non_grata were also defined as type="int". I changed the order and field types in my attempts to fix the error. On 25 March 2013 11:21, Mid Night mid...@gmail.com wrote: Hi, I recently added a new field (toptipp) to an existing Solr schema.xml and it worked just fine. Subsequently I added two more fields (active_cruises and non_grata) to the schema and now I get this error:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">6</int></lst><lst name="error"><str name="msg">undefined field: active_cruise</str><int name="code">400</int></lst>
</response>

My Solr db is populated via a program that creates and uploads a CSV file. When I view the CSV file, the field active_cruises (given as undefined above) is populated correctly. As far as I can tell, when I added the final fields to the schema, I did exactly the same as when I added toptipp: I updated schema.xml and restarted Solr (java -jar start.jar). I am really at a loss here. Can someone please help with the answer or by pointing me in the right direction? Naturally I'd be happy to provide further info if needed. Thanks MK
Re: Tlog File not removed after hard commit
The tlogs will stay there to provide peer sync on the last 100 docs. Say a node somehow gets out of sync. There are two options: 1) replay from the log, or 2) replicate the entire index. To avoid 2) if possible, the tlog is kept around. In your case, all your data is put in the tlog file, so the keep-the-last-100-docs-available rule means you'll keep the entire log from the run around until the _next_ run completes, at which point I'd expect the oldest one to be deleted. Best Erick On Mon, Mar 25, 2013 at 8:40 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: My understanding is that logs stick around for a while just in case they can be used to catch up a shard that rejoins the cluster. On Mar 24, 2013 12:03 PM, Niran Fajemisin afa...@yahoo.com wrote: Hi all, We import about 1.5 million documents on a nightly basis using DIH. During this time, we need to ensure that all documents make it into the index, otherwise roll back on any errors; DIH takes care of this for us. We also disable autoCommit in DIH but instruct it to commit at the very end of the import. This is all done through configuration of the DIH config XML file and the command issued to the request handler. We have noticed that the tlog file appears to linger around even after DIH has issued the hard commit. My expectation would be that after the hard commit has occurred, the tlog file would be removed. I'm obviously misunderstanding how this all works. Can someone please help me understand how this is meant to function? Thanks! -Niran
Re: Retriving results based on SOLR query data.
There's no good way that I know of to have Solr do that for you. But you have the original query, so it seems like your app layer could sort the results accordingly. Best Erick On Mon, Mar 25, 2013 at 8:44 AM, atuldj.jadhav atuldj.jad...@gmail.com wrote: Hi Team, I want to overcome a sort issue here; the sort feature works fine. I have indexed a few documents in SOLR, which have a unique document ID. Now when I retrieve results from SOLR, the results come back automatically sorted. However I would like to fetch results in the sequence I mention in my SOLR query. http://hostname:8080/SOLR/browse?q=documentID:D12133 OR documentID:D14423 OR documentID:D912 I want results in the same order... D12133 D14423 D912 Regards, Atul -- View this message in context: http://lucene.472066.n3.nabble.com/Retriving-results-based-on-SOLR-query-data-tp4051076.html Sent from the Solr - User mailing list archive at Nabble.com.
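Since the application built the query, it already knows the intended order, so the re-sort can happen client-side as Erick suggests. A minimal sketch in plain Java — it treats each result as a Map with a documentID key; with SolrJ you would read the field from each SolrDocument instead:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderById {

    /** Re-order the docs Solr returned into the order the IDs appeared in the query. */
    public static List<Map<String, Object>> orderByRequested(List<String> requestedIds,
                                                             List<Map<String, Object>> docs) {
        // map each requested ID to its position in the original query
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < requestedIds.size(); i++) {
            rank.put(requestedIds.get(i), i);
        }
        List<Map<String, Object>> sorted = new ArrayList<>(docs);
        // unknown IDs (not in the query) sink to the end
        sorted.sort(Comparator.comparingInt(
                d -> rank.getOrDefault((String) d.get("documentID"), Integer.MAX_VALUE)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> wanted = Arrays.asList("D12133", "D14423", "D912");
        List<Map<String, Object>> fromSolr = new ArrayList<>();
        for (String id : Arrays.asList("D912", "D12133", "D14423")) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("documentID", id);
            fromSolr.add(doc);
        }
        for (Map<String, Object> doc : orderByRequested(wanted, fromSolr)) {
            System.out.println(doc.get("documentID")); // prints D12133, D14423, D912
        }
    }
}
```

With SolrJ the same idea applies by ranking on (String) doc.getFieldValue("documentID") while iterating the SolrDocumentList.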
Query slow with termVectors termPositions termOffsets
Hello, We re-indexed our entire core of 115 docs with some of the fields having termVectors="true" termPositions="true" termOffsets="true"; prior to the reindex we only had termVectors="true". After the reindex the query component has become very slow. I thought that adding the termOffsets and termPositions would increase the speed, am I wrong? Several queries like the one shown below which used to run fine are now very slow. Can somebody kindly clarify how termOffsets and termPositions affect the query component?

<lst name="process"><double name="time">19076.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">18972.0</double></lst>
  <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.QueryElevationComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.clustering.ClusteringComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">104.0</double></lst>
</lst>

[#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx] webapp=/solr-admin path=/select
params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:(The+Checkup+OR+Checkpoint+Washington+OR+Post+Carbon+OR+TSA+OR+College+Inc.+OR+Campus+Overload+OR+Planet+Panel+OR+The+Answer+Sheet+OR+Class+Struggle+OR+BlogPost))+OR+(contenttype:Photo+Gallery+AND+headline:day+in+photos)start=0rows=1sort=displaydatetime+descfq=-source:(Reuters+OR+PC+World+OR+CBS+News+OR+NC8/WJLA+OR+NewsChannel+8+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:(Discussion+OR+Photo)+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:Photo+Gallery+AND+headline:(Drawing+Board+OR+Drawing+board+OR+drawing+board))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:(Summary+Box*+OR+Video*+OR+Post+Sports+Live*)+-slug:(warren*+OR+history)+-(contenttype:Blog+AND+subheadline:(DC+Schools+Insider+OR+On+Leadership))+contenttype:Blog+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]wt=javabinversion=2} hits=4985 status=0 QTime=19044 |#] Thanks, Ravi Kiran Bhaskar
Re: Undefined field problem.
Generally, you will need to delete the index and completely reindex your data if you change the type of a field. I don't think that would account for active_cruise being an undefined field though. I did try your scenario with the Solr 4.2 example, and a field named active_cruise, and it worked fine for me. The only issue was that existing data (e.g., 1 in the int field) was all considered as boolean false after I changed the schema and restarted. -- Jack Krupansky -Original Message- From: Mid Night Sent: Monday, March 25, 2013 6:30 AM To: solr-user@lucene.apache.org Subject: Re: Undefined field problem. Further to the prev msg: Here's an extract from my current schema.xml:

<field name="show_en" type="boolean" indexed="true" stored="false" required="true" />
<field name="active_cruise" type="boolean" indexed="true" stored="true"/>
<field name="non_grata" type="boolean" indexed="true" stored="true"/>
<field name="toptipp" type="int" indexed="true" stored="true"/>

The original schema.xml had the last 3 fields in the order toptipp, active_cruise and non_grata. Active_cruise and non_grata were also defined as type="int". I changed the order and field types in my attempts to fix the error. On 25 March 2013 11:21, Mid Night mid...@gmail.com wrote: Hi, I recently added a new field (toptipp) to an existing solr schema.xml and it worked just fine. Subsequently I added two more fields (active_cruises and non_grata) to the schema and now I get this error:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">400</int><int name="QTime">6</int></lst>
  <lst name="error"><str name="msg">undefined field: active_cruise</str><int name="code">400</int></lst>
</response>

My solr db is populated via a program that creates and uploads a csv file. When I view the csv file, the field active_cruises (given as undefined above) is populated correctly. As far as I can tell, when I added the final fields to the schema, I did exactly the same as when I added toptipp: I updated schema.xml and restarted solr (java -jar start.jar).
I am really at a loss here. Can someone please help with the answer or by pointing me in the right direction? Naturally I'd be happy to provide further info if needed. Thanks MK
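Worth spelling out the practical upshot from Erick's diagnosis earlier in the thread: the schema field name has to match the CSV header character for character. A hedged sketch of the schema.xml line that would accept the CSV's active_cruises column — attributes copied from the thread's other boolean fields, so adjust to taste:

```xml
<field name="active_cruises" type="boolean" indexed="true" stored="true"/>
```

Alternatively, rename the column in the CSV export to active_cruise so it matches the existing schema; either side can move, but they must agree.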
Re: Contributors Group
While you're in that mode, could you please add 'Upayavira'. Thanks! Upayavira On Mon, Mar 25, 2013, at 02:41 PM, Steve Rowe wrote: On Mar 25, 2013, at 10:32 AM, Swati Swoboda sswob...@igloosoftware.com wrote: Can I be added to the contributors group? Username sswoboda. Added to solr ContributorsGroup.
Re: Contributors Group
On Mar 25, 2013, at 11:59 AM, Upayavira u...@odoko.co.uk wrote: While you're in that mode, could you please add 'Upayavira'. Added to solr ContributorsGroup.
lucene 42 codec
Hi, I noticed that apache solr 4.2 uses the lucene codec 4.1. How can I switch to 4.2? Thanks in advance Mario
Re: Query slow with termVectors termPositions termOffsets
Did index size increase after turning on termPositions and termOffsets? Thanks. Alex. -Original Message- From: Ravi Solr ravis...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Mon, Mar 25, 2013 8:27 am Subject: Query slow with termVectors termPositions termOffsets Hello, We re-indexed our entire core of 115 docs with some of the fields having termVectors=true termPositions=true termOffsets=true, prior to the reindex we only had termVectors=true. After the reindex the the query component has become very slow. I thought that adding the termOffsets and termPositions will increase the speed, am I wrong ? Several queries like the one shown below which used to run fine are now very slow. Can somebody kindly clarify how termOffsets and termPositions affect query component ? lst name=processdouble name=time19076.0/double lst name=org.apache.solr.handler.component.QueryComponentdouble name=time18972.0/double/lst lst name=org.apache.solr.handler.component.FacetComponentdouble name=time0.0/double/lst lst name=org.apache.solr.handler.component.MoreLikeThisComponentdouble name=time0.0/double/lst lst name=org.apache.solr.handler.component.HighlightComponentdouble name=time0.0/double/lst lst name=org.apache.solr.handler.component.StatsComponentdouble name=time0.0/double/lst lst name=org.apache.solr.handler.component.QueryElevationComponentdouble name=time0.0/double/lst lst name=org.apache.solr.handler.clustering.ClusteringComponentdouble name=time0.0/double/lst lst name=org.apache.solr.handler.component.DebugComponentdouble name=time104.0/double/lst /lst [#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx] webapp=/solr-admin path=/select 
params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:(The+Checkup+OR+Checkpoint+Washington+OR+Post+Carbon+OR+TSA+OR+College+Inc.+OR+Campus+Overload+OR+Planet+Panel+OR+The+Answer+Sheet+OR+Class+Struggle+OR+BlogPost))+OR+(contenttype:Photo+Gallery+AND+headline:day+in+photos)start=0rows=1sort=displaydatetime+descfq=-source:(Reuters+OR+PC+World+OR+CBS+News+OR+NC8/WJLA+OR+NewsChannel+8+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:(Discussion+OR+Photo)+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:Photo+Gallery+AND+headline:(Drawing+Board+OR+Drawing+board+OR+drawing+board))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:(Summary+Box*+OR+Video*+OR+Post+Sports+Live*)+-slug:(warren*+OR+history)+-(contenttype:Blog+AND+subheadline:(DC+Schools+Insider+OR+On+Leadership))+contenttype:Blog+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]wt=javabinversion=2} hits=4985 status=0 QTime=19044 |#] Thanks, Ravi Kiran Bhaskar
Error creating collection using CORE-API
Hi, I'm having an issue when I try to create a collection: curl "http://192.168.1.142:8983/solr/admin/cores?action=CREATE&name=RT-4A46DF1563_12&collection=RT-4A46DF1563_12&shard=00&collection.configName=reportssBucket-regular" The curl call has an error because the collection.configName doesn't exist, so I fixed the curl call to: curl "http://192.168.1.142:8983/solr/admin/cores?action=CREATE&name=RT-4A46DF1563_12&collection=RT-4A46DF1563_12&shard=00&collection.configName=reportsBucket-regular" But now I have this stacktrace: INFO: Creating SolrCore 'RT-4A46DF1563_12' using instanceDir: /Users/yriveiro/Dump/solrCloud/node00.solrcloud/solr/home/RT-4A46DF1563_12 Mar 25, 2013 5:15:35 PM org.apache.solr.cloud.ZkController createCollectionZkNode INFO: Check for collection zkNode:RT-4A46DF1563_12 Mar 25, 2013 5:15:35 PM org.apache.solr.cloud.ZkController createCollectionZkNode INFO: Collection zkNode exists Mar 25, 2013 5:15:35 PM org.apache.solr.cloud.ZkController readConfigName INFO: Load collection config from:/collections/RT-4A46DF1563_12 Mar 25, 2013 5:15:35 PM org.apache.solr.cloud.ZkController readConfigName SEVERE: Specified config does not exist in ZooKeeper:reportssBucket-regular Mar 25, 2013 5:15:35 PM org.apache.solr.core.CoreContainer recordAndThrow SEVERE: Unable to create core: RT-4A46DF1563_12 org.apache.solr.common.cloud.ZooKeeperException: Specified config does not exist in ZooKeeper:reportssBucket-regular In fact the collection is in zookeeper as a file and not as a folder. The questions here are: if the CREATE command doesn't find the config, why is a file created? And why, after this, can't I run the command again with the correct syntax without removing the file created by the failed CREATE command? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Error-creating-collection-using-CORE-API-tp4051156.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strange error in Solr 4.2
I fixed it by setting JVM properties in glassfish. -Djavax.net.ssl.keyStorePassword=changeit -- View this message in context: http://lucene.472066.n3.nabble.com/Strange-error-in-Solr-4-2-tp4047386p4051159.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tlog File not removed after hard commit
Thanks Erick and Michael for the prompt responses. Cheers, Niran From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, March 25, 2013 10:21 AM Subject: Re: Tlog File not removed after hard commit The tlogs will stay there to provide peer synch on the last 100 docs. Say a node somehow gets out of synch. There are two options 1 replay from the log 2 replicate the entire index. To avoid 2 if possible, the tlog is kept around. In your case, all your data is put in the tlog file, so the keep the last 100 docs available rule means you'll keep the entire log for the run around until the _next_ run completes, at which point I'd expect the oldest one to be deleted. Best Erick On Mon, Mar 25, 2013 at 8:40 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: My understanding is that logs stick around for a while just in case they can be used to catch up a shard that rejoins the cluster. On Mar 24, 2013 12:03 PM, Niran Fajemisin afa...@yahoo.com wrote: Hi all, We import about 1.5 million documents on a nightly basis using DIH. During this time, we need to ensure that all documents make it into index otherwise rollback on any errors; which DIH takes care of for us. We also disable autoCommit in DIH but instruct it to commit at the very end of the import. This is all done through configuration of the DIH config XML file and the command issued to the request handler. We have noticed that the tlog file appears to linger around even after DIH has issued the hard commit. My expectation would be that after the hard commit has occurred, the tlog file will be removed. I'm obviously misunderstanding how this all works. Can someone please help me understand how this is meant to function? Thanks! -Niran
Re: Multi-core and replicated Solr cloud testing. Data-directory mis-configures
That example does not work if you have 1 collection (core) per node; all end up sharing the same index and overwriting one another. On Mon, Mar 25, 2013 at 6:27 PM, Gopal Patwa gopalpa...@gmail.com wrote: if you use the default directory then it will use the solr.home directory. I have tested the solr cloud example on a local machine with 5-6 nodes, and the data directory was created under the core name, like example2/solr/collection1/data. You could see the example startup script from source code solr/cloud-dev/solrcloud-multi-start.sh example solrconfig.xml: <dataDir>${solr.data.dir:}</dataDir> On Sun, Mar 24, 2013 at 10:44 PM, Trevor Campbell tcampb...@atlassian.com wrote: I have three indexes which I have set up as three separate cores, using this solr.xml config:

<cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}">
  <core name="jira-issue" instanceDir="jira-issue">
    <property name="dataDir" value="jira-issue/data" />
  </core>
  <core name="jira-comment" instanceDir="jira-comment">
    <property name="dataDir" value="jira-comment/data" />
  </core>
  <core name="jira-change-history" instanceDir="jira-change-history">
    <property name="dataDir" value="jira-change-history/data" />
  </core>
</cores>

This works just fine as standalone solr. I duplicated this setup on the same machine under a completely separate solr installation (solr-nodeb) and modified all the data directories to point to the directories in nodeb. This all worked fine. I then connected the 2 instances together with zoo-keeper using settings -Dbootstrap_conf=true -Dcollection.configName=jiraCluster -DzkRun -DnumShards=1 for the first instance and -DzkHost=localhost:9080 for the second. (I'm using tomcat and ports 8080 and 8081 for the 2 Solr instances.) Now the data directories of the second node point to the data directories in the first node. I have tried many settings in the solrconfig.xml for each core but am now using absolute paths, e.g.
<dataDir>/home//solr-4.2.0-nodeb/example/multicore/jira-comment/data</dataDir> previously I used ${solr.jira-comment.data.dir:/home/tcampbell/solr-4.2.0-nodeb/example/multicore/jira-comment/data} but that had the same result. It seems zookeeper is forcing the data directory config from the uploaded configuration on all the nodes in the cluster? How can I do testing on a single machine? Do I really need identical directory layouts on all machines?
Re: DocValues and field requirements
Hi Chris, Thanks for your detailed explanations. The default value is a difficult limitation, especially for financial figures. I may try some workaround like the lowest possible number for TrieLongField, but it would be better to avoid such :) Regards. On 22 March 2013 20:39, Chris Hostetter hossman_luc...@fucit.org wrote: : Thank you for your response. Yes, that's strange. By enabling DocValues the : information about missing fields is lost, which changes the way of sorting : as well. Adding default value to the fields can change a logic of : application dramatically (I can't set default value to 0 for all : Trie*Fields fields, because it could impact the results displayed to the : end user, which is not good). It's a pity that using DocValues is so : limited. I'm not really up on docvalues, but i asked rmuir about this a bit on IRC. The crux of the issue is that there are two different docvalue impls: one that uses a fixed amount of space per doc (ie: exactly one value per doc) and one that allows an ordered set of values per doc (ie: multivalued). The multivalued docvals impl was wired into solr for multivalued fields, and the single valued docvals impl was wired in for the single valued case -- but since the single valued docvals impl *has* to have a value for every doc, the schema error you encountered was added if you try to use it on a field that isn't required or doesn't have a default value -- to force you to be explicit about which default you want, instead of the low level lucene 0 default coming into play w/o you knowing about it. (as Shawn mentioned) the multivalued docvals impl could conceivably be used instead for these types of single valued fields (ie: to support 0 or 1 values) but there is no sorting support for multivalued docvals, so it would cause other problems.
One possible workaround for people who want to take advantage of sort missing first/last type sorting on a docvals type field would be to manage the missing information yourself in a distinct field which you also leverage in any filtering or sorting on the docvals field. ie, have a docvalues field "myfield" which is single valued, with some configured default value, and then have a "myfield_exists" boolean field which is single valued and required. when indexing docs, if myfield does/doesn't have a value, set myfield_exists accordingly (this would be fairly trivial in an update processor) and then instead of sorting just on "myfield desc" you would sort on "myfield_exists (asc|desc), myfield desc" (where you pick asc or desc depending on whether you want docs w/o values first or last). you would likewise need to filter on myfield_exists:true anytime you did queries against the myfield field. (perhaps someone could work on a patch to inject a synthetic field like this automatically for fields that are docValues="true" multiValued="false" required="false" w/o a defaultValue?) -Hoss
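Hoss's companion-field suggestion, sketched as schema.xml fragments. The field names and attributes here are illustrative, taken from his description rather than any shipped schema, and the tlong type assumes a solr.TrieLongField fieldType is defined:

```xml
<!-- single-valued docValues field: must be required or carry a default -->
<field name="myfield" type="tlong" indexed="true" stored="true"
       docValues="true" default="0"/>
<!-- companion flag maintained at index time (e.g. by an update processor) -->
<field name="myfield_exists" type="boolean" indexed="true" stored="true"
       required="true"/>
```

Queries would then sort with sort=myfield_exists desc, myfield desc (or asc on the flag to push missing-value docs first) and filter with fq=myfield_exists:true whenever only real values should match.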
Accessing SolrZkClient instance from a plug-in?
I have a custom ValueSourceParser that sets up a Zookeeper Watcher on some frequently changing metadata that a custom ValueSource depends on. Basic flow of events is - VSP watches for metadata changes, which triggers a refresh of some expensive data that my custom ValueSource uses at query time. Think of the data in Zookeeper as a pointer to some larger dataset that is computed offline and then loaded into memory for use by my custom ValueSource. In my ValueSourceParser, I connect to Zookeeper using an instance of the SolrZkClient class and am receiving WatchedEvents when my metadata changes (as expected). All this works great until core reload happens. From what I can tell, there's no shutdown hook for ValueSourceParsers, so what's happening is that my code ends up adding multiple Watchers and thus receives multiple update events when the metadata changes. What I need is either 1) a shutdown hook in my VSP that allows me to clean-up the SolrZkClient instance my code is managing, or 2) access to the ZkController instance owned by the CoreContainer from my VSP. For me #2 is better as I'd prefer to just re-use Solr's instance of SolrZkClient. I can go and hack either of these in pretty easily but wanted to see if someone knows a better way to get 1 or 2? In general, it might be handy to allow plug-ins to get access to the Zookeeper client SolrCloud is using. Thanks. Tim
Re: Multi-core and replicated Solr cloud testing. Data-directory mis-configures
Solved. I was able to solve this by removing any reference to dataDir from the solrconfig.xml. So in solr.xml for each node I have:

<cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}">
  <core name="jira-issue" instanceDir="jira-issue">
    <property name="dataDir" value="jira-issue/data" />
  </core>
  <core name="jira-comment" instanceDir="jira-comment">
    <property name="dataDir" value="jira-comment/data" />
  </core>
  <core name="jira-change-history" instanceDir="jira-change-history">
    <property name="dataDir" value="jira-change-history/data" />
  </core>
</cores>

and in solrconfig.xml in each core I have removed the reference to dataDir completely: <!-- <dataDir>${solr.core0.data.dir:}</dataDir> --> On Tue, Mar 26, 2013 at 8:41 AM, Trevor Campbell tcampb...@atlassian.com wrote: That example does not work if you have 1 collection (core) per node; all end up sharing the same index and overwriting one another. On Mon, Mar 25, 2013 at 6:27 PM, Gopal Patwa gopalpa...@gmail.com wrote: if you use the default directory then it will use the solr.home directory. I have tested the solr cloud example on a local machine with 5-6 nodes, and the data directory was created under the core name, like example2/solr/collection1/data. You could see the example startup script from source code solr/cloud-dev/solrcloud-multi-start.sh example solrconfig.xml: <dataDir>${solr.data.dir:}</dataDir> On Sun, Mar 24, 2013 at 10:44 PM, Trevor Campbell tcampb...@atlassian.com wrote: I have three indexes which I have set up as three separate cores, using the solr.xml config shown above. This works just fine as standalone solr.
I duplicated this setup on the same machine under a completely separate solr installation (solr-nodeb) and modified all the data directories to point to the directories in nodeb. This all worked fine. I then connected the 2 instances together with zoo-keeper using settings -Dbootstrap_conf=true -Dcollection.configName=jiraCluster -DzkRun -DnumShards=1 for the first instance and -DzkHost=localhost:9080 for the second. (I'm using tomcat and ports 8080 and 8081 for the 2 Solr instances.) Now the data directories of the second node point to the data directories in the first node. I have tried many settings in the solrconfig.xml for each core but am now using absolute paths, e.g. <dataDir>/home//solr-4.2.0-nodeb/example/multicore/jira-comment/data</dataDir> previously I used ${solr.jira-comment.data.dir:/home/tcampbell/solr-4.2.0-nodeb/example/multicore/jira-comment/data} but that had the same result. It seems zookeeper is forcing the data directory config from the uploaded configuration on all the nodes in the cluster? How can I do testing on a single machine? Do I really need identical directory layouts on all machines?
Re: Accessing SolrZkClient instance from a plug-in?
I don't know the ValueSourceParser from a hole in my head, but it looks like it has access to the SolrCore with fp.req.getCore()? If so, it's easy to get the zk stuff: core.getCoreDescriptor().getCoreContainer().getZkController() (and .getZkClient()). From memory, so perhaps with some minor misname. - Mark On Mar 25, 2013, at 6:03 PM, Timothy Potter thelabd...@gmail.com wrote: I have a custom ValueSourceParser that sets up a Zookeeper Watcher on some frequently changing metadata that a custom ValueSource depends on. Basic flow of events is - VSP watches for metadata changes, which triggers a refresh of some expensive data that my custom ValueSource uses at query time. Think of the data in Zookeeper as a pointer to some larger dataset that is computed offline and then loaded into memory for use by my custom ValueSource. In my ValueSourceParser, I connect to Zookeeper using an instance of the SolrZkClient class and am receiving WatchedEvents when my metadata changes (as expected). All this works great until core reload happens. From what I can tell, there's no shutdown hook for ValueSourceParsers, so what's happening is that my code ends up adding multiple Watchers and thus receives multiple update events when the metadata changes. What I need is either 1) a shutdown hook in my VSP that allows me to clean-up the SolrZkClient instance my code is managing, or 2) access to the ZkController instance owned by the CoreContainer from my VSP. For me #2 is better as I'd prefer to just re-use Solr's instance of SolrZkClient. I can go and hack either of these in pretty easily but wanted to see if someone knows a better way to get 1 or 2? In general, it might be handy to allow plug-ins to get access to the Zookeeper client SolrCloud is using. Thanks. Tim
Re: lucene 42 codec
: I noticed that apache solr 4.2 uses the lucene codec 4.1. How can I : switch to 4.2? Unless you've configured something oddly, Solr is already using the 4.2 codec. What you are probably seeing is that the fileformat for several types of files hasn't changed from the 4.1 (or even 4.0) versions, so they are still used in 4.2 (and confusingly include Lucene41 in the filenames in several cases). Note that in the 4.2 codec package javadocs, several codec related classes are not implemented, and the docs link back to the 4.1 and 4.0 implementations... https://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html If you peek inside the Lucene42Codec class you'll also see... private final StoredFieldsFormat fieldsFormat = new Lucene41StoredFieldsFormat(); private final TermVectorsFormat vectorsFormat = new Lucene42TermVectorsFormat(); private final FieldInfosFormat fieldInfosFormat = new Lucene42FieldInfosFormat(); private final SegmentInfoFormat infosFormat = new Lucene40SegmentInfoFormat(); private final LiveDocsFormat liveDocsFormat = new Lucene40LiveDocsFormat(); -Hoss
Re: Accessing SolrZkClient instance from a plug-in?
Brilliant! Thank you - I was focusing on the init method and totally ignored the FunctionQParser passed to the parse method. Cheers, Tim On Mon, Mar 25, 2013 at 4:16 PM, Mark Miller markrmil...@gmail.com wrote: I don't know the ValueSourceParser from a hole in my head, but it looks like it has access to the solrcore with fp.req.getCore? If so, it's easy to get the zk stuff core.getCoreDescriptor.getCoreContainer.getZkController(.getZkClient). From memory, so perhaps with some minor misname. - Mark On Mar 25, 2013, at 6:03 PM, Timothy Potter thelabd...@gmail.com wrote: I have a custom ValueSourceParser that sets up a Zookeeper Watcher on some frequently changing metadata that a custom ValueSource depends on. Basic flow of events is - VSP watches for metadata changes, which triggers a refresh of some expensive data that my custom ValueSource uses at query time. Think of the data in Zookeeper as a pointer to some larger dataset that is computed offline and then loaded into memory for use by my custom ValueSource. In my ValueSourceParser, I connect to Zookeeper using an instance of the SolrZkClient class and am receiving WatchedEvents when my metadata changes (as expected). All this works great until core reload happens. From what I can tell, there's no shutdown hook for ValueSourceParsers, so what's happening is that my code ends up adding multiple Watchers and thus receives multiple update events when the metadata changes. What I need is either 1) a shutdown hook in my VSP that allows me to clean-up the SolrZkClient instance my code is managing, or 2) access to the ZkController instance owned by the CoreContainer from my VSP. For me #2 is better as I'd prefer to just re-use Solr's instance of SolrZkClient. I can go and hack either of these in pretty easily but wanted to see if someone knows a better way to get 1 or 2? In general, it might be handy to allow plug-ins to get access to the Zookeeper client SolrCloud is using. Thanks. Tim
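Putting Mark's pointer together, the access pattern inside the custom ValueSourceParser would look roughly like this. This is a non-runnable sketch against the Solr 4.x plugin API written from memory, with MyMetadataValueSource as a hypothetical placeholder; double-check the exact method names against your Solr version:

```java
@Override
public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    // the request gives us the core, which leads to the shared ZK machinery
    SolrCore core = fp.getReq().getCore();
    ZkController zkController =
        core.getCoreDescriptor().getCoreContainer().getZkController();
    if (zkController != null) { // null when Solr is not running in cloud mode
        SolrZkClient zkClient = zkController.getZkClient();
        // register the metadata Watcher on Solr's shared client here,
        // instead of managing a private SolrZkClient instance
    }
    return new MyMetadataValueSource(); // hypothetical custom ValueSource
}
```

Reusing Solr's own SolrZkClient sidesteps the core-reload leak described above, since the plug-in no longer owns a client whose lifecycle it cannot hook.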
Any experience with adding documents batch sizes?
My application is update intensive. The documents are pretty small, less than 1K bytes. Just now I'm batching 4K documents with each SolrJ addDocs() call. Wondering what I should expect with increasing this batch size? Say 8K docs per update? Thanks Roy Solr 3.6
Re: status 400 on posting json
Re: Problem with DataImportHandler and embedded entities
Did you ever resolve the issue with your full-import only importing 1 document? I'm monitoring the source db and it's only issuing one query; it never attempts to query for the other documents at the top of the nest. I'm running into the exact same issue with no help out there. Thanks in advance
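For anyone hitting the same wall: the original poster's data-config isn't shown, but a typical nested-entity setup looks like the sketch below (table and column names here are made up for illustration). A common cause of only one document being imported is the inner entity's WHERE clause not referencing the parent row via the ${parent.column} variable, or the root entity's query itself returning a single row.

```xml
<document>
  <!-- root entity: one Solr document per row returned by this query -->
  <entity name="item" query="SELECT id, name FROM item">
    <!-- child entity: must reference the parent row via ${item.id},
         otherwise DIH cannot correlate child rows to parent documents -->
    <entity name="feature"
            query="SELECT description FROM feature WHERE item_id = '${item.id}'"/>
  </entity>
</document>
```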
Solrcloud 4.1 Collection with multiple slices only use
I have two issues and I'm unsure if they are related: Problem: After setting up a multiple-collection Solrcloud 4.1 instance on seven servers, when I index documents they aren't distributed across the index slices. It feels as though I don't actually have a cloud implementation, yet everything I see in the admin interface and zookeeper implies I do. I feel as if I'm overlooking something obvious, but have not been able to figure out what. Configuration: Seven servers and four collections, each with 12 slices (no replica shards yet). Zookeeper configured in a three-node ensemble. When I send documents to Server1/Collection1 (which holds two slices of collection1), all the documents show up in a single index shard (core). Perhaps related, I have found it impossible to get Solr to recognize the server names with anything but a literal host=servername parameter in the solr.xml. hostname parameters, host files, network, and dns are all configured correctly. I have a Solr 4.0 single collection set up similarly and it works just fine. I'm using the same schema.xml and solrconfig.xml files on the 4.1 implementation with only the luceneMatchVersion changed to LUCENE_41. Sample solr.xml from server1: <?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="8080" host="server1" shareSchema="true" zkClientTimeout="6">
    <core collection="col201301" shard="col201301s04" instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01" dataDir="/solr/col201301/col201301s04sh01/data/"/>
    <core collection="col201301" shard="col201301s11" instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01" dataDir="/solr/col201301/col201301s11sh01/data/"/>
    <core collection="col201302" shard="col201302s06" instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01" dataDir="/solr/col201302/col201302s06sh01/data/"/>
    <core collection="col201303" shard="col201303s01" instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01" dataDir="/solr/col201303/col201303s01sh01/data/"/>
    <core collection="col201303" shard="col201303s08" instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01" dataDir="/solr/col201303/col201303s08sh01/data/"/>
    <core collection="col201304" shard="col201304s03" instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01" dataDir="/solr/col201304/col201304s03sh01/data/"/>
    <core collection="col201304" shard="col201304s10" instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01" dataDir="/solr/col201304/col201304s10sh01/data/"/>
  </cores>
</solr>
Thanks Chris
Re: Any experience with adding documents batch sizes?
Hi, You'll have to test because there is no general rule that works in all environments, but from testing this a while back, you will reach the point of diminishing returns at some point. You don't mention using StreamingUpdateSolrServer, so you may want to try that instead: http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html Otis -- Solr ElasticSearch Support http://sematext.com/ On Mon, Mar 25, 2013 at 7:06 PM, Benjamin, Roy rbenja...@ebay.com wrote: My application is update intensive. The documents are pretty small, less than 1K bytes. Just now I'm batching 4K documents with each SolrJ addDocs() call. Wondering what I should expect with increasing this batch size? Say 8K docs per update? Thanks Roy Solr 3.6
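As a rough illustration of the batching being discussed (the SolrJ add call itself is left out, and the class and method names below are invented for this sketch), the client-side chunking logic is just a matter of slicing the document list and timing each batch size:

```java
import java.util.ArrayList;
import java.util.List;

// Split a document list into fixed-size batches; each batch would then
// be passed to a single SolrServer.add(batch) call (omitted here).
public class Batcher {
    public static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10000; i++) docs.add(i);
        // Try 4000 per batch, then 8000, and measure wall-clock time per
        // configuration to find the point of diminishing returns.
        System.out.println(Batcher.partition(docs, 4000).size()); // 3 batches
    }
}
```

The only reliable way to pick a batch size is to measure end-to-end indexing time for each candidate size in your own environment, as Otis says.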
Re: OutOfMemoryError
Arkadi, jstat -gcutil -h20 pid 2000 100 also gives useful info about GC and I use it a lot for quick insight into what is going on with GC. SPM (see http://sematext.com/spm/index.html ) may also be worth using. Otis -- Solr ElasticSearch Support http://sematext.com/ On Mon, Mar 25, 2013 at 11:01 AM, Arkadi Colson ark...@smartbit.be wrote: How can I see if GC is actually working? Is it written in the tomcat logs as well or will I only see it in the memory graphs? BR, Arkadi On 03/25/2013 03:50 PM, Bernd Fehling wrote: We use munin with the jmx plugin for monitoring all servers and Solr installations. (http://munin-monitoring.org/) Only for short-time monitoring we also use jvisualvm, delivered with the Java SE JDK. Regards Bernd On 25.03.2013 14:45, Arkadi Colson wrote: Thanks for the info! I just upgraded java from 6 to 7... How exactly do you monitor the memory usage and the effect of the garbage collector? On 03/25/2013 01:18 PM, Bernd Fehling wrote: The use of UseG1GC, yes, but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM (1.7.0_07). os.arch: amd64 os.name: Linux os.version: 2.6.32.13-0.5-xen Only args are -XX:+UseG1GC -Xms16g -Xmx16g. Monitoring shows that 16g is a bit high, I might reduce it to 10g or 12g for the slaves. Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g. Single index, 130GByte, 43.5 million documents. Regards, Bernd On 25.03.2013 11:55, Arkadi Colson wrote: Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any extra options needed? Thanks... On 03/25/2013 08:34 AM, Arkadi Colson wrote: I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as parameters. I also added -XX:+UseG1GC to the java process. But now the whole machine crashes! Any idea why?
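To answer the "how can I see if GC is actually working" question directly: besides jstat, GC activity can be written to its own log file with the standard HotSpot flags below (the log path is just an example; adjust to your Tomcat setup):

```text
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/tomcat/gc.log
```

Each collection then appears as a timestamped line in gc.log with the pause duration, so you can confirm G1 is running and see how long pauses actually take, independently of the memory graphs.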
Mar 22 20:30:01 solr01-gs kernel: [716098.077809] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Mar 22 20:30:01 solr01-gs kernel: [716098.077962] java cpuset=/ mems_allowed=0
Mar 22 20:30:01 solr01-gs kernel: [716098.078019] Pid: 29339, comm: java Not tainted 2.6.32-5-amd64 #1
Mar 22 20:30:01 solr01-gs kernel: [716098.078095] Call Trace:
Mar 22 20:30:01 solr01-gs kernel: [716098.078155] [810b6324] ? oom_kill_process+0x7f/0x23f
Mar 22 20:30:01 solr01-gs kernel: [716098.078233] [810b6848] ? __out_of_memory+0x12a/0x141
Mar 22 20:30:01 solr01-gs kernel: [716098.078309] [810b699f] ? out_of_memory+0x140/0x172
Mar 22 20:30:01 solr01-gs kernel: [716098.078385] [810ba704] ? __alloc_pages_nodemask+0x4ec/0x5fc
Mar 22 20:30:01 solr01-gs kernel: [716098.078469] [812fb47a] ? io_schedule+0x93/0xb7
Mar 22 20:30:01 solr01-gs kernel: [716098.078541] [810bbc69] ? __do_page_cache_readahead+0x9b/0x1b4
Mar 22 20:30:01 solr01-gs kernel: [716098.078626] [81064fc0] ? wake_bit_function+0x0/0x23
Mar 22 20:30:01 solr01-gs kernel: [716098.078702] [810bbd9e] ? ra_submit+0x1c/0x20
Mar 22 20:30:01 solr01-gs kernel: [716098.078773] [810b4a72] ? filemap_fault+0x17d/0x2f6
Mar 22 20:30:01 solr01-gs kernel: [716098.078849] [810ca9e2] ? __do_fault+0x54/0x3c3
Mar 22 20:30:01 solr01-gs kernel: [716098.078921] [810ccd36] ? handle_mm_fault+0x3b8/0x80f
Mar 22 20:30:01 solr01-gs kernel: [716098.078999] [8101166e] ? apic_timer_interrupt+0xe/0x20
Mar 22 20:30:01 solr01-gs kernel: [716098.079078] [812febf6] ? do_page_fault+0x2e0/0x2fc
Mar 22 20:30:01 solr01-gs kernel: [716098.079153] [812fca95] ? page_fault+0x25/0x30
Mar 22 20:30:01 solr01-gs kernel: [716098.079222] Mem-Info:
Mar 22 20:30:01 solr01-gs kernel: [716098.079261] Node 0 DMA per-cpu:
Mar 22 20:30:01 solr01-gs kernel: [716098.079310] CPU0: hi: 0, btch: 1 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079374] CPU1: hi: 0, btch: 1 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079439] CPU2: hi: 0, btch: 1 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079527] CPU3: hi: 0, btch: 1 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079591] Node 0 DMA32 per-cpu:
Mar 22 20:30:01 solr01-gs kernel: [716098.079642] CPU0: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079706] CPU1: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079770] CPU2: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079834] CPU3: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.079899] Node 0 Normal per-cpu:
Mar 22 20:30:01 solr01-gs kernel: [716098.079951] CPU0: hi: 186, btch: 31 usd: 17
Mar 22 20:30:01 solr01-gs kernel: [716098.080015] CPU1: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.080079] CPU2: hi: 186, btch: 31 usd: 2
Mar 22 20:30:01 solr01-gs kernel: [716098.080142] CPU3: hi: 186, btch: 31 usd: 0
Mar 22 20:30:01 solr01-gs kernel: [716098.080209]
Re: status 400 on posting json
Your schema has only fields, but no field types. Check the Solr example schema for reference, and include all of the types defined there unless you know that you do not need them. 'string' is clearly one that is needed. -- Jack Krupansky -Original Message- From: Patrice Seyed Sent: Monday, March 25, 2013 7:19 PM To: solr-user@lucene.apache.org Subject: Re: status 400 on posting json Hi Jack, I tried putting the schema.xml file (further below) in the path you specified below, but when I tried to start (java -jar start.jar) I got the message below. I can try a fresh install like you suggested, but I'm not sure what would be different. I was using the documentation at http://lucene.apache.org/solr/4_1_0/tutorial.html using the binary from zip. Are you suggesting building from source and/or some other approach? Also, what is the best documentation currently for a 4.1 install (for Mac)? (There are a lot of sites out there.) Thanks in advance. -Patrice SEVERE: Unable to create core: collection1 org.apache.solr.common.SolrException: Unknown fieldtype 'string' specified on field id at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:390) at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Mar 25, 2013 7:14:53 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: collection1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1654) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1039) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Caused by: org.apache.solr.common.SolrException: Unknown fieldtype 'string' specified on field id at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:390) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) ... 10 more --- Here's the normal path to the example configuration in Solr 4.1: .../solr-4.1.0/example/solr/collection1/conf That's the directory in which the example schema.xml and other configuration files live. There is no solr-4.1.0/example/conf directory, unless you managed to create one yourself. I suggest that you start with a fresh install of Solr 4.1 As far as keywords, the existing field is set up to be a comma-separated list of keyword phrases. 
Of course, you can structure it any way that your application requires. -- Jack Krupansky -Original Message- From: Patrice Seyed Sent: Saturday, March 16, 2013 2:48 AM To: solr-user@lucene.apache.org Subject: Re: status 400 on posting json Hi, Re: - Is there some place I should indicate what parameters are included in the JSON objects sent? I was able to test books.json without the error. Yes, in Solr's schema.xml (under the conf/ directory). See http://wiki.apache.org/solr/SchemaXml for more details. Erik Hatcher and: - I tried it and I get the same error response! Which is because... I don't have a field named datasource. You need to check the Solr schema.xml for the available fields and then add any fields that your JSON uses that are not already there. Be sure to shut down and restart Solr after editing the schema. I did notice that there is a keywords field, but it is not multivalued, while your keywords are multivalued. Or, you can use dynamic fields, such as datasource_s and keywords_ss (s for string and a second s for multivalued), etc. for your other fields. -- Jack Krupansky
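For reference, the declarations Jack describes look roughly like this in the stock Solr 4.1 example schema.xml (a sketch of the relevant lines only; check the full example schema for the complete list of types):

```xml
<!-- the field type the "Unknown fieldtype 'string'" error says is missing -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- dynamic fields: *_s = single-valued string, *_ss = multivalued string -->
<dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
<dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>
```

With these in place, JSON fields named datasource_s or keywords_ss are accepted without any per-field schema edits.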
Re: Using Solr For a Real Search Engine
Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with the simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine, what kind of strategies should I follow for how to run Solr? I mean, should I run it via embedded Jetty or use a war and deploy it to a container? You should consider that I will have a heavy workload on my Solr.
Re: Solrcloud 4.1 Collection with multiple slices only use
I'm guessing you didn't specify numShards. Things changed in 4.1 - if you don't specify numShards it goes into a mode where it's up to you to distribute updates. - Mark On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote: I have two issues and I'm unsure if they are related: Problem: After setting up a multiple collection Solrcloud 4.1 instance on seven servers, when I index the documents they aren't distributed across the index slices. ... Thanks Chris
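For anyone following along: in SolrCloud 4.x, numShards is passed as a system property when the collection is first bootstrapped, e.g. on the first node's startup (the ZooKeeper addresses and config name below are placeholders):

```shell
java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -DnumShards=12 \
     -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf \
     -jar start.jar
```

Without numShards, 4.1 falls into the custom-sharding mode Mark describes, where each update stays on whichever core received it.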
Re: opinion: Stats over the faceting component
Nope, this doesn't find it: http://search-lucene.com/?q=facet+stats&fc_project=Solr&fc_type=issue Maybe Anirudha wants to do that? Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Mar 21, 2013 at 5:16 AM, Upayavira u...@odoko.co.uk wrote: Have you made a JIRA ticket for this? This is useful generally, isn't it? Thx, Upayavira On Thu, Mar 21, 2013, at 03:18 AM, Tirthankar Chatterjee wrote: We have done something similar. Please read http://lucene.472066.n3.nabble.com/How-to-modify-Solr-StatsComponent-to-support-stats-query-td4028991.html https://plus.google.com/101157854606139706613/posts/HmYYit3RABM if this is something you wanted. On Mar 20, 2013, at 7:08 PM, Anirudha Jadhav wrote: I want to get an opinion here: instead of having statistics as an independent component, which is always limited by faceting features (e.g. it does not support date ranges, custom ranges, pivots, etc.), why not have a parameter to the facet component to compute and return stats? e.g. facet.stats=true, facet.stats.stat=min,max,(sum(sqrt(x),log(y),z,0.5)) Let me know your thoughts, -- Anirudha P. Jadhav
Re: Solr index Backup and restore of large indexs
Hi, Try something like this: http://host/solr/replication?command=backup See: http://wiki.apache.org/solr/SolrReplication Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Mar 21, 2013 at 3:23 AM, Sandeep Kumar Anumalla sanuma...@etisalat.ae wrote: Hi, We are loading 1TB (approx.) of index data daily. Please let me know the best procedure to take a backup and restore of the indexes. I am using Solr 4.2. Thanks & Regards Sandeep A Ext : 02618-2856 M : 0502493820
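Besides the one-off HTTP command, the replication handler can also be told to snapshot automatically in solrconfig.xml (a sketch only; see the SolrReplication wiki page above for the full set of options):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- take a snapshot after every commit (also accepts startup or optimize) -->
    <str name="backupAfter">commit</str>
  </lst>
</requestHandler>
```

At 1TB/day of new index data, automatic snapshots per commit may be too frequent; the HTTP command on a schedule gives more control over when the I/O cost is paid.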
Re: Shingles Filter Query time behaviour
Hi, What does your query look like? Does it look like q=name:dark knight? If so, note that only "dark" is going against the name field. Try q=name:dark name:knight or q=name:"dark knight". Otis -- Solr ElasticSearch Support http://sematext.com/ On Mon, Mar 18, 2013 at 6:21 PM, Catala, Francois francois.cat...@nuance.com wrote: Hello, I am trying to have the input "darkknight" match documents containing either "dark knight" or "darkknight". The reverse should also work ("dark knight" matching "dark knight" and "darkknight") but it doesn't. Does anyone know why? When I run the following query I get the expected response with the two documents matched:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="fl">name</str>
      <str name="indent">true</str>
      <str name="q">name:darkknight</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="name">Batman, the darkknight Rises</str></doc>
    <doc><str name="name">Batman, the dark knight Rises</str></doc>
  </result>
</response>

HOWEVER, when I run the same query looking for "dark knight" (two words) I get only 1 document matched, as the response shows:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="fl">name</str>
      <str name="indent">true</str>
      <str name="q">name:dark knight</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc><str name="name">Batman, the dark knight Rises</str></doc>
  </result>
</response>

I have these documents as input:

<doc>
  <field name="id">bat1</field>
  <field name="name">Batman, the dark knight Rises</field>
</doc>
<doc>
  <field name="id">bat2</field>
  <field name="name">Batman, the darkknight Rises</field>
</doc>

And I defined this analyzer:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ShingleFilterFactory" tokenSeparator="" outputUnigrams="true"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ShingleFilterFactory" tokenSeparator="" outputUnigrams="true" outputUnigramIfNoNgrams="true"/>
</analyzer>
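To make the index-side behaviour concrete, here is a rough stand-alone simulation of what a shingle filter with an empty token separator and unigram output emits (an illustration only, not Lucene's actual TokenStream machinery):

```java
import java.util.ArrayList;
import java.util.List;

// Rough simulation of ShingleFilterFactory with tokenSeparator="" and
// outputUnigrams="true": each unigram plus each adjacent-pair shingle
// fused together with no separator.
public class ShingleDemo {
    public static List<String> shingles(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));                         // unigram
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + tokens.get(i + 1)); // bigram shingle
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Indexing "the dark knight" also emits the fused shingle
        // "darkknight", which is why q=name:darkknight matches both docs.
        System.out.println(ShingleDemo.shingles(List.of("the", "dark", "knight")));
    }
}
```

Because "darkknight" is present in the index for both documents, the one-token query matches both; the two-token query fails only because the second token never reaches the name field, which is what Otis's query-syntax suggestions fix.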
Re: Shingles Filter Query time behaviour
Or, q=name:(dark knight) . -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Monday, March 25, 2013 11:51 PM To: solr-user@lucene.apache.org Subject: Re: Shingles Filter Query time behaviour Hi, What does your query look like? ... Try q=name:dark name:knight or q=name:"dark knight". Otis
Re: Query slow with termVectors termPositions termOffsets
Yes, the index size increased after turning on termPositions and termOffsets. Ravi Kiran Bhaskar On Mon, Mar 25, 2013 at 1:13 PM, alx...@aim.com wrote: Did index size increase after turning on termPositions and termOffsets? Thanks. Alex. -Original Message- From: Ravi Solr ravis...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Mon, Mar 25, 2013 8:27 am Subject: Query slow with termVectors termPositions termOffsets Hello, We re-indexed our entire core of 115 docs with some of the fields having termVectors=true termPositions=true termOffsets=true; prior to the reindex we only had termVectors=true. After the reindex the query component has become very slow. I thought that adding termOffsets and termPositions would increase the speed, am I wrong? Several queries like the one shown below which used to run fine are now very slow. Can somebody kindly clarify how termOffsets and termPositions affect the query component?

<lst name="process">
  <double name="time">19076.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">18972.0</double></lst>
  <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.QueryElevationComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.clustering.ClusteringComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">104.0</double></lst>
</lst>

[#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx] webapp=/solr-admin path=/select 
params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:(The+Checkup+OR+Checkpoint+Washington+OR+Post+Carbon+OR+TSA+OR+College+Inc.+OR+Campus+Overload+OR+Planet+Panel+OR+The+Answer+Sheet+OR+Class+Struggle+OR+BlogPost))+OR+(contenttype:Photo+Gallery+AND+headline:day+in+photos)&start=0&rows=1&sort=displaydatetime+desc&fq=-source:(Reuters+OR+PC+World+OR+CBS+News+OR+NC8/WJLA+OR+NewsChannel+8+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:(Discussion+OR+Photo)+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:Photo+Gallery+AND+headline:(Drawing+Board+OR+Drawing+board+OR+drawing+board))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:(Summary+Box*+OR+Video*+OR+Post+Sports+Live*)+-slug:(warren*+OR+history)+-(contenttype:Blog+AND+subheadline:(DC+Schools+Insider+OR+On+Leadership))+contenttype:Blog+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]&wt=javabin&version=2} hits=4985 status=0 QTime=19044 |#] Thanks, Ravi Kiran Bhaskar
Re: Solrcloud 4.1 Collection with multiple slices only use
Interesting, I saw some comments about numShards, but it wasn't ever specific enough to catch my attention. I will give it a try tomorrow. Thanks. On Mar 25, 2013 11:35 PM, Mark Miller markrmil...@gmail.com wrote: I'm guessing you didn't specify numShards. Things changed in 4.1 - if you don't specify numShards it goes into a mode where it's up to you to distribute updates. - Mark On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote: I have two issues and I'm unsure if they are related: ... Thanks Chris
Re: Scaling Solr on VMWare
Hi Frank, If your servlet container had a crazy low setting for the max number of threads, I think you would see the CPU underutilized. But I think you would also see errors on the client about connections being requested. Sounds like possibly a VM issue that's not Solr-specific... Otis -- Solr ElasticSearch Support http://sematext.com/ On Mon, Mar 25, 2013 at 1:18 PM, Frank Wennerdahl frank.wennerd...@arcadelia.com wrote: Hi. We are currently benchmarking our Solr setup and are having trouble with scaling hardware for a single Solr instance. We want to investigate how one instance scales with hardware to find the optimal ratio of hardware vs sharding when scaling. Our main problem is that we cannot identify any hardware limitations: CPU is far from maxed out, disk I/O is not an issue as far as we can see, and there is plenty of RAM available. In short we have a couple of questions that we hope someone here could help us with. Detailed information about our setup, use case and things we've tried is provided below the questions. Questions: 1. What could cause Solr to utilize only 2 CPU cores when sending multiple update requests in parallel in a VMWare environment? 2. Is there a software limit on the number of CPU cores that Solr can utilize while indexing? 3. Ruling out network and disk performance, what could cause a decrease in indexing speed when sending data over a network as opposed to sending it from the local machine? We are running on three cores per Solr instance, however only one core receives any non-trivial load. We are using VMWare (ESX 5.0) virtual machines for hosting Solr and a QNAP NAS containing 12 HDDs in a RAID5 setup for storage. Our data consists of a huge amount of small-sized documents. When indexing we are using Solr's javabin format (although not through Solrj, we have implemented the format in C#/.NET) and our batch size is currently 1000 documents. 
The actual size of the data varies, but the batches we have used range from approximately 450KB to 1050KB. We're sending these batches to Solr in parallel using a number of send threads. There are two issues that we've run into:

1. When sending data from one VM to Solr on another VM, we observed that Solr did not seem to utilize CPU cores properly. The Solr VM had 8 vCPUs available and we were using 4 threads sending data in parallel. We saw a low (~29%) CPU utilization on the Solr VM, with 2 cores doing almost all the work while the remaining cores stayed almost idle. Increasing the number of send threads to 8 yielded the same result, capping our indexing speed at about 4.88MB per second. The client VM had 4 vCPUs which were hardly utilized, as we were reading data from pre-generated files. To rule out network limitations, we sent the test data to a server on the Solr VM that simply accepted the request and returned an empty response. We were able to send data at 219MB per second, so the network did not seem to be the bottleneck. We also tested sending data to Solr locally from the Solr VM to see if disk I/O was the problem. Surprisingly, we were able to index significantly faster at 7.34MB per second using 4 send threads (8.4MB/s with 6 send threads), which indicated that the disk was not slowing us down when sending data over the network. Worth noting is that the CPU utilization was now higher (47.81% with 4 threads, 58.8% with 6) and the work was spread out over all cores. As before, we used pre-generated files and the process sending the data used almost no CPU.

2. We decided to investigate how Solr would scale with additional vCPUs when indexing locally. We increased the number of vCPUs to 16 and the number of send threads to 8. Sadly, we now experienced a decrease in performance: 7MB/s with 8 threads, 6.4MB/s with 12 threads and 4.95MB/s with 16 threads. The CPU usage averaged 30%, regardless of the number of threads used.
We know that additional vCPUs can cause decreased performance in VMWare virtual machines, due to time spent waiting for CPUs to become available. We investigated this using esxtop, which showed only a 1% CSTP. According to VMWare (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005362), a CSTP above 3% could indicate that multiple vCPUs are causing performance issues. We noticed that the average disk write speed seemed to cap at around 11.5 million bytes per second, so we tested the same VM setup using a faster disk. This did not yield any increase in performance (it was actually somewhat slower), and neither did using a RAM-mapped drive for Solr. Any help or ideas about what could be the bottleneck in our setup would be greatly appreciated! Best regards, Frank Wennerdahl Developer Arcadelia AB
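Frank's client-side setup — a fixed pool of send threads pushing pre-generated batches at Solr while throughput is measured — can be sketched as below. This is a hypothetical skeleton, not his actual C#/.NET code: the network call is stubbed out with a byte counter so the threading pattern itself is isolated from Solr and the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelIndexBench {
    private static final AtomicLong bytesSent = new AtomicLong();

    // Stub for the real network call (an HTTP POST of a javabin batch).
    // Replacing this with a no-op server, as Frank did, isolates client throughput.
    static void sendBatch(byte[] batch) {
        bytesSent.addAndGet(batch.length);
    }

    // Submit all batches to a fixed-size pool of send threads and
    // return the total number of bytes "sent".
    public static long run(List<byte[]> batches, int sendThreads) throws InterruptedException {
        bytesSent.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(sendThreads);
        for (byte[] b : batches) {
            pool.submit(() -> sendBatch(b));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return bytesSent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 fake batches of ~500KB each, roughly the batch sizes in the test.
        List<byte[]> batches = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            batches.add(new byte[500_000]);
        }
        System.out.println(run(batches, 4) + " bytes sent with 4 threads");
    }
}
```

With the stub in place, throughput scales with thread count until either the client CPU or (with a real `sendBatch`) the server becomes the bottleneck — which is exactly the comparison Frank's 219MB/s no-op test vs. 4.88MB/s real-indexing test makes.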
RE: Slow queries for common terms
"book" by itself returns in 4s (non-optimized disk IO); running it a second time returned in 0s, so I think I can presume that the query was not cached the first time. This system has been up for a week, so it's warm. I'm going to give your article a good long read, thanks for that. I guess good fast disks/SSDs and sharding should also improve on the base 4-second query time. How _does_ Google get their query times down to 0.35s anyway? I presume their indexes are larger than my 150GB index. :) I still am a bit worried about what will happen when my index is 500GB (it'll happen soon enough), even with sharding... well... I'd just need a lot of servers it seems, and my feeling is that if I need a lot of servers for a few users, how will it scale to many users? Thanks for the great discussion, Dave

-----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, March 25, 2013 10:04 PM To: solr-user@lucene.apache.org Subject: Re: Slow queries for common terms

Take a look here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html -- memory consumption can be a bit tricky to interpret with MMapDirectory. But you say "I see the CPU working very hard", which implies that your issue is just scoring 90M documents. A way to test: try q=*:*&fq=field:book. My bet is that that will be much faster, in which case scoring is your choke point and you'll need to spread that load across more servers, i.e. shard. When running the above, make sure of a couple of things: (1) you haven't run the fq query before (or you have filterCache turned completely off); (2) you _have_ run a query or two that warms up your low-level caches. Doesn't matter what, just as long as it doesn't have an fq clause. Best, Erick

On Sat, Mar 23, 2013 at 3:10 AM, David Parks davidpark...@yahoo.com wrote: I see the CPU working very hard, and at the same time I see 2 MB/sec disk access for that 15 seconds.
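Erick's A/B test boils down to two request strings: the plain query scores every matching document, while the match-all query with an fq clause only filters, skipping scoring. A small sketch of building both (the core path `/solr/collection1/select` and the field name `text` are hypothetical placeholders; Erick's mail just writes `field:book`):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FqAbTest {
    private static String enc(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    // Scored variant: every one of the ~90M matching docs gets ranked.
    public static String scoredQuery(String field, String term) throws UnsupportedEncodingException {
        return "/solr/collection1/select?q=" + enc(field + ":" + term);
    }

    // Filter-only variant: match all docs, then filter with fq; no per-doc
    // scoring of the term. If this is much faster, scoring is the choke point.
    public static String filterOnlyQuery(String field, String term) throws UnsupportedEncodingException {
        return "/solr/collection1/select?q=" + enc("*:*") + "&fq=" + enc(field + ":" + term);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(scoredQuery("text", "book"));
        System.out.println(filterOnlyQuery("text", "book"));
    }
}
```

Remember Erick's caveats when timing these: the fq clause must not already be in the filterCache, but the low-level OS caches should be warm from an unrelated query first.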
I am not running it this instant, but it seems to me that there were more CPU cycles available, so unless it's an issue of not being able to multithread it any further, I'd say it's more IO related. I'm going to set up SolrCloud and shard across the 2 servers I have available for now. It's not an optimal setup while we're in a private beta period, but maybe it'll improve things (I've got 2 servers with 2x 4TB disks in RAID-0, shared with the webservers). I'll work towards some improved IO performance and maybe more shards and see how things go. I'll also be able to up the RAM in just a couple of weeks. Are there any settings I should think of in terms of improving cache performance when I can give it, say, 10GB of RAM? Thanks, this has been tremendously helpful. David

-----Original Message----- From: Tom Burton-West [mailto:tburt...@umich.edu] Sent: Saturday, March 23, 2013 1:38 AM To: solr-user@lucene.apache.org Subject: Re: Slow queries for common terms

Hi David and Jan, I wrote the blog post, and David, you are right: the problem we had was with phrase queries, because our positions lists are so huge. Boolean queries don't need to read the positions lists. I think you need to determine whether you are CPU bound or I/O bound. It is possible that you are I/O bound and reading the term frequency postings for 90 million docs is taking a long time. In that case, more memory in the machine (but not dedicated to Solr) might help, because Solr relies on OS disk caching for caching the postings lists. You would still need to do some cache warming with your most common terms. On the other hand, as Jan pointed out, you may be CPU bound because Solr doesn't have early termination and has to rank all 90 million docs in order to show the top 10 or 25. Did you try the OR search to see if your CPU is at 100%? Tom

On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl jan@cominvent.com wrote: Hi, There might not be a final cure with more RAM if you are CPU bound. Scoring 90M docs is some work.
Can you check what's going on during those 15 seconds? Is your CPU at 100%? Try a (foo OR bar OR baz) search which generates 100 million hits and see if that is slow too, even if you don't use frequent words. I'm sure you can find other frequent terms in your corpus which display similar behaviour, words which are even more frequent than "book". Are you using AND as the default operator? You will benefit from limiting the number of results as much as possible. The real solution is to shard across N servers, until you reach the desired performance for the desired indexing/querying load. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com
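For reference, the default operator Jan asks about is set in schema.xml in Solr 4.x (it can also be overridden per request with the q.op parameter). A minimal sketch, assuming the stock example schema layout:

```xml
<!-- schema.xml (Solr 4.x): make AND the default operator so that
     multi-term queries require all terms, shrinking the result set
     that has to be scored. The default is OR. -->
<solrQueryParser defaultOperator="AND"/>
```

The same effect per request, without touching the schema, is adding `&q.op=AND` to the select URL.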