abt Multicore
Hi, I have an app running on WebLogic and Oracle. The Oracle DB is quite huge, say some 10 million records. I need to integrate Solr with this and I am planning to use multicore. How can the multicore feature be put to best use? -Raghu
Re: Build Solr to run SolrJS
To give you more information, the error I get is this one:

    java.lang.NoClassDefFoundError: org/apache/solr/request/VelocityResponseWriter (wrong name: contrib/velocity/src/main/java/org/apache/solr/request/VelocityResponseWriter)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
        at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1847)
        at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:890)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1354)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at ...

And in the build, if I do a build-contrib-dist, I get these messages:

    ...
    build:
        [jar] Building jar: /home/joan/workspace/solr/contrib/dataimporthandler/target/apache-solr-dataimporthandler-1.4-dev.jar
    dist:
        [copy] Copying 1 file to /home/joan/workspace/solr/build/web/WEB-INF/lib
        [copy] Copying 1 file to /home/joan/workspace/solr/dist
    init:
    init-forrest-entities:
    compile-common:
    compile:
    make-manifest:
    compile:
        [javac] Compiling 4 source files to /home/joan/workspace/solr/contrib/velocity/target/classes
    build:
        [jar] Building jar: /home/joan/workspace/solr/contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4-dev.jar
    dist:
    ...

So the dataimporthandler jar seems to be copied into the dist folders, but the velocity jar is not. Hope this saves you some time.

Joan
Re: abt Multicore
On Mon, Nov 17, 2008 at 2:17 PM, Raghunandan Rao [EMAIL PROTECTED] wrote:
> I have an app running on WebLogic and Oracle. The Oracle DB is quite huge, say some 10 million records. I need to integrate Solr with this and I am planning to use multicore. How can the multicore feature be put to best use?

To index records from a database, you can take a look at DataImportHandler. It would help if you were a bit more specific than that. What exactly do you want to know? It also helps if you tell us why you want to know about one particular thing, so that we may advise on better alternative solutions.

--
Regards,
Shalin Shekhar Mangar.
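For reference, indexing from a JDBC database with DataImportHandler starts from a data-config.xml along these lines. This is a minimal sketch only; the driver, connection URL, table, and column names are assumptions, not taken from the thread:

    <dataConfig>
      <!-- assumed Oracle connection details; substitute your own -->
      <dataSource driver="oracle.jdbc.OracleDriver"
                  url="jdbc:oracle:thin:@dbhost:1521:ORCL"
                  user="solr" password="secret"/>
      <document>
        <entity name="item" query="SELECT id, name, description FROM item">
          <field column="ID" name="id"/>
          <field column="NAME" name="name"/>
          <field column="DESCRIPTION" name="description"/>
        </entity>
      </document>
    </dataConfig>

The handler itself is registered in solrconfig.xml and pointed at this file via its "config" default.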
using deduplication with dataimporthandler
Hey there, I have posted before telling about my situation, but I think my explanation was a bit confusing... I am using DataImportHandler with delta-import and it's working perfectly. I have also coded my own SqlEntityProcessor to delete expired rows from the index and database. Now I need to do duplication control at indexing time. In my old Lucene core I made my own duplication control, but it was slow because it worked by comparing strings... I have been investigating Solr deduplication (http://wiki.apache.org/solr/Deduplication) and it seems very cool, as it works with hashes instead of strings. I have learned how to use deduplication via the /update requestHandler, as the wiki says:

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.processor">dedupe</str>
      </lst>
    </requestHandler>

But the thing is that I want to use it with the /dataimport requestHandler (the one used by DataImportHandler). I don't know if there is a possible XML configuration to add deduplication to DataImportHandler, or whether I should code a plugin... and in that case, I don't exactly know where. Hope my explanation is clearer now... Thanks in advance!
Re: using deduplication with dataimporthandler
Any update processor can be used with DIH. First of all, register your dedupe update processor as you do now. Then you can either pass update.processor as a request parameter, or keep it in the 'defaults' of the DataImportHandler:

    <str name="update.processor">dedupe</str>

On Mon, Nov 17, 2008 at 2:48 PM, Marc Sturlese [EMAIL PROTECTED] wrote:
> Hey there, I have posted before telling about my situation, but I think my explanation was a bit confusing... I am using DataImportHandler with delta-import and it's working perfectly. [...] I want to use it with the /dataimport requestHandler (the one used by DataImportHandler). [...]

--
--Noble Paul
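Putting Noble's suggestion together, the DIH handler registration in solrconfig.xml would look roughly like this. A sketch only; the config file name is the usual convention, not quoted from the thread:

    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
        <!-- same mechanism as on /update: name the registered dedupe processor -->
        <str name="update.processor">dedupe</str>
      </lst>
    </requestHandler>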
RE: solr 1.3 Modification field in schema.xml
Hi Todd, thanks for this answer. OK, but it's not just about showing the field in the result list or not: if a field is not shown but is boosted using qf, do I need to store it? For example, a language field which needs some special configuration like stemming... Thanks a lot for your clear answer.

> I believe (someone correct me if I'm wrong) that the only fields you need to store are those fields which you wish returned from the query. In other words, if you will never put the field on the list of fields (fl) to return, there is no need to store it. It would be advantageous not to store more than you have to. It reduces disk access, index size, memory usage, etc. However, you have to balance this against future needs. If re-indexing is costly just to start storing 1 more field, it may be worth it to just leave it in.
>
> -Todd Feak
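In schema.xml terms, the distinction looks like this (a sketch; the field names and types are illustrative). A field used only for matching and qf boosting needs indexed="true" but not stored="true"; only fields returned via fl need storing:

    <!-- searched and boosted via qf, never returned: no need to store -->
    <field name="title_fr" type="text" indexed="true" stored="false"/>
    <!-- returned to the client in results: must be stored -->
    <field name="title" type="text" indexed="true" stored="true"/>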
Re: Need help with SolrIndexSearcher CoreContainer
Hi, after 5-6 searches I run out of memory :-( Example:

    String homeDir = "/var/lib/tomcat5.5/webapps/solr";
    File configFile = new File(homeDir, "solr.xml");
    CoreContainer myCoreContainer = new CoreContainer(homeDir, configFile);
    mySolrCore = myCoreContainer.getCore("core_de");
    RefCounted<SolrIndexSearcher> temp_search = mySolrCore.getSearcher();
    SolrIndexSearcher searcher = temp_search.get();

Has no one ever worked directly with CoreContainer and SolrIndexSearcher?

Greets
-Ralf-
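For reference (not an answer given in the thread, but the reference-counting contract of this API): getSearcher() returns a RefCounted handle that must be released with decref(), and the SolrCore obtained from getCore() must be close()d. A balanced acquire/release sketch, assuming unreleased handles are the source of the leak:

    SolrCore core = myCoreContainer.getCore("core_de");   // bumps the core refcount
    RefCounted<SolrIndexSearcher> ref = core.getSearcher();
    try {
        SolrIndexSearcher searcher = ref.get();
        // ... run searches ...
    } finally {
        ref.decref();   // release the searcher handle
        core.close();   // balance the getCore() refcount
    }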
Re: solr 1.3 Modification field in schema.xml
On Thu, Nov 13, 2008 at 10:43 PM, sunnyfr [EMAIL PROTECTED] wrote:
> Hi everybody, I don't really get when I have to re-index data and when not. I did a full import but I realised I stored too many fields which I don't need. So I have to change some indexed fields from stored to not stored. I don't know if I have to re-index my data, and in which cases re-indexing is really required.

You will have to re-index.

> Another question: I would like to know which fields must be stored. I thought it was fields used in functions for boosting, but I just tried to boost a field that is indexed but not stored and it worked. Thanks a lot for putting some light on my questions.

--
--Noble Paul
Re: using deduplication with dataimporthandler
Thank you so much, I have it sorted. I am wondering now if there is any more stable way to use deduplication than adding this patch to the Solr source tree: https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel (SOLR-799.patch, 2008-11-12 05:10 PM, this one exactly). I have downloaded the latest nightly-build source code and couldn't see the needed classes in there. Anyone know something? Should I ask this on the developers list? Thanks in advance.

Marc Sturlese wrote:
> Hey there, I have posted before telling about my situation, but I think my explanation was a bit confusing... [...]
Re: using deduplication with dataimporthandler
Marc Sturlese wrote:
> Thank you so much, I have it sorted. I am wondering now if there is any more stable way to use deduplication than adding this patch to the Solr source tree... [...]

The thing is, I can't find the class org.apache.solr.update.processor.DeduplicateUpdateProcessorFactory anywhere... Thanks in advance.
Re: using deduplication with dataimporthandler
On Mon, Nov 17, 2008 at 5:18 PM, Marc Sturlese [EMAIL PROTECTED] wrote:
> Thank you so much, I have it sorted. I am wondering now if there is any more stable way to use deduplication than adding this patch to the Solr source tree: https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel (SOLR-799.patch, 2008-11-12 05:10 PM, this one exactly). I have downloaded the latest nightly-build source code and couldn't see the needed classes in there. Anyone know something? Should I ask this on the developers list?

The issue is still open, but I don't think it will remain open for long. Most likely, it will be released with the next Solr version.

--
Regards,
Shalin Shekhar Mangar.
Re: Build Solr to run SolrJS
On Nov 17, 2008, at 3:55 AM, JCodina wrote:
> java.lang.NoClassDefFoundError: org/apache/solr/request/VelocityResponseWriter (wrong name: ...
>     [jar] Building jar: /home/joan/workspace/solr/contrib/dataimporthandler/target/apache-solr-dataimporthandler-1.4-dev.jar
> dist: ...
>     [jar] Building jar: /home/joan/workspace/solr/contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4-dev.jar
> dist: ...
> So the dataimporthandler jar seems to be copied into the dist folders, but the velocity jar is not.

Correct - I didn't want VelocityResponseWriter put into the example WAR by default. It's a contrib, not core, so I intentionally put it in a separate lib directory. Here are the instructions to wire it in successfully from trunk: http://wiki.apache.org/solr/VelocityResponseWriter

However, it isn't currently suitable for wiring to SolrJS - Matthias and I will have to resolve that.

Erik
Re: Build Solr to run SolrJS
On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote:
>> Matthias and Ryan - let's get SolrJS integrated into contrib/velocity. Any objections/reservations?
>
> As SolrJS may be used without velocity at all (using e.g. ClientSideWidgets), is it possible to put it into contrib/javascript and create a dependency on contrib/velocity for ServerSideWidgets?

Sure, contrib/javascript sounds perfect.

> If that's ok, I'll have a look at the directory structure and the current ant build.xml to make them fit into the common solr structure and build.

Awesome, thanks!

Erik
Re: Solr security
On Nov 16, 2008, at 6:12 PM, Ian Holsman wrote:
> famous last words and all, but you shouldn't be just passing what a user types directly into an application, should you?

LOL

> I'd be parsing out wildcards, boosts, and fuzzy searches (or at least thinking about the effects). I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a regular query.

Sounds like the perfect case for a query parser plugin... or use dismax as Ryan mentioned. Shouldn't Solr be hardened for these cases anyway? Or at least hardenable.

> but they don't let me into design meetings any more ;(

Apparently they shouldn't let me into them either ;)

Erik
Re: Solr security
On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote:
> my assumption with solrjs is that you are hitting read-only solr servers that you don't mind if people query directly.

Exactly the assumption I'm going with too.

> It would not be appropriate for something where you don't want people (who really care) to know you are running solr and could execute arbitrary queries. Since it is an example, I don't mind leaving the /admin interface open on: http://example.solrstuff.org/solrjs/admin/ but /update has a password: http://example.solrstuff.org/solrjs/update
>
> I have said in the past I like the idea of a read-only flag in solr config that would throw an error if you try to do something with the UpdateHandler. However there are other ways to do that also.

Yes, I was asked about this elusive read-only switch at Solr Boot Camp at ApacheCon as well.

How are you password protecting the update handler? This is the kind of goody I'd like to distill out of this thread and wikify: http://wiki.apache.org/solr/SolrSecurity

What does it take to make a read-only Solr server now? Can replication still be made to work? (I plead ignorance on the guts of the Java-based replication feature - does it require password protected handlers?) Shouldn't we bake some of this into the default example configuration, instead of update handlers being wide open by default?

Erik
Re: Solr security
On Nov 16, 2008, at 6:27 PM, Ryan McKinley wrote:
>> I'd be parsing out wildcards, boosts, and fuzzy searches (or at least thinking about the effects). I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a regular query.
>
> Even if you leave the solr instance public, you can still limit grossly inefficient params by forcing things to use the dismax query parser. You can use invariants to lock what options are available. I suppose we don't have a way to say the *maximum* number of rows you can request is 100 (or something like that).

A LimitingRowsSearchComponent could easily do this as a plugin though.

Erik
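A sketch of what such a component might look like against the Solr 1.3 plugin API (the class name is Erik's suggestion; the limits are illustrative, and capping start the same way also closes the deep-paging case raised later in the thread):

    import java.io.IOException;

    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class LimitingRowsSearchComponent extends SearchComponent {
      private static final int MAX_ROWS = 100;     // illustrative cap
      private static final int MAX_START = 10000;  // also cap deep paging

      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        if (params.getInt(CommonParams.ROWS, 10) > MAX_ROWS) {
          params.set(CommonParams.ROWS, MAX_ROWS);
        }
        if (params.getInt(CommonParams.START, 0) > MAX_START) {
          params.set(CommonParams.START, MAX_START);
        }
        rb.req.setParams(params);
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        // nothing to do here; the clamping happens in prepare()
      }

      @Override
      public String getDescription() { return "Caps rows and start parameters"; }
      @Override
      public String getVersion() { return "1.0"; }
      @Override
      public String getSourceId() { return ""; }
      @Override
      public String getSource() { return ""; }
    }

It would then be registered as a searchComponent in solrconfig.xml and added to the handler's component list.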
Re: Solr security
On Nov 16, 2008, at 6:55 PM, Walter Underwood wrote:
> Limiting the maximum number of rows doesn't work, because they can request rows 2-20100. --wunder

But you could limit how many rows could be returned in a single request... that'd close off one DoS mechanism.

Erik
Re: Solr security
On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher [EMAIL PROTECTED] wrote:
> Sounds like the perfect case for a query parser plugin... or use dismax as Ryan mentioned. Shouldn't Solr be hardened for these cases anyway? Or at least hardenable.

Say you do filtering by user - how would you enforce that the client (if it's a browser) only sends in the proper filter? Doesn't seem like you can, unless you put all the user authentication stuff and application logic right in Solr. Now I guess you *could* stick everything in Solr that you would normally stick in the middle tier, but it doesn't seem like a great idea to me.

-Yonik
Re: abt Multicore
Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned? If so, I don't think you need to worry about multicore. You may however need to put part of the index on various machines: http://wiki.apache.org/solr/DistributedSearch

ryan

On Nov 17, 2008, at 3:47 AM, Raghunandan Rao wrote:
> Hi, I have an app running on WebLogic and Oracle. The Oracle DB is quite huge, say some 10 million records. I need to integrate Solr with this and I am planning to use multicore. How can the multicore feature be put to best use? -Raghu
Using properties from core configuration in data-config.xml
Hello, is it possible to use properties from the core configuration in data-config.xml? I want to define the baseDir for DataImportHandler. I tried the following configuration:

*** solr.xml ***

    <solr persistent="false">
      <cores adminPath="null">
        <core name="core0" instanceDir="/opt/solr/cores/core0">
          <property name="solrDataDir" value="/opt/solr/cores/core0/data" />
          <property name="xmlDataDir" value="/home/xml/core0" />
        </core>
        ...
      </cores>
    </solr>

*** data-config.xml ***

    <dataConfig>
      <dataSource type="FileDataSource" />
      <document>
        <entity name="xmlFile" processor="FileListEntityProcessor"
                baseDir="${xmlDataDir}" fileName="id-.*\.xml"
                rootEntity="false" dataSource="null">
          <entity name="data" pk="id" url="${xmlFile.fileAbsolutePath}"
                  processor="XPathEntityProcessor" ...
      ...
    </dataConfig>

But this is the result:

    ...
    Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
    INFO: Starting Full Import
    Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute
    INFO: [posts-politics] webapp=/solr path=/dataimport params={optimize=true&commit=true&command=full-import&qt=/dataimport&wt=javabin&version=2.2} status=0 QTime=66
    Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute
    INFO: [posts-politics] webapp=/solr path=/dataimport params={qt=/dataimport&wt=javabin&version=2.2} status=0 QTime=0
    Nov 17, 2008 1:50:08 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
    INFO: [posts-politics] REMOVING ALL DOCUMENTS FROM INDEX
    Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
    SEVERE: Full Import failed
    org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' should point to a directory Processing Document # 1
        at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:81)
    ...

I also tried to configure all dataimport settings in solrconfig.xml, but I don't know how to do this exactly. Among other things, I tried this format:

*** solrconfig.xml ***

    ...
    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <lst name="datasource">
          <str name="type">FileDataSource</str>
        <lst name="document">
          <lst name="entity">
            <str name="name">xmlFile</str>
            <str name="processor">FileListEntityProcessor</str>
            <str name="baseDir">${xmlDataDir}</str>
            <str name="fileName">id-.*\.xml</str>
            <str name="rootEntity">false</str>
            <str name="dataSource">null</str>
            <lst name="entity">
              <str name="name">data</str>
              <str name="pk">id</str>
              <str name="url">${xmlFile.fileAbsolutePath}</str>
              ...
    </requestHandler>
    ...

But all my tests (with different dataimport formats in solrconfig.xml) failed:

    ...
    INFO: Reusing parent classloader
    Nov 17, 2008 2:18:14 PM org.apache.solr.common.SolrException log
    SEVERE: Error in solrconfig.xml: org.apache.solr.common.SolrException: No system property or default value specified for xmlFile.fileAbsolutePath
        at org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:311)
        at org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:264)
    ...

Thanks again for your excellent support!

Gisto
Re: Solr security
On Nov 17, 2008, at 9:07 AM, Yonik Seeley wrote:
> On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher [EMAIL PROTECTED] wrote:
>> Sounds like the perfect case for a query parser plugin... or use dismax as Ryan mentioned. Shouldn't Solr be hardened for these cases anyway? Or at least hardenable.
>
> Say you do filtering by user - how would you enforce that the client (if it's a browser) only sends in the proper filter?

Ryan already mentioned his technique... and here's how I'd do it similarly: write a custom servlet Filter that groks roles/authentication (this piece you'd need in any Java application tier anyway), or plug in an existing implementation through Spring or something like that. The massaging of the request to Solr could then happen in that pipeline, e.g. adding a query parameter to the Solr request (ignoring anything sent by the client request for, say, user=...). Or perhaps plug in a custom SearchComponent that massaged a request parameter into a Solr filter query.

> Doesn't seem like you can unless you put all the user authentication stuff and application logic right in Solr.

;) Exactly. Sort of.

> Now I guess you *could* stick everything in Solr that you would normally stick in the middle tier, but it doesn't seem like a great idea to me.

Let's be clear about where we are drawing the boundaries of the definition of Solr. One could say that Solr is solr.war and the HTTP conventions. Or is it solr.jar? Or is it the SolrJ API?

Erik
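A hypothetical sketch of the servlet Filter Erik describes, forcing a per-user fq onto every request and discarding any client-sent fq. The class name, fq field, and user lookup are all illustrative, not part of Solr:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletRequestWrapper;

    public class UserFilterQueryFilter implements Filter {

      public void init(FilterConfig cfg) {}
      public void destroy() {}

      public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
          throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) req;
        // however your auth tier identifies the user (container auth here)
        String user = http.getRemoteUser() == null ? "anonymous" : http.getRemoteUser();

        final Map<String, String[]> params =
            new HashMap<String, String[]>(http.getParameterMap());
        params.put("fq", new String[] { "user:" + user }); // server-side value wins

        // override all three accessors so the forced value is seen no matter
        // which one downstream code uses to read parameters
        chain.doFilter(new HttpServletRequestWrapper(http) {
          @Override public Map getParameterMap() { return params; }
          @Override public String[] getParameterValues(String name) { return params.get(name); }
          @Override public String getParameter(String name) {
            String[] v = params.get(name);
            return (v == null || v.length == 0) ? null : v[0];
          }
        }, res);
      }
    }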
Re: Solr security
Erik Hatcher schrieb:
> On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote:
>> my assumption with solrjs is that you are hitting read-only solr servers that you don't mind if people query directly. [...]
>> I have said in the past I like the idea of a read-only flag in solr config that would throw an error if you try to do something with the UpdateHandler. However there are other ways to do that also.

As the thoughts and ideas of this thread are spread across several emails, let me just drop my uncoordinated thoughts here. For solrjs, what exactly is the required information Solr has to provide directly?

- We need data for several widgets. This data will be, in 99% of the cases, some facet information and/or result docs. The result docs will be in suitable ranges; no webpage will display 10+ result items at the same time.
- So potentially dangerous request params like rows>1000, or handlers other than the StandardRequestHandler, may be blocked.
- Update handlers and the admin interface shouldn't be exposed.

Like others mentioned before, I'm not sure this is a task that *has* to be solved inside Solr. As a standalone servlet, it is very likely NOT accessible directly in a production environment. Hiding or password-protecting update/admin is an easy task using a proxy like Apache httpd. It could also be solved by a configurable ServletFilter delivered with Solr, initialized inside Solr's web.xml. To separate the concerns, I think it should not be coded deeper inside the Solr code. The idea of a read-only server can be implemented like that; optional update URLs that are only accessible inside a firewall or something may also be present. This servlet filter could also check the request params for things that are not needed for solrjs and potentially dangerous. It could even check how frequently URLs are accessed (thinking about DoS).

I think even if it looks like direct access, using solrjs doesn't have to be different from common Solr webapps. Usually these apps take user input, a web application translates this input into a Solr query and translates the result into a suitable client format. Other Solr functionality is blocked indirectly because only this app has access to Solr. Now the last two steps are done inside the client; but if we block the things that aren't used by the client, we stay in control of what may happen. If that isn't secure enough, the more complicated solution would be to create a stateful servlet that holds the query state of a client, and solrjs only performs /select/solrjs/?new_query=city:vienna or something. Then the query generation and all Solr-related logic happens again on the server.

I think it should be easy to deliver such a SecuritySolrFilter with the standard Solr distribution, making it configurable so the user can decide which URLs are blocked/password protected and which request parameters should be checked for illegal values. On the other hand, existing firewalls and proxies of the destination system may be used; therefore some best practices in the Solr wiki would be helpful. It would be fine by me to help implement a standard security filter for Solr. WDYT?

regards,
matthias
Re: Solr security
Limiting the number of rows only handles one attack. The one I mentioned, fetching one page deep in the result set, caused a big issue in production at our site. We needed to limit the max for start as well as rows.

It is possible to make it safe, but it is a lot of work. We did this for Ultraseek. I would always, always front it with Apache, to get some of Apache's protection.

wunder

On 11/17/08 6:04 AM, Erik Hatcher [EMAIL PROTECTED] wrote:
> On Nov 16, 2008, at 6:55 PM, Walter Underwood wrote:
>> Limiting the maximum number of rows doesn't work, because they can request rows 2-20100. --wunder
>
> But you could limit how many rows could be returned in a single request... that'd close off one DoS mechanism.
>
> Erik
Re: Solr security
On Nov 17, 2008, at 10:22 AM, Walter Underwood wrote:
> It is possible to make it safe, but it is a lot of work. We did this for Ultraseek. I would always, always front it with Apache, to get some of Apache's protection.

What protections specifically are you speaking of with Apache in front? Authentication? Row limiting?

Erik
Solr build with Rich text document plugin added?
Is there a Solr build with the rich document (Doc/PDF etc.) plugin already added?
Advice for indexing page numbers
How would you best deal with a page field in Solr? Possible values are numbers (1 to 1000s) but could also include appendix pages that use roman numerals and alphabetic characters (i, ii, iii, iv, as well as a, b, c, etc.). It makes sense that people would want to search for things between page 1 and 5, but I cannot really see how someone would search for page iv to 50. I was thinking of splitting this into two fields: one just a string for exact matching (maybe case insensitive), and the other a number for ranges. This would allow searching for page ranges as well as exact matches. Has anyone had experience with pages or the like in Solr? Is splitting it into two fields like this needed, or can I do it with one of the standard filters that I have missed?

--
Regards,
Ian Connor
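A sketch of the two-field idea in schema.xml terms (field and type names assume the Solr 1.3 example schema; not from the thread):

    <!-- exact matching on the raw page label, e.g. "iv" or "12a" -->
    <field name="page" type="string" indexed="true" stored="true"/>
    <!-- numeric-only copy for range queries, left empty for roman/alpha pages -->
    <field name="page_num" type="sint" indexed="true" stored="false"/>

A range search would then go against the numeric field, e.g. page_num:[1 TO 5], while exact lookups use page:iv.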
Re: Solr security
>> Say you do filtering by user - how would you enforce that the client (if it's a browser) only sends in the proper filter?
>
> Ryan already mentioned his technique... and here's how I'd do it similarly: write a custom servlet Filter that groks roles/authentication (this piece you'd need in any Java application tier anyway), or plug in an existing implementation through Spring or something like that. The massaging of the request to Solr could then happen in that pipeline, e.g. adding a query parameter to the Solr request (ignoring anything sent by the client request for, say, user=...). Or perhaps plug in a custom SearchComponent that massaged a request parameter into a Solr filter query.

Right, but the question is still: is there anything general enough to be in Solr core? Everything I can think of requires a good sense of how the auth model is encoded in your data and how you want to expose it. Nothing I have done is general enough to share with even my next project. The only thing I could imagine is perhaps adding getUserPrincipal() to the SolrRequest interface -- but this quickly explodes into also wanting the request method (POST vs GET), or the user-agent... in the end I just add the HttpServletRequest to the context and grab stuff from there. Perhaps the default RequestDispatcher could add the HttpServletRequest to the context...

>> Doesn't seem like you can unless you put all the user authentication stuff and application logic right in Solr.
>
> ;) Exactly. Sort of.
>
>> Now I guess you *could* stick everything in Solr that you would normally stick in the middle tier, but it doesn't seem like a great idea to me.
>
> Let's be clear about where we are drawing the boundaries of the definition of Solr. One could say that Solr is solr.war and the HTTP conventions. Or is it solr.jar? Or is it the SolrJ API?

All of the above :)

In my view we need to be clear about who solr.war is packaged for. I think we are pretty clear that solr.war should be thought of like a MySQL install -- that is, a database server that, unless you *really* know what you are doing, should most likely be behind a firewall. solr.jar, on the other hand, lets you package what you want around search features to build a setup for your needs. Java already has so many options for securing/authenticating that you can just plug them into your own app (if that is appropriate).

In the past I have used a filter based on http://www.onjava.com/pub/a/onjava/2004/03/24/loadcontrol.html to limit load -- however I have found that in any site where stability, load and uptime are a serious concern, this is better handled in a tier in front of Java -- typically the loadbalancer / haproxy / whatever -- and managed by people more cautious than me.

ryan
Re: Solr security
Ryan McKinley wrote:
> solr.jar, on the other hand, lets you package what you want around search features to build a setup for your needs. Java already has so many options for securing/authenticating that you can just plug them into your own app (if that is appropriate). [...] this is better handled in a tier in front of Java -- typically the loadbalancer / haproxy / whatever -- and managed by people more cautious than me.

Couldn't agree more. Almost all security and protection belongs outside of Solr. It can and will be done better there, and Solr can stick to what it's good at. Smaller things like limiting complex query attacks seem more reasonable, but any real security should be provided elsewhere. Wouldn't it be odd if a bunch of open source products reimplemented network security layers and defenses on every project...
Re: Build Solr to run SolrJS
Erik Hatcher schrieb:
> However, it isn't currently suitable for wiring to SolrJS - Matthias and I will have to resolve that.

Just noticed that the VelocityResponseWriter in trunk is much reduced compared to my last patch from 2008-07-25. Moving the templates into a jar shouldn't be a problem. Setting the contentType is still possible; the methods{} wrapper may be moved into the template itself. The crucial difference is the missing translation into a SolrJ response by specifying the vl.response parameter. This was intended to make template creation more handy, because the QueryResponse is much nicer to navigate. If this conversion is too specific and shouldn't be in VelocityResponseWriter, would it be a problem to create a subclass inside contrib/javascript?

matthias
Re: Solr security
Ryan McKinley schrieb:
> however I have found that in any site where stability/load and uptime are a serious concern, this is better handled in a tier in front of java -- typically the loadbalancer / haproxy / whatever -- and managed by people more cautious than me.

Full ack. What do you think about the only Solr-related thing left, the parameter filtering/blocking (e.g. rows>1000)? Is this suitable to do in a Filter delivered with Solr? Of course as an optional alternative.
RE: Solr security
I see value in this in the form of protecting the client from itself. For example, our Solr isn't accessible from the Internet. It's all behind firewalls. But the client applications can make programming mistakes. I would love the ability to lock them down to a certain number of rows, just in case someone typos and puts in 1000 instead of 100, or the like. Admittedly, testing and QA should catch these things, but sometimes it's nice to put in a few safeguards to stop the obvious mistakes from occurring.

-Todd Feak

-----Original Message-----
From: Matthias Epheser [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 17, 2008 9:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr security

Full ack. What do you think about the only Solr-related thing left, the parameter filtering/blocking (e.g. rows>1000)? Is this suitable to do in a Filter delivered with Solr? Of course as an optional alternative.
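With stock config, Ryan's earlier "invariants" suggestion gets close to this: an invariant cannot be overridden by the client, so rows is pinned. A sketch against the Solr 1.3 solrconfig.xml (handler name aside, not from the thread); note this fixes rows at one value rather than capping it, which is why a custom component like the one sketched earlier is needed for a true maximum:

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <lst name="invariants">
        <!-- clients cannot override an invariant -->
        <str name="rows">100</str>
      </lst>
    </requestHandler>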
solr 1.3: bug in phps response writer
Distributed queries:

    curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=php'
    curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=xml'
    curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=json'

all work fine, providing identical results in their respective formats (note the change in the wt param). But:

    curl 'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=phps'

fails with:

    java.lang.IllegalArgumentException: Map size must not be negative
        at org.apache.solr.request.PHPSerializedWriter.writeMapOpener(PHPSerializedResponseWriter.java:195)
        at org.apache.solr.request.JSONWriter.writeSolrDocument(JSONResponseWriter.java:392)
        at org.apache.solr.request.JSONWriter.writeSolrDocumentList(JSONResponseWriter.java:547)
        at org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:147)
        at org.apache.solr.request.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:150)
        at org.apache.solr.request.PHPSerializedWriter.writeNamedList(PHPSerializedResponseWriter.java:71)
        at org.apache.solr.request.PHPSerializedWriter.writeResponse(PHPSerializedResponseWriter.java:66)
        at org.apache.solr.request.PHPSerializedResponseWriter.write(PHPSerializedResponseWriter.java:47)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Questions:
1) Is this known? I didn't see it in the issue tracker.
2) What's the better course of action: a) download the source, fix it, submit a patch, and wait for a new release; or b) drop phps and use json instead?

Thanks
Re: Build Solr to run SolrJS
On Nov 17, 2008, at 11:45 AM, Matthias Epheser wrote:
> Just noticed that the VelocityResponseWriter in trunk is much reduced compared to my last patch from 2008-07-25.

Right, that was intentional, for my own simplicity's sake...

> The crucial difference is the missing translation into a SolrJ response by specifying the vl.response parameter. This was intended to make template creation more handy, because the QueryResponse is much nicer to navigate. If this conversion is too specific and shouldn't be in VelocityResponseWriter, would it be a problem to create a subclass inside contrib/javascript?

I need to understand it a bit more, but no subclass is necessary... we'll patch it into contrib/velocity's VrW like you had it before.

Erik
Re: Solr security
On Nov 17, 2008, at 12:06 PM, Matthias Epheser wrote:
> Full ack. What do you think about the only Solr-related thing left, the parameter filtering/blocking (e.g. rows>1000)? Is this suitable to do in a Filter delivered with Solr? Of course as an optional alternative.

This could be done in a standard ServletFilter -- but that requires mucking with web.xml, and may be more difficult if you are worried about it for some handlers and not others. As Erik mentioned earlier, this could be done in a QueryComponent -- the prepare part could just make sure the query parameters are all within reasonable ranges. This seems like something reasonable to add to solr.

ryan
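For reference, the web.xml wiring Ryan mentions would look roughly like this inside solr.war's WEB-INF/web.xml, declared ahead of SolrDispatchFilter so it runs first (the filter class name is hypothetical):

    <filter>
      <filter-name>ParamGuard</filter-name>
      <filter-class>com.example.ParamGuardFilter</filter-class>
    </filter>
    <filter-mapping>
      <filter-name>ParamGuard</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>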
Re: Build Solr to run SolrJS
Erik Hatcher schrieb:
> On Nov 17, 2008, at 11:45 AM, Matthias Epheser wrote:
>> Just noticed that the VelocityResponseWriter in trunk is much reduced compared to my last patch from 2008-07-25.
>
> Right, that was intentional, for my own simplicity's sake... [...]
> I need to understand it a bit more, but no subclass is necessary... we'll patch it into contrib/velocity's VrW like you had it before.

The key part is to pass a parameter like vl.response=QueryResponse, so the transformation works like this:

    object = request.getCore().getResourceLoader().newInstance(className, "client.solrj.response.");
    solrResponse.setResponse(new EmbeddedSolrServer(request.getCore()).getParsedResponse(request, response));

This was done based on API changes from Ryan to generalize the second setResponse part. In the template there is access to the created response, as well as to the rawResponse. I'll try to add the least necessary stuff to the current VrW, test it against solrjs, and post a patch to JIRA.

matthias
Re: Build Solr to run SolrJS
Can you elaborate on the use case for why you need the raw response like that? I vaguely get it, but want to really understand the need here.

I'm wary of the EmbeddedSolrServer usage in there, as I want to distill the VrW stuff to be able to use SolrJ's API rather than assume embedded Solr. This way VrW can be separated from core Solr to another tier and template on remote Solr responses. Thoughts on how this feature might play out in that scenario?

Erik

On Nov 17, 2008, at 1:09 PM, Matthias Epheser wrote:
> The key part is to pass a parameter like vl.response=QueryResponse, so the transformation works like this:
>
>     object = request.getCore().getResourceLoader().newInstance(className, "client.solrj.response.");
>     solrResponse.setResponse(new EmbeddedSolrServer(request.getCore()).getParsedResponse(request, response));
>
> [...] I'll try to add the least necessary stuff to the current VrW, test it against solrjs, and post a patch to JIRA.
Re: Build Solr to run SolrJS
On Nov 17, 2008, at 1:35 PM, Erik Hatcher wrote:
> Can you elaborate on the use case for why you need the raw response like that? I vaguely get it, but want to really understand the need here. I'm wary of the EmbeddedSolrServer usage in there, as I want to distill the VrW stuff to be able to use SolrJ's API rather than assume embedded Solr. [...]

Essentially the call:

    solrResponse.setResponse(new EmbeddedSolrServer(request.getCore()).getParsedResponse(request, response));

makes the results look as if they came from SolrJ. If the results did come from SolrJ, we would not need to set the solrResponse -- they would already be set and in the proper form.

ryan
Re: Build Solr to run SolrJS
Ryan McKinley schrieb:
> On Nov 17, 2008, at 1:35 PM, Erik Hatcher wrote:
>> Can you elaborate on the use case for why you need the raw response like that? I vaguely get it, but want to really understand the need here. [...]

After we added the SolrQueryResponse to the templates first, we realized that some convenience methods for iterating the result docs, accessing facets etc. would be nice. The idea was to reuse the existing wrappers (e.g. QueryResponse). It makes it much nicer to create templates, because velocity is made to just render things, so code using docsets etc. directly may be very overloaded.

> Essentially the call:
>
>     solrResponse.setResponse(new EmbeddedSolrServer(request.getCore()).getParsedResponse(request, response));
>
> makes the results look as if they came from SolrJ. [...]
>
> ryan
Fwd: Software Announcement: LuSql: Database to Lucene indexing
Hello - I wanted to forward this on, since I thought that people here might be able to use this to build indexes. So long as the Lucene version in LuSql matches the version in Solr, it would work fine for indexing - yea? Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

Begin forwarded message:

From: Glen Newton [EMAIL PROTECTED]
Date: November 17, 2008 4:32:18 AM PST
To: [EMAIL PROTECTED]
Subject: Software Announcement: LuSql: Database to Lucene indexing
Reply-To: [EMAIL PROTECTED]

LuSql is a simple but powerful tool for building Lucene indexes from relational databases. It is a command-line Java application for the construction of a Lucene index from an arbitrary SQL query of a JDBC-accessible SQL database. It allows a user to control a number of parameters, including the SQL query to use, individual indexing/storage/term-vector nature of fields, analyzer, stop word list, and other tuning parameters. In its default mode it uses threading to take advantage of multiple cores.

LuSql can handle complex queries, allows for additional per-record sub-queries, and has a plug-in architecture for arbitrary Lucene document manipulation. Its only dependencies are three Apache Commons libraries, the Lucene core itself, and a JDBC driver.

LuSql has been extensively tested, including on a large 6+ million full-text metadata journal article document collection, producing an 86GB Lucene index in ~13 hours.

http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

Glen Newton
Re: Build Solr to run SolrJS
On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote:
> After we added the SolrQueryResponse to the templates first, we realized that some convenience methods for iterating the result docs, accessing facets etc. would be nice. The idea was to reuse the existing wrappers (e.g. QueryResponse). It makes it much nicer to create templates, because velocity is made to just render things, so code using docsets etc. directly may be very overloaded.

Right, and well understood. What I've put out there is a barebones skeleton, and there are lots of TODOs for these conveniences. I want to get it using SolrJ's API for request/response rather than the more internal stuff we're using now.

Erik
Re: Build Solr to run SolrJS
On Nov 17, 2008, at 2:59 PM, Erik Hatcher wrote:
> Right, and well understood. What I've put out there is a barebones skeleton, and there are lots of TODOs for these conveniences. I want to get it using SolrJ's API for request/response rather than the more internal stuff we're using now.

I think the 'internal' mode is there because Yonik expressed concern about requiring the conversion of DocList to SolrDocumentList -- since Matthias had already done the work to access docs out of the DocList, I figured we should leave it in, even if I can't imagine using it. (Someone may be worried about the performance win of not serializing the DocList.)

ryan
Re: Software Announcement: LuSql: Database to Lucene indexing
Yeah, it'd work, though not only does the version of Lucene need to match, but the field indexing/storage attributes need to jive as well - and that is the trickier part of the equation. But yeah, LuSql looks slick!

Erik

On Nov 17, 2008, at 2:17 PM, Matthew Runo wrote:
> Hello - I wanted to forward this on, since I thought that people here might be able to use this to build indexes. So long as the Lucene version in LuSql matches the version in Solr, it would work fine for indexing - yea? [...]
Re: Build Solr to run SolrJS
Erik Hatcher schrieb:
> Right, and well understood. What I've put out there is a barebones skeleton, and there are lots of TODOs for these conveniences. I want to get it using SolrJ's API for request/response rather than the more internal stuff we're using now.

Got your point. I just added a new patch at https://issues.apache.org/jira/browse/SOLR-620 that makes solrjs run again. It includes:

- support for response wrapping
- support for json wrap
- adding the v. prefix to all request parameters, for consistency reasons

I'm aware that some parts of these features may be achieved in a nicer way. As you know the SolrJ code better, thanks for your thoughts; I'll also try to dig into the SolrJ side to get a better picture. See this patch as a feature list I need for solrjs.

matthias
Re: Solr security
There was a patch by Sean Timm you should investigate as well. It limited a query so it would take a maximum of X seconds to execute, and would just return the rows it had found in that time.

Feak, Todd wrote:
> I see value in this in the form of protecting the client from itself. For example, our Solr isn't accessible from the Internet. It's all behind firewalls. But the client applications can make programming mistakes. I would love the ability to lock them down to a certain number of rows, just in case someone typos and puts in 1000 instead of 100, or the like. [...]
Re: Solr security
http://issues.apache.org/jira/browse/SOLR-527 (an XML commit-only request handler) is pertinent to this discussion as well.

-Sean

Ian Holsman wrote:
> There was a patch by Sean Timm you should investigate as well. It limited a query so it would take a maximum of X seconds to execute, and would just return the rows it had found in that time. [...]
RE: Solr security
About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return values, and should never change the state of the data. All changes to the data should be made with POST. (In REST-style guidelines: PUT, POST, and DELETE.) This prevents you from passing around URLs in email that can destroy the index. The first role of security is to prevent accidents.

I would suggest two layers of read-only switch:
1) Open the Lucene index in read-only mode.
2) Allow only search servers to accept GET requests.

Lance
Re: Solr security
I believe the Solr replication scripts require POSTing a commit to read in the new index -- so at least limited POST capability is required in most scenarios.

-Sean

Lance Norskog wrote:
> About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return values, and should never change the state of the data. [...]
Updating schema.xml without deleting index?
I've tried searching for this answer all over but have found no results thus far. I am trying to add a new field to my schema.xml with a default value of 0. I have a ton of data indexed right now, and it would be very hard to retrieve all of the original sources to rebuild my index. So my question is: is there any way to send a command to Solr that tells it to re-index everything it has and include the new field I added?

Thanks,
Jeff
Re: Solr security
If that's the case, putting Apache in front of it would be handy. Something like:

    <Limit POST>
      order deny,allow
      deny from all
      allow from 192.168.0.1
    </Limit>

might be helpful.

Sean Timm wrote:
> I believe the Solr replication scripts require POSTing a commit to read in the new index -- so at least limited POST capability is required in most scenarios. [...]
Re: Build Solr to run SolrJS
Erik Hatcher schrieb: On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote: Matthias and Ryan - let's get SolrJS integrated into contrib/velocity. Any objections/reservations? As SolrJS may be used without velocity at all (using e.g. ClientSideWidgets), is it possible to put it into contrib/javascript and create a dependency on contrib/velocity for ServerSideWidgets? Sure, contrib/javascript sounds perfect. If that's ok, I'll have a look at the directory structure and the current ant build.xml to make them fit into the common solr structure and build. Awesome, thanks! Just uploaded solrjs.zip to https://issues.apache.org/jira/browse/SOLR-868. It is intended to be extracted into contrib/javascript and supports the following ant targets: * ant dist - creates a single js file and a jar that holds the velocity templates. * ant docs - creates js docs; test in browser: doc/index.html * ant example-init - (depends on ant dist in the solr root) copies the current build of solr.war and solr-velocity.jar to example/testsolr/.. * ant example-start - starts the testsolr server on port 8983 * ant example-import - imports 3000 test data rows (requires a started test server) Erik
RE: abt Multicore
Any suggestions? -Original Message- From: Nguyen, Joe Sent: Monday, November 17, 2008 9:40 To: 'solr-user@lucene.apache.org' Subject: RE: abt Multicore Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned? If so, I don't think you need to worry about multicore. You may however need to put part of the index on various machines: http://wiki.apache.org/solr/DistributedSearch I am also trying to decide whether to go with multicore or distributed search. My concerns are as follows: Does that mean having a single big schema with lots of fields? Distributed Search requires that each document must have a unique key. In this case, the unique key cannot be a primary key of a table. I wonder how Solr performs in this case (distributed search vs. multicore). 1. Distributed Search: a. All documents are in a single index. Would indexing a single document lock the index and affect query performance? b. If multiple machines are used, Solr will need to query each machine and merge the results. This also could impact performance. c. Supports MoreLikeThis queries given a document id. 2. Multicore: a. Each table will be associated with a single core. Indexing a single document would lock only that specific core's index; thus, querying documents on other cores won't be impacted. b. Querying documents across multiple cores must be handled by the caller. c. Can't support MoreLikeThis queries, since a document id from one core has no meaning on other cores. -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, November 17, 2008 6:09 To: solr-user@lucene.apache.org Subject: Re: abt Multicore Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned? If so, I don't think you need to worry about multicore. You may however need to put part of the index on various machines: http://wiki.apache.org/solr/DistributedSearch ryan On Nov 17, 2008, at 3:47 AM, Raghunandan Rao wrote: Hi, I have an app running on WebLogic and Oracle. The Oracle DB is quite huge; say some 10 million records. I need to integrate Solr for this and I am planning to use multicore. How can the multicore feature be used at its best? -Raghu
Re: Regex Transformer Error
Hi All, Although HTMLStripStandardTokenizerFactory will remove HTML tags at analysis time, the raw HTML is still stored in the index and would need to be removed when searching. In my case the HTML tags are not needed at all, so I created an HTMLStripTransformer for the DIH to remove the HTML tags and save space in the index. I used the HTML parser included with Lucene (org.apache.lucene.demo.html). It performs well and worked for me (while working with Lucene before moving to Solr). What do you think? Is it worth contributing? My best wishes, Regards, Ahmed On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance [EMAIL PROTECTED] wrote: There is a nice HTML stripper inside Solr. solr.HTMLStripStandardTokenizerFactory -Original Message- From: Ahmed Hammad [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 05, 2008 10:43 AM To: solr-user@lucene.apache.org Subject: Re: Regex Transformer Error Hi, It works with the attribute regex="&lt;(.|\n)*?&gt;" Sorry for the disturbance. Regards, ahmd On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad [EMAIL PROTECTED] wrote: Hi, I am using the Solr 1.3 data import handler. One of my table fields has HTML tags, and I want to strip them from the field text. So obviously I need the Regex Transformer. I added a transformer="RegexTransformer" attribute to my entity and a new field with: <field sourceColName="content" column="content" regex="English" replaceWith="X"/> Everything works fine. The text is replaced without any problem. The problem happened with my regular expression to strip HTML tags. So I use regex="<(.|\n)*?>". Of course the characters '<' and '>' are not allowed in XML. I tried the following, regex="&lt;(.|\n)*?&gt;" and regex="&#3C;(.|\n)*?&#3E;", but I get the following error: The value of attribute regex associated with an element type field must not contain the '<' character. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source) ... The full stack trace follows: *FATAL: Could not create importer. DataImporter config invalid org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:206) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context Processing Document # at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176) at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:93) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106) ... 17 more Caused by: org.xml.sax.SAXParseException: The value of attribute regex associated with an element type field must not contain the '<' character. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source) at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:166) ... 19 more * *description* *The server encountered an internal error (FATAL: Could not create importer. DataImporter config invalid org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114) at
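A transformer of the kind Ahmed describes plugs into a small DIH API. The sketch below is hypothetical, not Ahmed's actual code: the class name, the assumed column name, and the crude regex standing in for Lucene's demo HTML parser are all illustrative.

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class HtmlStripTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object content = row.get("content"); // assumed column name
        if (content instanceof String) {
            // Drop anything that looks like a tag before the value is indexed.
            row.put("content", ((String) content).replaceAll("<[^>]*>", " "));
        }
        return row;
    }
}

It would be wired up through the entity's transformer attribute in data-config.xml, the same way RegexTransformer is used above.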
Re: Solr security
trouble is, you can also GET /solr/update, even all on the URL, no request body... http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true Solr is a bad RESTafarian. Getting warmer! Erik On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote: if that's the case, putting Apache in front of it would be handy. Something like <Limit POST> Order deny,allow Deny from all Allow from 192.168.0.1 </Limit> might be helpful. Sean Timm wrote: I believe the Solr replication scripts require POSTing a commit to read in the new index--so at least limited POST capability is required in most scenarios. -Sean Lance Norskog wrote: About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return values, and should never change the state of the data. All changes to the data should be made with POST. (In REST style guidelines, PUT, POST, and DELETE.) This prevents you from passing around URLs in email that can destroy the index. The first role of security is to prevent accidents. I would suggest two layers of read-only switch. 1) Open the Lucene index in read-only mode. 2) Allow only search servers to accept GET requests. Lance
Re: Solr security
On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote: trouble is, you can also GET /solr/update, even all on the URL, no request body... http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true Solr is a bad RESTafarian. but with Ian's options in the apache config, this would not work... rather it would only work if stream.body was a POST Getting warmer! Erik On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote: if that's the case, putting Apache in front of it would be handy. Something like <Limit POST> Order deny,allow Deny from all Allow from 192.168.0.1 </Limit> might be helpful. Sean Timm wrote: I believe the Solr replication scripts require POSTing a commit to read in the new index--so at least limited POST capability is required in most scenarios. -Sean Lance Norskog wrote: About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return values, and should never change the state of the data. All changes to the data should be made with POST. (In REST style guidelines, PUT, POST, and DELETE.) This prevents you from passing around URLs in email that can destroy the index. The first role of security is to prevent accidents. I would suggest two layers of read-only switch. 1) Open the Lucene index in read-only mode. 2) Allow only search servers to accept GET requests. Lance
RE: Updating schema.xml without deleting index?
Don't know whether this would work... just speculating :-) A. You'll need to create a new schema with the new field, or you could use a dynamic field in your current schema (assuming you have already configured the default value of 0). B. Add a couple of new documents. C. Run the optimize script, since optimize will consolidate all segments into a single segment. At the end, you'll have a single segment which includes the new field. Would that work? -Original Message- From: Jeff Lerman [mailto:[EMAIL PROTECTED] Sent: Monday, November 17, 2008 12:45 To: solr-user@lucene.apache.org Subject: Updating schema.xml without deleting index? I've tried searching for this answer all over but have found no results thus far. I am trying to add a new field to my schema.xml with a default value of 0. I have a ton of data indexed right now and it would be very hard to retrieve all of the original sources to rebuild my index. So my question is...is there any way to send a command to SOLR that tells it to re-index everything it has and include the new field I added? Thanks, Jeff
Re: Solr security
Ryan McKinley wrote: On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote: trouble is, you can also GET /solr/update, even all on the URL, no request body... http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true Solr is a bad RESTafarian. but with Ian's options in the apache config, this would not work... rather it would only work if stream.body was a POST <Location /solr/update> Order deny,allow Deny from all Allow from 192.168.0.1 </Location> ? or perhaps <LocationMatch>... but you get the picture. Getting warmer! Erik On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote: if that's the case, putting Apache in front of it would be handy. Something like <Limit POST> Order deny,allow Deny from all Allow from 192.168.0.1 </Limit> might be helpful. Sean Timm wrote: I believe the Solr replication scripts require POSTing a commit to read in the new index--so at least limited POST capability is required in most scenarios. -Sean Lance Norskog wrote: About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return values, and should never change the state of the data. All changes to the data should be made with POST. (In REST style guidelines, PUT, POST, and DELETE.) This prevents you from passing around URLs in email that can destroy the index. The first role of security is to prevent accidents. I would suggest two layers of read-only switch. 1) Open the Lucene index in read-only mode. 2) Allow only search servers to accept GET requests. Lance
Re: solr 1.3: bug in phps response writer
Hi Alok, I don't think it's a known issue and 2. a) sounds like the best and most appreciated approach! :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch From: Alok Dhir [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Monday, November 17, 2008 12:36:25 PM Subject: solr 1.3: bug in phps response writer Distributed queries: curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=php' curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=xml' curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=json' All work fine, providing identical results in their respective formats (note the change in the wt param). curl 'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=phps' fails with: java.lang.IllegalArgumentException: Map size must not be negative at org.apache.solr.request.PHPSerializedWriter.writeMapOpener(PHPSerializedResponseWriter.java:195) at org.apache.solr.request.JSONWriter.writeSolrDocument(JSONResponseWriter.java:392) at org.apache.solr.request.JSONWriter.writeSolrDocumentList(JSONResponseWriter.java:547) at org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:147) at org.apache.solr.request.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:150) at org.apache.solr.request.PHPSerializedWriter.writeNamedList(PHPSerializedResponseWriter.java:71) at org.apache.solr.request.PHPSerializedWriter.writeResponse(PHPSerializedResponseWriter.java:66) at org.apache.solr.request.PHPSerializedResponseWriter.write(PHPSerializedResponseWriter.java:47) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Questions: 1) Is this known? I didn't see it in the issue tracker. 2) What's the better course of action: a) download the source, fix, submit a patch, wait for a new release; b) drop phps and use json instead? Thanks
Query Response Doc Score - Int Value
Hello, I am currently performing a query to a Solr index I've set up and I'm trying to 1) sort on the score and 2) sort on the date_created (a custom field I've added). The sort command looks like: sort=score+desc,created_date+desc. The gist of it is that I will 1) first return the most relevant results then 2) within those results, return the most recent results. However, the issue I have is that the score is a decimal value that is far too precise (e.g. 2.3518934 vs 2.2173865) and will therefore never collide and trigger the secondary sort on the date. The question I am asking is if anyone knows a way to produce a score that is more coarse, or if it is possible to force the score to return as an integer. That way I could have the results collide on the score more often and therefore sort on the date as well. Thanks! -Derek
Re: solr 1.3: bug in phps response writer
I find that URL is not the same as the others -- regards j.L
Re: Using properties from core configuration in data-config.xml
Nope, it is not possible as of now; the placeholders are not aware of the core properties. Is it possible to pass the values as request params? Request parameters can be accessed. You can raise an issue and we can address this separately. On Mon, Nov 17, 2008 at 7:57 PM, [EMAIL PROTECTED] wrote: Hello, is it possible to use properties from the core configuration in data-config.xml? I want to define the baseDir for DataImportHandler. I tried the following configuration: *** solr.xml *** <solr persistent="false"> <cores adminPath='null'> <core name="core0" instanceDir="/opt/solr/cores/core0"> <property name="solrDataDir" value="/opt/solr/cores/core0/data" /> <property name="xmlDataDir" value="/home/xml/core0" /> </core> ... </cores> </solr> *** data-config.xml *** <dataConfig> <dataSource type="FileDataSource" /> <document> <entity name="xmlFile" processor="FileListEntityProcessor" baseDir="${xmlDataDir}" fileName="id-.*\.xml" rootEntity="false" dataSource="null"> <entity name="data" pk="id" url="${xmlFile.fileAbsolutePath}" processor="XPathEntityProcessor" ... </dataConfig> But this is the result: ... Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute INFO: [posts-politics] webapp=/solr path=/dataimport params={optimize=true&commit=true&command=full-import&qt=/dataimport&wt=javabin&version=2.2} status=0 QTime=66 Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute INFO: [posts-politics] webapp=/solr path=/dataimport params={qt=/dataimport&wt=javabin&version=2.2} status=0 QTime=0 Nov 17, 2008 1:50:08 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [posts-politics] REMOVING ALL DOCUMENTS FROM INDEX Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' should point to a directory Processing Document # 1 at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:81) ... I also tried to configure all the dataimport settings in solrconfig.xml, but I don't know how to do this exactly. Among other things, I tried this format: *** solrconfig.xml *** ... <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <lst name="datasource"> <str name="type">FileDataSource</str> <lst name="document"> <lst name="entity"> <str name="name">xmlFile</str> <str name="processor">FileListEntityProcessor</str> <str name="baseDir">${xmlDataDir}</str> <str name="fileName">id-.*\.xml</str> <str name="rootEntity">false</str> <str name="dataSource">null</str> <lst name="entity"> <str name="name">data</str> <str name="pk">id</str> <str name="url">${xmlFile.fileAbsolutePath}</str> ... </requestHandler> ... But all my tests (with different dataimport formats in solrconfig.xml) failed: ... INFO: Reusing parent classloader Nov 17, 2008 2:18:14 PM org.apache.solr.common.SolrException log SEVERE: Error in solrconfig.xml:org.apache.solr.common.SolrException: No system property or default value specified for xmlFile.fileAbsolutePath at org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:311) at org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:264) ... Thanks again for your excellent support! Gisto -- --Noble Paul
Re: Regex Transformer Error
On Tue, Nov 18, 2008 at 2:49 AM, Ahmed Hammad [EMAIL PROTECTED] wrote: Hi All, Although HTMLStripStandardTokenizerFactory will remove HTML tags at analysis time, the raw HTML is still stored in the index and would need to be removed when searching. In my case the HTML tags are not needed at all, so I created an HTMLStripTransformer for the DIH to remove the HTML tags and save space in the index. I used the HTML parser included with Lucene (org.apache.lucene.demo.html). It performs well and worked for me (while working with Lucene before moving to Solr). What do you think? Is it worth contributing? Yes. You can contribute this new transformer as an enhancement. My best wishes, Regards, Ahmed On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance [EMAIL PROTECTED] wrote: There is a nice HTML stripper inside Solr. solr.HTMLStripStandardTokenizerFactory -Original Message- From: Ahmed Hammad [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 05, 2008 10:43 AM To: solr-user@lucene.apache.org Subject: Re: Regex Transformer Error Hi, It works with the attribute regex="&lt;(.|\n)*?&gt;" Sorry for the disturbance. Regards, ahmd On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad [EMAIL PROTECTED] wrote: Hi, I am using the Solr 1.3 data import handler. One of my table fields has HTML tags, and I want to strip them from the field text. So obviously I need the Regex Transformer. I added a transformer="RegexTransformer" attribute to my entity and a new field with: <field sourceColName="content" column="content" regex="English" replaceWith="X"/> Everything works fine. The text is replaced without any problem. The problem happened with my regular expression to strip HTML tags. So I use regex="<(.|\n)*?>". Of course the characters '<' and '>' are not allowed in XML. I tried the following, regex="&lt;(.|\n)*?&gt;" and regex="&#3C;(.|\n)*?&#3E;", but I get the following error: The value of attribute regex associated with an element type field must not contain the '<' character. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source) ... The full stack trace is following: *FATAL: Could not create importer. DataImporter config invalid org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:206) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context Processing Document # at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176) at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:93) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106) ... 17 more Caused by: org.xml.sax.SAXParseException: The value of attribute regex associated with an element type field must not contain the '<' character. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source) at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:166) ... 19 more * *description* *The server encountered an internal error (FATAL: Could not create importer. DataImporter config invalid org.apache.solr.common.SolrException: FATAL: Could not create importer.
Re: Query Response Doc Score - Int Value
A function query is the likely candidate - no such quantization function exists, but it would be relatively easy to write one. -Yonik On Mon, Nov 17, 2008 at 8:17 PM, Derek Springer [EMAIL PROTECTED] wrote: Hello, I am currently performing a query to a Solr index I've set up and I'm trying to 1) sort on the score and 2) sort on the date_created (a custom field I've added). The sort command looks like: sort=score+desc,created_date+desc. The gist of it is that I will 1) first return the most relevant results then 2) within those results, return the most recent results. However, the issue I have is that the score is a decimal value that is far too precise (e.g. 2.3518934 vs 2.2173865) and will therefore never collide and trigger the secondary sort on the date. The question I am asking is if anyone knows a way to produce a score that is more coarse, or if it is possible to force the score to return as an integer. That way I could have the results collide on the score more often and therefore sort on the date as well. Thanks! -Derek
Re: abt Multicore
Some high level thoughts: On Mon, Nov 17, 2008 at 11:10 PM, Nguyen, Joe [EMAIL PROTECTED]wrote: Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned? If so, I don't think you need to worry about multicore. You may however need to put part of the index on various machines: http://wiki.apache.org/solr/DistributedSearch I am also trying to decide whether to go with multicore or distributed search. My concerns are as follows: Does that mean having a single big schema with lots of fields? Yes, and that's the use-case behind multi-valued fields. De-normalizing and avoiding joins helps to scale. Distributed Search requires that each document must have a unique key. In this case, the unique key cannot be a primary key of a table. I wonder how Solr performs in this case (distributed search vs. multicore). 1. Distributed Search: a. All documents are in a single index. Would indexing a single document lock the index and affect query performance? Indexing does not lock out searchers. Solr is designed to be queried regardless of indexing. However, depending on your machine's performance and your configuration, you may see slow queries during commits/auto-warming. Also, in distributed search, you have different Solr instances handling disjoint sets of data. Indexing on one instance does not affect the rest. b. If multiple machines are used, Solr will need to query each machine and merge the results. This also could impact performance. Yes, but in most scenarios where distributed search is required, it is just not possible to use a single box for the whole index. If you set out to write a similar kind of querying for multiple cores, it will be difficult to optimize it as well as Solr's implementation. c. Supports MoreLikeThis queries given a document id. MoreLikeThis is not implemented for distributed environments (yet). 2. Multicore: a. Each table will be associated with a single core. Indexing a single document would lock only that specific core's index; thus, querying documents on other cores won't be impacted. With multi-core, all cores are on a single box, so you may see slow queries on other cores too (again, it depends on your box's strength). b. Querying documents across multiple cores must be handled by the caller. That is not a use-case for which Lucene/Solr were designed. Joins are discouraged most of the time. c. Can't support MoreLikeThis queries, since a document id from one core has no meaning on other cores. MoreLikeThis makes no sense in this case because the document structure (schema) is totally different. -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, November 17, 2008 6:09 To: solr-user@lucene.apache.org Subject: Re: abt Multicore Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned? If so, I don't think you need to worry about multicore. You may however need to put part of the index on various machines: http://wiki.apache.org/solr/DistributedSearch ryan On Nov 17, 2008, at 3:47 AM, Raghunandan Rao wrote: Hi, I have an app running on WebLogic and Oracle. The Oracle DB is quite huge; say some 10 million records. I need to integrate Solr for this and I am planning to use multicore. How can the multicore feature be used at its best? -Raghu -- Regards, Shalin Shekhar Mangar.
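On the distributed side, the fan-out and merge are handled by Solr itself once a shards parameter is supplied. A minimal SolrJ sketch against the 1.3-era API; the host names and query are made up:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardQuery {
    public static void main(String[] args) throws Exception {
        // Any one shard can act as the aggregator for a distributed request.
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://host1:8983/solr");
        SolrQuery q = new SolrQuery("ipod");
        // Solr queries each listed shard and merges the results by uniqueKey,
        // which is why the key must be unique across all shards.
        q.set("shards", "host1:8983/solr,host2:8983/solr");
        QueryResponse rsp = server.query(q);
        System.out.println("numFound=" + rsp.getResults().getNumFound());
    }
}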
Re: Using properties from core configuration in data-config.xml
There may be one way to do this. Add your property in the invariants section of solrconfig's DataImportHandler element. For example, add this section: <lst name="invariants"> <str name="xmlDataDir">${xmlDataDir}</str> </lst> Then you can use it as ${dataimporter.request.xmlDataDir} in your data-config to access this. On Tue, Nov 18, 2008 at 9:17 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote: Nope, it is not possible as of now; the placeholders are not aware of the core properties. Is it possible to pass the values as request params? Request parameters can be accessed. You can raise an issue and we can address this separately. On Mon, Nov 17, 2008 at 7:57 PM, [EMAIL PROTECTED] wrote: Hello, is it possible to use properties from the core configuration in data-config.xml? I want to define the baseDir for DataImportHandler. I tried the following configuration: *** solr.xml *** <solr persistent="false"> <cores adminPath='null'> <core name="core0" instanceDir="/opt/solr/cores/core0"> <property name="solrDataDir" value="/opt/solr/cores/core0/data" /> <property name="xmlDataDir" value="/home/xml/core0" /> </core> ... </cores> </solr> *** data-config.xml *** <dataConfig> <dataSource type="FileDataSource" /> <document> <entity name="xmlFile" processor="FileListEntityProcessor" baseDir="${xmlDataDir}" fileName="id-.*\.xml" rootEntity="false" dataSource="null"> <entity name="data" pk="id" url="${xmlFile.fileAbsolutePath}" processor="XPathEntityProcessor" ... </dataConfig> But this is the result: ... Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute INFO: [posts-politics] webapp=/solr path=/dataimport params={optimize=true&commit=true&command=full-import&qt=/dataimport&wt=javabin&version=2.2} status=0 QTime=66 Nov 17, 2008 1:50:08 PM org.apache.solr.core.SolrCore execute INFO: [posts-politics] webapp=/solr path=/dataimport params={qt=/dataimport&wt=javabin&version=2.2} status=0 QTime=0 Nov 17, 2008 1:50:08 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [posts-politics] REMOVING ALL DOCUMENTS FROM INDEX Nov 17, 2008 1:50:08 PM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' should point to a directory Processing Document # 1 at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:81) ... I also tried to configure all the dataimport settings in solrconfig.xml, but I don't know how to do this exactly. Among other things, I tried this format: *** solrconfig.xml *** ... <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <lst name="datasource"> <str name="type">FileDataSource</str> <lst name="document"> <lst name="entity"> <str name="name">xmlFile</str> <str name="processor">FileListEntityProcessor</str> <str name="baseDir">${xmlDataDir}</str> <str name="fileName">id-.*\.xml</str> <str name="rootEntity">false</str> <str name="dataSource">null</str> <lst name="entity"> <str name="name">data</str> <str name="pk">id</str> <str name="url">${xmlFile.fileAbsolutePath}</str> ... </requestHandler> ... But all my tests (with different dataimport formats in solrconfig.xml) failed: ... INFO: Reusing parent classloader Nov 17, 2008 2:18:14 PM org.apache.solr.common.SolrException log SEVERE: Error in solrconfig.xml:org.apache.solr.common.SolrException: No system property or default value specified for xmlFile.fileAbsolutePath at org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:311) at org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:264) ... Thanks again for your excellent support! Gisto -- --Noble Paul -- Regards, Shalin Shekhar Mangar.
Re: Solr security
If the user is using the new Java-based Solr replication, then he can get rid of the /update and /update/csv handlers altogether, so the slaves are completely read-only. --Noble On Tue, Nov 18, 2008 at 2:14 AM, Sean Timm [EMAIL PROTECTED] wrote: I believe the Solr replication scripts require POSTing a commit to read in the new index--so at least limited POST capability is required in most scenarios. -Sean Lance Norskog wrote: About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return values, and should never change the state of the data. All changes to the data should be made with POST. (In REST style guidelines, PUT, POST, and DELETE.) This prevents you from passing around URLs in email that can destroy the index. The first role of security is to prevent accidents. I would suggest two layers of read-only switch. 1) Open the Lucene index in read-only mode. 2) Allow only search servers to accept GET requests. Lance -- --Noble Paul
Re: Solr security
: Full ack. What do you think about the only solr-related thing left, the : parameter filtering/blocking (e.g. rows>1000)? Is this suitable to do in a : Filter delivered by solr? Of course as an optional alternative. : As Eric mentioned earlier, this could be done in a QueryComponent -- the : prepare part could just make sure the query parameters are all within : reasonable ranges. This seems like something reasonable to add to solr. I don't even see it requiring a new component -- the existing QueryComponent could treat this similar to the way the DismaxQParser deals with q and q.alt ... add two new params: start.max and rows.max that default to some very large values; QueryComponent respects start and rows only as long as they don't exceed the corresponding max; people that want to lock down their ports can make them invariants for the handlers that are exposed. -Hoss
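Until params like these exist, the clamping can be prototyped as a component placed before QueryComponent in the handler's component list. A hypothetical sketch against the 1.3-era SearchComponent API -- rows.max and start.max are Hoss's proposed names, not shipped Solr parameters, and the default ceilings are arbitrary:

import java.io.IOException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ParamClampComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        // Clamp rows and start to their ceilings; declaring rows.max and
        // start.max as invariants in solrconfig.xml keeps clients from
        // overriding them.
        int rows = params.getInt(CommonParams.ROWS, 10);
        int rowsMax = params.getInt("rows.max", 1000);
        if (rows > rowsMax) params.set(CommonParams.ROWS, rowsMax);
        int start = params.getInt(CommonParams.START, 0);
        int startMax = params.getInt("start.max", 100000);
        if (start > startMax) params.set(CommonParams.START, startMax);
        rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Nothing to do here; the clamp happens entirely in prepare().
    }

    public String getDescription() { return "Clamps start/rows parameters"; }
    public String getSourceId() { return ""; }
    public String getSource() { return ""; }
    public String getVersion() { return "1.0"; }
}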
Re: Query Response Doc Score - Int Value
Thanks for the heads up. Can anyone point me to (or provide me with) an example of writing a function query? -Derek On Mon, Nov 17, 2008 at 8:17 PM, Yonik Seeley [EMAIL PROTECTED] wrote: A function query is the likely candidate - no such quantization function exists, but it would be relatively easy to write one. -Yonik On Mon, Nov 17, 2008 at 8:17 PM, Derek Springer [EMAIL PROTECTED] wrote: Hello, I am currently performing a query to a Solr index I've set up and I'm trying to 1) sort on the score and 2) sort on the date_created (a custom field I've added). The sort command looks like: sort=score+desc,created_date+desc. The gist of it is that I will 1) first return the most relevant results then 2) within those results, return the most recent results. However, the issue I have is that the score is a decimal value that is far too precise (e.g. 2.3518934 vs 2.2173865) and will therefore never collide and trigger the secondary sort on the date. The question I am asking is if anyone knows a way to produce a score that is more coarse, or if it is possible to force the score to return as an integer. That way I could have the results collide on the score more often and therefore sort on the date as well. Thanks! -Derek -- Derek B. Springer Software Developer Mahalo.com, Inc. 902 Colorado Ave., Santa Monica, CA 90401 [EMAIL PROTECTED]
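The ValueSource plumbing for a custom function query varies by Solr version, but the quantization itself -- the part Yonik says is easy to write -- is just bucketing. A standalone sketch; the step size of 0.5 is an arbitrary choice:

public final class ScoreQuantizer {

    // Round a score down to the nearest multiple of step so that nearby
    // scores land in the same bucket and the secondary sort can kick in.
    public static float quantize(float score, float step) {
        return (float) Math.floor(score / step) * step;
    }

    public static void main(String[] args) {
        System.out.println(quantize(2.3518934f, 0.5f)); // 2.0
        System.out.println(quantize(2.2173865f, 0.5f)); // 2.0 -- now a tie
    }
}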
Use SOLR like the MySQL LIKE
Hello. The data: I have a dataset containing ~500,000 documents. In each document there is an email, a name, and a user ID. The problem: I would like to be able to search it, but it should behave like the MySQL LIKE. So when a user enters the search term carsten, the query looks like: name:(carsten) OR name:(carsten*) OR email:(carsten) OR email:(carsten*) OR userid:(carsten) OR userid:(carsten*) Then it should match: carsten l carsten larsen Carsten Larsen Carsten CARSTEN etc. And when the user enters the term carsten l, the query looks like: name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*) Then it should match: carsten l carsten larsen Carsten Larsen Or, written in MySQL syntax: ... WHERE `name` LIKE 'carsten%' OR `email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'... I know that I need to use solr.LowerCaseTokenizerFactory on my name and email fields to ensure case-insensitive behavior. The problem seems to be the wildcards and the whitespace. -- View this message in context: http://www.nabble.com/Use-SOLR-like-the-%22MySQL-LIKE%22-tp20554732p20554732.html Sent from the Solr - User mailing list archive at Nabble.com.
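One common workaround: index name, email, and userid as single lowercased tokens (e.g. KeywordTokenizer plus LowerCaseFilter, rather than LowerCaseTokenizer, which would split "carsten l" at the space), then lowercase the input on the client and escape the whitespace so the whole phrase becomes one prefix term. Prefix queries are not analyzed by Solr, which is why the lowercasing has to happen client-side. A sketch; the field name and escaping rules here are illustrative, not a tested recipe:

public class LikeQueryBuilder {

    // Turn raw user input into a single prefix clause, e.g.
    // "Carsten L" -> name:carsten\ l*
    public static String prefixClause(String field, String input) {
        String term = input.trim().toLowerCase()
                .replaceAll("(\\s)", "\\\\$1"); // escape embedded whitespace
        return field + ":" + term + "*";
    }

    public static void main(String[] args) {
        System.out.println(prefixClause("name", "Carsten L"));
        // prints: name:carsten\ l*
    }
}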