Re: Reporting tools
On 9 March 2012 09:05, Donald Organ dor...@donaldorgan.com wrote: Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc?

I do not have direct experience of any Solr reporting tool, but please see the Solr StatsComponent: http://wiki.apache.org/solr/StatsComponent This should provide you with data on the Solr index.

Regards, Gora
Re: Reporting tools
As Gora says, there is the stats component you can take advantage of, or you could also use JMX directly [1], LucidGaze [2][3], or commercial services like [4] or [5] (these are the ones I know, but there may be others), each with a different level/type of service.

Tommaso

[1] http://wiki.apache.org/solr/SolrJmx
[2] http://www.lucidimagination.com/blog/2009/08/24/lucid-gaze-for-lucene/
[3] http://www.chrisumbel.com/article/monitoring_solr_lucidgaze
[4] http://sematext.com/search-analytics/index.html
[5] http://newrelic.com/

2012/3/9 Donald Organ dor...@donaldorgan.com Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc?
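For reference, a StatsComponent request is just an ordinary query with the stats parameters added. A minimal client-side sketch (the host, core path, and field name are placeholders, not from the thread):

```python
from urllib.parse import urlencode

def stats_url(base, field):
    """Build a StatsComponent request URL for a numeric field.

    stats=true enables the component; stats.field names the field to
    aggregate. rows=0 skips returning documents, since we only want
    the statistics block.
    """
    params = {
        "q": "*:*",
        "rows": 0,
        "stats": "true",
        "stats.field": field,
        "wt": "json",
    }
    return base + "/select?" + urlencode(params)

print(stats_url("http://localhost:8983/solr", "price"))
```

The JSON response then contains min/max/sum/mean and related figures for the named field.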
Re: docBoost with fq search
Hi Ahmet, thanks for the answer. I'm really surprised because I always thought of docBoost as a kind of sorting tool, and I used it that way: I give a big boost to the documents I want back first in a search. Do you think there is a trick to force the usage of docBoost in my special case? Gian Marco

On Wed, Mar 7, 2012 at 2:51 PM, Ahmet Arslan iori...@yahoo.com wrote: --- On Wed, 3/7/12, Gian Marco Tagliani gm.tagli...@gmail.com wrote: From: Gian Marco Tagliani gm.tagli...@gmail.com Subject: docBoost with fq search To: solr-user@lucene.apache.org Date: Wednesday, March 7, 2012, 3:11 PM

Hi All, I'm seeing strange behavior with my Solr (version 3.4). For searching I'm using the q and the fq params. At index time I'm adding a docBoost to each document. When I perform a search with both q and fq params, everything works. For a search with q=*:* and something in the fq, it seems to me that the docBoost is not taken into consideration. Is that possible?

Yes, possible. FilterQuery (fq) does not contribute to score; it is not used in score calculation. MatchAllDocsQuery (*:*) is a fast way to return all docs. Adding fl=score&debugQuery=on will show that all docs get a constant score of 1.0.
Re: docBoost with fq search
Hi Gian Marco, I don't know if it's possible to exploit documents' boost values from function queries (see http://wiki.apache.org/solr/FunctionQuery), but if you store your boost in a searchable numeric field, you could either: do q=*:* AND _val_:your_boost_field if you're using the default query parser; or q=*:*&defType=edismax&bf=your_boost_field if you're using edismax. That will give scores to a MatchAllDocsQuery (*:*). Hope this helps, -- Tanguy

Le 09/03/2012 10:25, Gian Marco Tagliani a écrit : Hi Ahmet, thanks for the answer. I'm really surprised because I always thought of docBoost as a kind of sorting tool, and I used it that way: I give a big boost to the documents I want back first in a search. Do you think there is a trick to force the usage of docBoost in my special case? Gian Marco

On Wed, Mar 7, 2012 at 2:51 PM, Ahmet Arslan iori...@yahoo.com wrote: --- On Wed, 3/7/12, Gian Marco Tagliani gm.tagli...@gmail.com wrote: From: Gian Marco Tagliani gm.tagli...@gmail.com Subject: docBoost with fq search To: solr-user@lucene.apache.org Date: Wednesday, March 7, 2012, 3:11 PM

Hi All, I'm seeing strange behavior with my Solr (version 3.4). For searching I'm using the q and the fq params. At index time I'm adding a docBoost to each document. When I perform a search with both q and fq params, everything works. For a search with q=*:* and something in the fq, it seems to me that the docBoost is not taken into consideration. Is that possible?

Yes, possible. FilterQuery (fq) does not contribute to score; it is not used in score calculation. MatchAllDocsQuery (*:*) is a fast way to return all docs. Adding fl=score&debugQuery=on will show that all docs get a constant score of 1.0.
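The two query variants suggested above can be sketched as URL parameters built client-side; a minimal sketch ("your_boost_field" is the placeholder field name from the thread, not a real field):

```python
from urllib.parse import urlencode

BOOST_FIELD = "your_boost_field"  # placeholder name from the thread

# Variant 1: default (lucene) query parser -- fold the boost field's
# value into the score of a match-all query via the _val_ hook.
val_query = {"q": "*:* AND _val_:" + BOOST_FIELD}

# Variant 2: edismax with an additive boost function (bf).
edismax_query = {"q": "*:*", "defType": "edismax", "bf": BOOST_FIELD}

print(urlencode(val_query))
print(urlencode(edismax_query))
```

Either way the boost value now participates in scoring, so documents with a larger stored boost sort first under the default score ordering.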
Re: Reporting tools
Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc?

You might be interested in this: http://www.sematext.com/search-analytics/index.html
Re: indexing bigdata
It very much depends on your data and also what query features you will use: how many fields, the size of each field, how many unique values per field, how many fields are stored vs. only indexed, etc. I have a system with 3+ billion docs, and each instance (each index core) has 120 million docs, and it flies. But the documents are tiny, only 3 fields each, and the search is a very simple single-keyword match. On another system we have only 7 million docs per instance and it is slower, because the documents are much larger with many more fields, and we do a lot of faceting and other advanced search features. Other factors, such as what type of features you use for search (faceting, field collapsing, wildcard queries, etc.), can all increase search time vs. a simple keyword search. Unfortunately it is one of those things you need to try out to really get an answer, IMO.

On Mar 8, 2012, at 11:39 PM, Sharath Jagannath wrote: Ok, my bad. I should have put it a better way: is it a good idea to have all 30M docs on a single instance, or should I consider a distributed set-up? I have synthesized the data, configured the schema, and made suitable changes to the config. I have tested with a smaller data set on my laptop and have a good workflow set up. I do not have a big machine to test it out on. I wanted to make sure I have insight into either option before I decide to spin up an Amazon instance. Thanks, Sharath

On Thu, Mar 8, 2012 at 6:18 PM, Erick Erickson erickerick...@gmail.com wrote: Your question is really unanswerable; there are about a zillion factors that could influence the answer. I can index 5-7K docs/second, so it's efficient. Others can index only a fraction of that. It all depends... Try it and see is about the only way to answer. Best, Erick

On Thu, Mar 8, 2012 at 1:35 PM, Sharath Jagannath shotsonclo...@gmail.com wrote: Is indexing around 30 million documents in a single Solr instance efficient? Has somebody experimented with it?
Planning to use it for an autosuggest feature I am implementing, so I'm expecting responses in a few milliseconds. Should I be looking at sharding? Thanks, Sharath
Re: Geolocation in SOLR with PHP application
A quick bump, I could really do with some input on this please. -- View this message in context: http://lucene.472066.n3.nabble.com/Geolocation-in-SOLR-with-PHP-application-tp3807120p3812364.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reporting tools
(12/03/09 12:35), Donald Organ wrote: Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc?

You may be interested in: Free Query Log Visualizer for Apache Solr http://soleami.com/ koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: docBoost with fq search
if you store your boost in a searchable numeric field...

You can simply sort by that field too: q=*:*&sort=your_boost_field desc
Multithreaded DIH giving Operation not allowed after ResultSet closed solr 3.5
When I try running a multi-threaded DIH in Solr 3.5, I get the following error: Operation not allowed after ResultSet closed. I have multiple entities mapped to fields; after the first query finishes, I get this error for every other query that's mentioned in my data-config.xml file. I have set the first entity as the root entity and have given the threads parameter as 4. I can attach the file if required, if you need to understand better. Any help would be appreciated. Regards, Rohit K
Multicore -Create new Core request errors
Hello, When I issue this query to create a new Solr core, I get the error message: HTTP Status 500 - Can't find resource 'solrconfig.xml' in classpath or '/home/searchuser/searchinstances/multi_core_prototype/solr/conf/'

http://server_ip:port/multi_core_prototype/admin/cores?action=CREATE&name=coreX&instanceDir=/home/searchuser/searchinstances/multi_core_prototype/solr/coreX

I believe that the schema and solrconfig are optional. I have the default cores - core0 and core1 - in Solr 1.3. What should the path of solrconfig be? Should it refer to the path of the schema in an existing core, and can I expect to see the conf folder in the new core? Regards, Sujatha
Re: Multithreaded DIH giving Operation not allowed after ResultSet closed solr 3.5
Hello, AFAIK DIH is not multi-threaded at all. see https://issues.apache.org/jira/browse/SOLR-3011 Regards On Fri, Mar 9, 2012 at 4:22 PM, Rohit Khanna getafix@gmail.com wrote: When i try running a multi threaded DIH in solr 3.5 I get the following error Operation not allowed after ResultSet closed . I have multiple entities mapped to fields, after the first query finishes i get this error for every other query thats been mentioned in my data-config.xml file. I have mentioned the first entity as the root entity and have given the threads parameter as 4. I can attach the file if required if you need to understand better. Any help would be appreciated. Regards, Rohit K -- Sincerely yours Mikhail Khludnev Lucid Certified Apache Lucene/Solr Developer Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Geolocation in SOLR with PHP application
Hi, Take a look at http://wiki.apache.org/solr/SpatialSearch Then from php, you need to pass the right parameters as described in the link above. On Fri, Mar 9, 2012 at 8:00 AM, Spadez james_will...@hotmail.com wrote: A quick, bump, I could really do with some input on this please. -- View this message in context: http://lucene.472066.n3.nabble.com/Geolocation-in-SOLR-with-PHP-application-tp3807120p3812364.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Stemmer Question
I'd be very interested to see how you did this if it is available. Does this seem like something useful to the community at large? I PMed it to you. Filter is not a big deal. Just modified from {@link org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide it publicly too.
RE: Solr DIH and $deleteDocById
This (almost) sounds like https://issues.apache.org/jira/browse/SOLR-2492 which was fixed in Solr 3.4. Are you on an earlier version? But maybe not, because you're seeing the # of deleted documents increment, and prior to this bug fix (I think) the deleted counter wasn't getting incremented either. Perhaps this is a related bug that only happens when the deletes are added via a transformer? Try a query like this without a transformer:

select uniqueID as '$deleteDocById' from table where uniqueID = '1-devpeter-1';

Does this work? If so, you've probably stumbled on a new bug related to SOLR-2492. In any case, the workaround (probably) is to manually issue a commit after doing your deletes. Or, combine your deletes with adds/updates in the same DIH run and it should commit automatically as configured.

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-----Original Message----- From: Peter Boudreau [mailto:pe...@makeshop.jp] Sent: Friday, March 09, 2012 2:22 AM To: solr-user@lucene.apache.org Subject: Solr DIH and $deleteDocById

Hello everyone, I've got Solr DIH up and running with no problems as far as importing data, but I'm now trying to add some functionality to our delta import to delete invalid records. The special command $deleteDocById seems to provide what I'm looking for, and just for testing purposes until I get things working, I set up a simple transformer to delete just one document with a specific ID:

<script><![CDATA[
function deleteBadDocs(row) {
    var uniqueID = row.get('unique_id');
    if (uniqueID == '1-devpeter-1') {
        row.put('$deleteDocById', uniqueID);
    }
    return row;
}
]]></script>

When I run DIH with this, sure enough, it tells me that 1 document was deleted: Indexing completed. Added/Updated: 4755 documents. Deleted 1 documents. But then when I search the index, the document is still there.

I've been googling this for a while now, and found a number of references saying that you need to commit or optimize after this in order for the deletes to take effect, but I was under the impression that DIH both commits and optimizes by default, so shouldn't it be getting committed and optimized automatically by DIH? I even tried explicitly setting the commit= and optimize= flags to true, but still, the deleted document was in the index when I searched. I also tried restarting Solr, but the deleted document was still there. Could anyone help me understand why this document, which is being reported as deleted, still shows up in the index? Also, there is one thing which I'm unclear on after reading the Solr wiki: $deleteDocById: Delete a doc from Solr with this id. The value has to be the uniqueKey value of the document. Note that this command can only delete docs already committed to the index. I was starting to think that maybe $deleteDocById was only preventing documents from entering the index, and not deleting existing documents which were already in the index, but if I understand this correctly, $deleteDocById should be able to delete a document which was already in the index *before* running DIH, right? Any help would be very much appreciated. Thanks in advance, Peter
does solr have a mechanism for intercepting requests - before they are handed off to a request handler
Hello all, does Solr have a mechanism that could intercept a request (before it is handed off to a request handler)? The intent (from the business) is to send in a generic request, then pre-parse the URL and send it off to a specific request handler. Thank you, mark -- View this message in context: http://lucene.472066.n3.nabble.com/does-solr-have-a-mechanism-for-intercepting-requests-before-they-are-handed-off-to-a-request-handler-tp3813255p3813255.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to rank an exact match higher?
Here's one way to do it using dismax.

1. You'll have two fields: title_text, which has a type of TextField, and title_string, which has type String. This is an exact-match field.
2. Set the dismax qf=title_string^10 title_text^1

You could make this even better by also handling infix searches. Create a field title_ngram which uses the ngram type. Set dismax qf = title_string^10 title_text^5 title_ngram^1 -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-rank-an-exact-match-higher-tp3802871p3813327.html Sent from the Solr - User mailing list archive at Nabble.com.
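The qf weighting described above can be sketched as query parameters built client-side; a minimal sketch (the field names and weights come from the suggestion in the thread, the query text is a placeholder):

```python
from urllib.parse import urlencode

# dismax spreads the user's query across several fields; the ^N boosts
# make an exact match on the string field dominate, with the analyzed
# text field and the ngram (infix) field as progressively weaker signals.
params = {
    "defType": "dismax",
    "q": "example title",
    "qf": "title_string^10 title_text^5 title_ngram^1",
}
query_string = urlencode(params)
print(query_string)
```

A document whose title_string equals the query then outscores documents that only match after analysis or on ngrams.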
Lucene vs Solr design decision
Hi everybody, Let's say we have a system with billions of small documents (an average of 2-3 fields each), each document belongs to JUST ONE user, and searches are user-specific, meaning that when we search for something, we only look at documents of that user. On the other hand, we need to see newly added documents as soon as they are added to the indexes. Now I think we have two solutions: 1. Use Lucene directly and create a separate index for each user 2. Use Solr and store all of the users' data together in one HUGE index The benefit of using Lucene is that each commit() will take less time compared to the case where we use Solr. Is there any suggested solution for cases like this? Thanks -- Alireza Salimi Java EE Developer
Re: Lucene vs Solr design decision
Solr has cores which are independent search indexes. You could create a separate core per user. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813489.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Lucene vs Solr design decision
Sorry, I didn't mention that the number of users can be in the millions, meaning millions of cores! So I'm not sure if it's a good idea. On Fri, Mar 9, 2012 at 1:35 PM, Lan dung@gmail.com wrote: Solr has cores which are independent search indexes. You could create a separate core per user. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813489.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alireza Salimi Java EE Developer
DIH - FileListEntityProcessor reading from Multiple Disk Directories
All, I have an application that has RDF files in multiple subdirectories under a root directory. I'm using the DIH with a FileListEntityProcessor to load the index. All worked fine when the files were in a single directory, but I can't seem to figure out how to make a single data-config.xml read multiple directories. The baseDir attribute seems to allow only a single absolute path. I tried multiple document elements with a different baseDir for each FileListEntityProcessor, but it only executed the first one. Is there an easy way to do this, short of running multiple imports and changing baseDir for each? Thanks, Mike

Mike Rawlins Sr. Software Engineer Chair, ASC X12 Technical Assessment Subcommittee 18111 Preston Road, Suite 600 Dallas, TX 75252 +1 972.643.3101 direct mike.rawl...@gxs.com www.gxs.com GXS Blog
Re: Lucene vs Solr design decision
Solr has no limitation on the number of cores. It's limited by your hardware: inodes and how many files you can keep open. I think even if you went the Lucene route you would run into the same hardware limits. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813511.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Lucene vs Solr design decision
millions of cores will not work... ...yet. -glen On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote: Solr has no limitation on the number of cores. It's limited by your hardware, inodes and how many files you could keep open. I think even if you went the Lucene route you would run into same hardware limits. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813511.html Sent from the Solr - User mailing list archive at Nabble.com. -- - http://zzzoot.blogspot.com/ -
Re: Lucene vs Solr design decision
probably, and besides that, how can I use the features that SolrCloud provides (i.e. high availability and distribution)? The other solution would be to use SolrCloud and keep all of the users' information in single collection and use NRT. But on the other hand the frequency of updates on that big collection will be high. Do you think it makes sense? On Fri, Mar 9, 2012 at 2:02 PM, Glen Newton glen.new...@gmail.com wrote: millions of cores will not work... ...yet. -glen On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote: Solr has no limitation on the number of cores. It's limited by your hardware, inodes and how many files you could keep open. I think even if you went the Lucene route you would run into same hardware limits. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813511.html Sent from the Solr - User mailing list archive at Nabble.com. -- - http://zzzoot.blogspot.com/ - -- Alireza Salimi Java EE Developer
Re: Lucene vs Solr design decision
Split the index up into, say, 100 cores, and then route each search to a specific core by some mod operator on the user id:

core_number = userid % num_cores
core_name = "core" + core_number

That way each index core is relatively small (maybe 100 million docs or less). On Mar 9, 2012, at 2:02 PM, Glen Newton wrote: millions of cores will not work... ...yet. -glen On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote: Solr has no limitation on the number of cores. It's limited by your hardware, inodes and how many files you could keep open. I think even if you went the Lucene route you would run into same hardware limits. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813511.html Sent from the Solr - User mailing list archive at Nabble.com. -- - http://zzzoot.blogspot.com/ -
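The modulo routing suggested above is a one-liner in any language; a minimal sketch, assuming a numeric user id and the "core" + number naming scheme from the thread:

```python
def core_for_user(user_id: int, num_cores: int = 100) -> str:
    """Route a user to one of num_cores index cores by simple modulo.

    Every document for a given user lands in (and every search goes to)
    the same core, so a per-user search only touches one small index.
    """
    return "core" + str(user_id % num_cores)

print(core_for_user(12345))  # core45
```

One design consequence worth noting: the mapping is fixed by num_cores, so changing the core count later means re-routing (and re-indexing) users, much like resharding any hash-partitioned store.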
Re: Lucene vs Solr design decision
This solution makes sense, but I still don't know if I can use solrCloud with this configuration or not. On Fri, Mar 9, 2012 at 2:06 PM, Robert Stewart bstewart...@gmail.comwrote: Split up index into say 100 cores, and then route each search to a specific core by some mod operator on the user id: core_number = userid % num_cores core_name = core+core_number That way each index core is relatively small (maybe 100 million docs or less). On Mar 9, 2012, at 2:02 PM, Glen Newton wrote: millions of cores will not work... ...yet. -glen On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote: Solr has no limitation on the number of cores. It's limited by your hardware, inodes and how many files you could keep open. I think even if you went the Lucene route you would run into same hardware limits. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813511.html Sent from the Solr - User mailing list archive at Nabble.com. -- - http://zzzoot.blogspot.com/ - -- Alireza Salimi Java EE Developer
RE: DIH - FileListEntityProcessor reading from Multiple Disk Directories
Did you try setting baseDir to the root directory and recursive to true? (See http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor for more information.)

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

From: mike.rawl...@gxs.com [mailto:mike.rawl...@gxs.com] Sent: Friday, March 09, 2012 12:44 PM To: solr-user@lucene.apache.org Subject: DIH - FileListEntityProcessor reading from Multiple Disk Directories

All, I have an application that has RDF files in multiple subdirectories under a root directory. I'm using the DIH with a FileListEntityProcessor to load the index. All worked fine when the files were in a single directory, but I can't seem to figure out how to make a single data-config.xml read multiple directories. The baseDir attribute seems to allow only a single absolute path. I tried multiple document elements with a different baseDir for each FileListEntityProcessor, but it only executed the first one. Is there an easy way to do this, short of running multiple imports and changing baseDir for each? Thanks, Mike

Mike Rawlins Sr. Software Engineer Chair, ASC X12 Technical Assessment Subcommittee 18111 Preston Road, Suite 600 Dallas, TX 75252 +1 972.643.3101 direct mike.rawl...@gxs.com www.gxs.com GXS Blog
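For illustration, a hypothetical data-config.xml entity using the recursive attribute mentioned above (the baseDir path and fileName pattern are placeholders, not from the thread):

```xml
<!-- baseDir points at the root directory; recursive="true" makes
     FileListEntityProcessor descend into subdirectories, so one
     entity covers the whole tree. -->
<entity name="rdfFiles"
        processor="FileListEntityProcessor"
        baseDir="/data/rdf-root"
        fileName=".*\.rdf"
        recursive="true"
        rootEntity="false"/>
```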
Re: Lucene vs Solr design decision
On the other hand, I'm aware of the fact that if I go with the Lucene approach, failover is something that I will have to support manually, which is a nightmare! On Fri, Mar 9, 2012 at 2:13 PM, Alireza Salimi alireza.sal...@gmail.com wrote: This solution makes sense, but I still don't know if I can use solrCloud with this configuration or not. On Fri, Mar 9, 2012 at 2:06 PM, Robert Stewart bstewart...@gmail.com wrote: Split up index into say 100 cores, and then route each search to a specific core by some mod operator on the user id: core_number = userid % num_cores core_name = core+core_number That way each index core is relatively small (maybe 100 million docs or less). On Mar 9, 2012, at 2:02 PM, Glen Newton wrote: millions of cores will not work... ...yet. -glen On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote: Solr has no limitation on the number of cores. It's limited by your hardware, inodes and how many files you could keep open. I think even if you went the Lucene route you would run into same hardware limits. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813511.html Sent from the Solr - User mailing list archive at Nabble.com. -- - http://zzzoot.blogspot.com/ - -- Alireza Salimi Java EE Developer -- Alireza Salimi Java EE Developer
Re: Upgrade solr
Take a look at the solr/CHANGES.txt file. Each release has an "Upgrading from" section; the one you're interested in is "Upgrading from Solr 1.4" in the 3.1.0 section, and then the ones in subsequent sections. Of course, I'd try it on a copy of my index first... If at all possible, the easiest way is to re-index your data. Best, Erick On Fri, Mar 9, 2012 at 4:10 AM, Abhishek tiwari abhishek.tiwari@gmail.com wrote: Can someone help me upgrade my Solr from 1.4? What steps do we need to take?
Re: Stemmer Question
Ok, so I'm digging through the code and I noticed in org.apache.lucene.analysis.synonym.SynonymFilter there are mentions of a keepOrig attribute. Doing some googling led me to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters which speaks of an attribute preserveOriginal=1 on solr.WordDelimiterFilterFactory. So it seems like I can get the functionality I am looking for by setting preserveOriginal, is that correct? On Fri, Mar 9, 2012 at 9:53 AM, Ahmet Arslan iori...@yahoo.com wrote: I'd be very interested to see how you did this if it is available. Does this seem like something useful to the community at large? I PMed it to you. Filter is not a big deal. Just modified from {@link org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide it publicly too.
Re: Stemmer Question
Further digging leads me to believe this is not the case. The SynonymFilter supports this, but the stemming filters do not. Ahmet, would you be willing to provide your filter as well? I wonder if we can make it aware of the preserveOriginal attribute on WordDelimiterFilterFactory? On Fri, Mar 9, 2012 at 2:27 PM, Jamie Johnson jej2...@gmail.com wrote: Ok, so I'm digging through the code and I noticed in org.apache.lucene.analysis.synonym.SynonymFilter there are mentions of a keepOrig attribute. Doing some googling led me to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters which speaks of an attribute preserveOriginal=1 on solr.WordDelimiterFilterFactory. So it seems like I can get the functionality I am looking for by setting preserveOriginal, is that correct? On Fri, Mar 9, 2012 at 9:53 AM, Ahmet Arslan iori...@yahoo.com wrote: I'd be very interested to see how you did this if it is available. Does this seem like something useful to the community at large? I PMed it to you. Filter is not a big deal. Just modified from {@link org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide it publicly too.
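For context, a sketch of where preserveOriginal actually applies: it is an attribute of WordDelimiterFilterFactory in schema.xml and keeps the unmodified token alongside the split parts. The field type name and filter chain here are hypothetical; as noted above, the stock stemming filters have no equivalent switch.

```xml
<!-- Hypothetical field type: preserveOriginal="1" emits the original
     token in addition to the word-delimiter splits (e.g. "Wi-Fi" ->
     "Wi", "Fi", plus "Wi-Fi"). This does not affect stemming. -->
<fieldType name="text_preserve" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```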
Re: Time Stats
The answer is so easy. Just need to create an index with each visit. In this way I could use faceted date search to create time statistics. flats for rent new york at 1/12/2011 = bounce_rate=48.6% flats for rent new york at 1/1/2012 = bounce_rate=49.7% flats for rent new york at 1/2/2012 = bounce_rate=46.4% date:[1/12/2011 - 1/1/2012] flats for rent new york at 1/12/2011 = bounce_rate=48.6% flats for rent new york at 1/1/2012 = bounce_rate=49.7% mean=49.15% date:[1/1/2012 - 1/2/2012] flats for rent new york at 1/1/2012 = bounce_rate=49.7% flats for rent new york at 1/2/2012 = bounce_rate=46.4% mean=49.05% With my initial approach I would save some disk and memory space. I'm still wondering if it is possible. 2012/2/27 Raimon Bosch raimon.bo...@gmail.com Anyone up to provide an answer? The idea is to have a kind of CustomInteger composed of an array of timestamps. The value shown in this field would be based on the date range that you're sending. The biggest problem is that this field would be in all the documents in your Solr index, so you would need to calculate this number in real time. 2012/2/26 Raimon Bosch raimon.bo...@gmail.com Hi, Today I was playing with StatsComponent just to extract some statistics from my index. I'm using a Solr index to store user searches. Basically what I did is aggregate data from the access log into my Solr index. So now I can see the average bounce rate for a group of user searches and see which ones are performing better in Google. Now I would like to see the evolution of these stats through time. For that I would need to have a field with different values through time, i.e. flats for rent new york at 1/12/2011 = bounce_rate=48.6% flats for rent new york at 1/1/2012 = bounce_rate=49.7% flats for rent new york at 1/2/2012 = bounce_rate=46.4% Is there any Solr field type that could fit to solve this? Thanks in advance, Raimon Bosch.
Re: Time Stats
second mean is 48.05%... 2012/3/9 Raimon Bosch raimon.bo...@gmail.com The answer is so easy. Just need to create an index with each visit. In this way I could use faceted date search to create time statistics. flats for rent new york at 1/12/2011 = bounce_rate=48.6% flats for rent new york at 1/1/2012 = bounce_rate=49.7% flats for rent new york at 1/2/2012 = bounce_rate=46.4% date:[1/12/2011 - 1/1/2012] flats for rent new york at 1/12/2011 = bounce_rate=48.6% flats for rent new york at 1/1/2012 = bounce_rate=49.7% mean=49.15% date:[1/1/2012 - 1/2/2012] flats for rent new york at 1/1/2012 = bounce_rate=49.7% flats for rent new york at 1/2/2012 = bounce_rate=46.4% mean=49.05% With my initial approach I would save some disk and memory space. I'm still wondering if it is possible. 2012/2/27 Raimon Bosch raimon.bo...@gmail.com Anyone up to provide an answer? The idea is have a kind of CustomInteger compound by an array of timestamps. The value shown in this field would be based in the date range that you're sending. Biggest problem will be that this field would be in all the documents on your solr index so you need to calculate this number in real-time. 2012/2/26 Raimon Bosch raimon.bo...@gmail.com Hi, Today I was playing with StatsComponent just to extract some statistics from my index. I'm using a solr index to store user searches. Basically what I did is to aggregate data from accesslog into my solr index. So now I can see average bounce rate for a group of user searches and see which ones are performing better in google. Now I would like to see the evolution of this stats throught time. For that I would need to have a field with a different values throught time i.e. flats for rent new york at 1/12/2011 = bounce_rate=48.6% flats for rent new york at 1/1/2012 = bounce_rate=49.7% flats for rent new york at 1/2/2012 = bounce_rate=46.4% There is any solr type field that could fit to solve this? Thanks in advance, Raimon Bosch.
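The per-range means above (with the corrected second value of 48.05%) are just averages over the samples whose date falls inside the range; a minimal sketch of the computation, using (year, month) tuples purely for illustration:

```python
def mean_bounce_rate(samples, start, end):
    """Average the bounce rates whose date falls in [start, end).

    samples is a list of (date, rate) pairs; dates are (year, month)
    tuples so they compare chronologically.
    """
    rates = [rate for (date, rate) in samples if start <= date < end]
    return sum(rates) / len(rates)

# The three data points from the thread.
samples = [((2011, 12), 48.6), ((2012, 1), 49.7), ((2012, 2), 46.4)]

print(mean_bounce_rate(samples, (2011, 12), (2012, 2)))  # ≈ 49.15
print(mean_bounce_rate(samples, (2012, 1), (2012, 3)))   # ≈ 48.05
```

This mirrors what StatsComponent's mean would return when a date-range filter restricts the documents it aggregates over.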
Knowing which fields matched a search
When searching across multiple fields, is there a way to identify which field(s) resulted in a match without using highlighting or stored fields?
Re: does solr have a mechanism for intercepting requests - before they are handed off to a request handler
I'm doing something like that by hacking SolrRequestParsers; I tried to find a more legitimate way but haven't found one http://mail-archives.apache.org/mod_mbox/lucene-dev/201202.mbox/%3CCAF=Pa597RpLjVWZbM=0aktjhpnea4m931j0s1s4bda4qe+t...@mail.gmail.com%3E I added solrRequestParsers into solrconfig.xml https://github.com/m-khl/solr-patches/commit/f92018818b20d79b01d795f2c52446b499023dd8#diff-4 Also, have you considered J2EE webapp servlet filters? On Fri, Mar 9, 2012 at 9:11 PM, geeky2 gee...@hotmail.com wrote: hello all, does solr have a mechanism that could intercept a request (before it is handed off to a request handler). the intent (from the business) is to send in a generic request - then pre-parse the url and send it off to a specific request handler. thank you, mark -- View this message in context: http://lucene.472066.n3.nabble.com/does-solr-have-a-mechanism-for-intercepting-requests-before-they-are-handed-off-to-a-request-handler-tp3813255p3813255.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Lucid Certified Apache Lucene/Solr Developer Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
RE: DIH - FileListEntityProcessor reading from Multiple Disk Directories
I knew there had to be an easy way. That was it. Thanks for the tip! Mike Rawlins Sr. Software Engineer Chair, ASC X12 Technical Assessment Subcommittee 18111 Preston Road, Suite 600 Dallas, TX 75252 +1 972.643.3101 direct mike.rawl...@gxs.com www.gxs.com GXS Blog -Original Message- From: Dyer, James [mailto:james.d...@ingrambook.com] Sent: Friday, March 09, 2012 1:14 PM To: solr-user@lucene.apache.org Subject: RE: DIH - FileListEntityProcessor reading from Multiple Disk Directories Did you try setting baseDir to the root directory and recursive to true? (see http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor for more information). James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 From: mike.rawl...@gxs.com [mailto:mike.rawl...@gxs.com] Sent: Friday, March 09, 2012 12:44 PM To: solr-user@lucene.apache.org Subject: DIH - FileListEntityProcessor reading from Multiple Disk Directories All, I have an application that has RDF files in multiple subdirectories under a root directory. I'm using the DIH with a FileListEntityProcessor to load the index. All worked fine when the files were in a single directory, but I can't seem to figure out how to make a single data-config.xml read multiple directories. The baseDir attribute seems to allow only a single absolute path. I tried multiple document elements with a different baseDir for each FileListEntityProcessor, but it only executed the first one. Is there an easy way to do this, short of running multiple imports and changing baseDir for each? Thanks, Mike Mike Rawlins Sr. Software Engineer Chair, ASC X12 Technical Assessment Subcommittee 18111 Preston Road, Suite 600 Dallas, TX 75252 +1 972.643.3101 direct mike.rawl...@gxs.com www.gxs.com GXS Blog
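For reference, a sketch of what the single-entity configuration could look like with recursion enabled; the baseDir path, fileName pattern, and inner entity are placeholders, not taken from the thread:

```xml
<!-- Sketch of a data-config.xml entity that walks subdirectories.
     baseDir, fileName, and the inner entity are illustrative placeholders. -->
<dataConfig>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/rdf" fileName=".*\.rdf$"
            recursive="true" rootEntity="false">
      <!-- inner entity (e.g. XPathEntityProcessor) reads each matched file -->
    </entity>
  </document>
</dataConfig>
```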
Re: Stemmer Question
So I've thrown something together fairly quickly which is based on what Ahmet had sent that I believe will preserve the original token as well as the stemmed version. I didn't go as far as weighting them differently using the payloads however. I am not sure how to use the preserveOriginal attribute from WordDelimiterFilterFactory, can anyone provide guidance on that? On Fri, Mar 9, 2012 at 2:53 PM, Jamie Johnson jej2...@gmail.com wrote: Further digging leads me to believe this is not the case. The Synonym Filter supports this, but the Stemming Filter does not. Ahmet, Would you be willing to provide your filter as well? I wonder if we can make it aware of the preserveOriginal attribute on WordDelimiterFilterFactory? On Fri, Mar 9, 2012 at 2:27 PM, Jamie Johnson jej2...@gmail.com wrote: Ok, so I'm digging through the code and I noticed in org.apache.lucene.analysis.synonym.SynonymFilter there are mentions of a keepOrig attribute. Doing some googling led me to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters which speaks of an attribute preserveOriginal=1 on solr.WordDelimiterFilterFactory. So it seems like I can get the functionality I am looking for by setting preserveOriginal, is that correct? On Fri, Mar 9, 2012 at 9:53 AM, Ahmet Arslan iori...@yahoo.com wrote: I'd be very interested to see how you did this if it is available. Does this seem like something useful to the community at large? I PMed it to you. Filter is not a big deal. Just modified from {@link org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide it publicly too.
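For context, preserveOriginal is set directly on the WordDelimiterFilterFactory entry in the analyzer chain; it keeps the un-split token alongside the generated parts, but, as noted in this thread, every token still passes through whatever stemmer comes after it. A sketch, with the field type name and the surrounding filters chosen as illustrative assumptions:

```xml
<!-- Sketch: preserveOriginal="1" keeps the original token alongside the
     split parts. Field type name and neighboring filters are assumptions. -->
<fieldType name="text_keep_orig" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```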
Re: Highlighting text field when query is for string field
Or is it because the query is on a keyword field and I expect matching keywords to be highlighted in the excerpts field? Any insights would help a lot. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-text-field-when-query-is-for-string-field-tp3475334p3814159.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Index Custom XML structure
You could set up a ManifoldCF job to fetch the XMLs and then set up a new SolrOutputConnection for /solr/update/xslt?tr=myStyleSheet.xsl where myStyleSheet.xsl is the stylesheet to use for that kind of XML. See http://wiki.apache.org/solr/XsltUpdateRequestHandler -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 7. mars 2012, at 14:04, Erick Erickson wrote: Well, I'm ManifoldCF ignorant, so I'll have to defer on this one Best Erick On Tue, Mar 6, 2012 at 12:24 PM, Anupam Bhattacharya anupam...@gmail.com wrote: Thanks Erick, for the prompt response. Both suggestions will be useful for a one-time indexing activity. Since DIH will be a one-time process of indexing the repository, it is of no use in my case. Writing a standalone Java program utilizing SolrJ will again be a one-time indexing process. I want to write a separate Handler which will be called by a ManifoldCF job to create indexes in Solr. In my case the repository is Documentum Content Server. I found a relevant link at this url: https://community.emc.com/docs/DOC-6520 which is quite similar to my requirement. I modified the code to parse the XML and added that into the document properties. Although this works fine when I try to test it with my CURL program with parameters, when the same handler is called from the ManifoldCF job the job gets terminated within a few minutes. Not sure of the reason for that. The handler is written similar to /update/extract which is ExtractingRequestHandler. Is ExtractingRequestHandler capable of extracting tag names and values using some of its defined attributes like capture, captureAttr, extractOnly etc.? These could then be added into the document indexes. On Tue, Feb 28, 2012 at 8:26 AM, Erick Erickson erickerick...@gmail.comwrote: You might be able to do something with the XSL Transformer step in DIH. 
It might also be easier to just write a SolrJ program to parse the XML and construct a SolrInputDocument to send to Solr. It's really pretty straightforward. Best Erick On Sun, Feb 26, 2012 at 11:31 PM, Anupam Bhattacharya anupam...@gmail.com wrote: Hi, I am using ManifoldCF to crawl data from a Documentum repository. I am able to successfully read the metadata/properties for the defined document types in Documentum using the out-of-the-box Documentum Connector in ManifoldCF. Unfortunately, there is also one XML file present which consists of a custom XML structure, from which I need to read and fetch the element values and add them for indexing in Lucene through Solr. Is there any mechanism to index an arbitrary XML document structure in Solr? I checked the Solr Cell framework, which supports the structure below: <add> <doc> <field name="id">9885A004</field> <field name="name">Canon PowerShot SD500</field> <field name="category">camera</field> <field name="features">3x optical zoom</field> <field name="features">aluminum case</field> <field name="weight">6.4</field> <field name="price">329.95</field> </doc> <doc> <field name="id">9885A003</field> <field name="name">Canon PowerShot SD504</field> <field name="category">camera1</field> <field name="features">3x optical zoom1</field> <field name="features">aluminum case1</field> <field name="weight">6.41</field> <field name="price">329.956</field> </doc> </add> My custom XML structure is of the following format, from which I need to read the subject and abstract fields for indexing. I checked the TIKA project but couldn't find any useful stuff. <?xml version="1.0" encoding="UTF-8"?> <RECORD> <doc_id>1</doc_id> <abstract>This is an abstract.</abstract> <subject>Text Subject</subject> <availability /> <indexing> <index_group/> <keyterms/> <keyterms/> </indexing> <publication_date/> <physical_storage /> <log_entry /> <legal_category /> <legal_category_notes /> <citation_only/> <citation_only_desc /> <export_control /> <export_control_desc /> </RECORD> Appreciate any help on this. Regards Anupam -- Thanks Regards Anupam Bhattacharya
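If the XsltUpdateRequestHandler approach suggested in this thread is used, a stylesheet along these lines could map the custom RECORD format into Solr's add/doc format; the Solr-side field names are assumptions about the target schema:

```xml
<!-- Sketch of an XSL stylesheet for /solr/update/xslt that pulls doc_id,
     subject, and abstract out of the RECORD format. Solr field names
     (id, subject, abstract) are assumed, not taken from the thread. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/RECORD">
    <add>
      <doc>
        <field name="id"><xsl:value-of select="doc_id"/></field>
        <field name="subject"><xsl:value-of select="subject"/></field>
        <field name="abstract"><xsl:value-of select="abstract"/></field>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>
```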
Xml representation of indexed document
Hi all, I'm doing data import using DIH in Solr 3.5. I'm curious to know whether it is possible to see the XML representation of the indexed data from the browser. I just want to make sure the data is correctly indexed with the correct values, and for debugging purposes. -- Chamnap
Re: Xml representation of indexed document
You can use Luke to view Lucene Indexes. Anupam On Sat, Mar 10, 2012 at 12:27 PM, Chamnap Chhorn chamnapchh...@gmail.comwrote: Hi all, I'm doing data import using DIH in solr 3.5. I'm curious to know whether it is see the xml representation of indexed data from the browser. Is it possible? I just want to make sure these data is correctly indexed with correct value or for debugging purpose. -- Chamnap -- Thanks Regards Anupam Bhattacharya
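Besides Luke, if the fields are stored you can fetch a document back from Solr as XML with a plain select query in the browser; note this shows stored values, not the indexed terms (for the latter, Luke is the right tool). A sketch of building such a request URL; the host, core path, and the id uniqueKey field are assumptions:

```python
# Sketch: build a URL that returns one document's stored fields as XML.
# Host, core path, and the "id" uniqueKey field are assumptions.
from urllib.parse import urlencode

def doc_as_xml_url(doc_id, base="http://localhost:8983/solr/select"):
    params = {
        "q": 'id:"%s"' % doc_id,  # look up by uniqueKey
        "wt": "xml",              # XML response writer
        "fl": "*",                # return all stored fields
    }
    return base + "?" + urlencode(params)

print(doc_as_xml_url("42"))
```

Opening the printed URL in a browser shows the stored-field values Solr holds for that document.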