Re: Reporting tools

2012-03-09 Thread Gora Mohanty
On 9 March 2012 09:05, Donald Organ dor...@donaldorgan.com wrote:
 Are there any reporting tools out there?  So I can analyze search term
 frequency, filter frequency, etc.?

I don't have direct experience of any Solr reporting
tool, but please see the Solr StatsComponent:
http://wiki.apache.org/solr/StatsComponent

This should provide you with data on the Solr
index.

Regards,
Gora


Re: Reporting tools

2012-03-09 Thread Tommaso Teofili
As Gora says, there is the stats component you can take advantage of; you
could also use JMX directly [1], LucidGaze [2][3], or commercial services
like [4] or [5] (these are the ones I know, but there may be others),
each with a different level/type of service.

Tommaso

[1] : http://wiki.apache.org/solr/SolrJmx
[2] : http://www.lucidimagination.com/blog/2009/08/24/lucid-gaze-for-lucene/
[3] : http://www.chrisumbel.com/article/monitoring_solr_lucidgaze
[4] : http://sematext.com/search-analytics/index.html
[5] : http://newrelic.com/
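
For reference, JMX [1] is enabled with a one-line addition to solrconfig.xml; a
minimal sketch (start the JVM with -Dcom.sun.management.jmxremote to be able
to attach jconsole):

<config>
  <!-- existing config ... -->
  <jmx />
</config>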


2012/3/9 Donald Organ dor...@donaldorgan.com

 Are there any reporting tools out there?  So I can analyze search term
 frequency, filter frequency, etc.?



Re: docBoost with fq search

2012-03-09 Thread Gian Marco Tagliani
Hi Ahmet,
thanks for the answer.

I'm really surprised because I always thought of docBoost as a kind of sorting
tool.
And I used it that way: I'm giving a big boost to the documents I want back
first in the search results.



Do you think there is a trick to force the usage of docBoost in my special
case?


Gian Marco


On Wed, Mar 7, 2012 at 2:51 PM, Ahmet Arslan iori...@yahoo.com wrote:



 --- On Wed, 3/7/12, Gian Marco Tagliani gm.tagli...@gmail.com wrote:

  From: Gian Marco Tagliani gm.tagli...@gmail.com
  Subject: docBoost with fq search
  To: solr-user@lucene.apache.org
  Date: Wednesday, March 7, 2012, 3:11 PM
  Hi All,
  I'm seeing strange behavior with my Solr (version 3.4).
 
  For searching I'm using the q and the fq params.
  At index-time I'm adding a docBoost to each document.
 
  When I perform a search with both q and fq params
  everything works.
  For the search with q=*:* and something in the fq, it
  seems to me that the docBoost is not taken into
  consideration.
 
  Is that possible?

 Yes, it's possible.

 FilterQuery (fq) does not contribute to score. It is not used in score
 calculation.

 MatchAllDocsQuery (*:*) is a fast way to return all docs. Adding
 fl=score&debugQuery=on will show that all docs will get a constant score of
 1.0.



Re: docBoost with fq search

2012-03-09 Thread Tanguy Moal

Hi Gian Marco,

I don't know if it's possible to exploit documents' boost values from 
function queries (see http://wiki.apache.org/solr/FunctionQuery), but if 
you store your boost in a searchable numeric field, you could either:

do

q=*:* AND _val_:your_boost_field

if you're using the default query parser;

or

q=*:*&defType=edismax&bf=your_boost_field

if you're using edismax.

That will give scores to a MatchAllDocsQuery (*:*).
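
As a full request that would be, for example (a minimal sketch, assuming a
default local install and a numeric field literally named your_boost_field):

http://localhost:8983/solr/select?q=*:*&defType=edismax&bf=your_boost_field&fl=*,score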

Hope this helps,
--
Tanguy

On 09/03/2012 10:25, Gian Marco Tagliani wrote:

Hi Ahmet,
thanks for the answer.

I'm really surprised because I always thought of docBoost as a kind of sorting
tool.
And I used it that way: I'm giving a big boost to the documents I want back
first in the search results.



Do you think there is a trick to force the usage of docBoost in my special
case?


Gian Marco


On Wed, Mar 7, 2012 at 2:51 PM, Ahmet Arslan iori...@yahoo.com wrote:



--- On Wed, 3/7/12, Gian Marco Tagliani gm.tagli...@gmail.com wrote:


From: Gian Marco Tagliani gm.tagli...@gmail.com
Subject: docBoost with fq search
To: solr-user@lucene.apache.org
Date: Wednesday, March 7, 2012, 3:11 PM
Hi All,
I'm seeing strange behavior with my Solr (version 3.4).

For searching I'm using the q and the fq params.
At index-time I'm adding a docBoost to each document.

When I perform a search with both q and fq params
everything works.
For the search with q=*:* and something in the fq, it
seems to me that the docBoost is not taken into
consideration.

Is that possible?

Yes, it's possible.

FilterQuery (fq) does not contribute to score. It is not used in score
calculation.

MatchAllDocsQuery (*:*) is a fast way to return all docs. Adding
fl=score&debugQuery=on will show that all docs will get a constant score of
1.0.





Re: Reporting tools

2012-03-09 Thread Ahmet Arslan
 Are there any reporting tools out
 there?  So I can analyze search term
 frequency, filter frequency, etc.?

You might be interested in this :
http://www.sematext.com/search-analytics/index.html


Re: indexing bigdata

2012-03-09 Thread Robert Stewart
It very much depends on your data and also what query features you will use:  
how many fields, the size of each field, how many unique values per field, how 
many fields are stored vs. only indexed, etc.  I have a system with 3+ billion 
docs, and each instance (each index core) has 120 million docs, and it flies.  
But the documents are tiny, only 3 fields each, and the search is a very simple 
single-keyword match.  On another system we have only 7 million docs per 
instance and it is slower, because the documents are much, much larger, with 
many more fields, and we do a lot of faceting and other advanced search features.

Also, other factors such as the types of features you will use for search 
(faceting, field collapsing, wildcard queries, etc.) can all increase search 
time vs. just a simple keyword search.

Unfortunately, it is one of those things you need to try out to really get an 
answer, IMO.
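
If you do end up sharding, distributed search in Solr is driven by the shards
parameter; a minimal sketch, assuming two hypothetical hosts that each hold a
slice of the index:

http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr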


On Mar 8, 2012, at 11:39 PM, Sharath Jagannath wrote:

 Ok, my bad. I should have put it in a better way:
 is it a good idea to have all 30M docs on a single instance, or should I
 consider a distributed set-up?
 I have synthesized the data, configured the schema, and made
 suitable changes to the config. I have tested with a smaller data-set on
 my laptop and have a good workflow set up.
 
 I do not have a big machine to test it out on.
 I wanted to make sure I have insight into both options before I decide
 to spin up an Amazon instance.
 
 Thanks,
 Sharath
 
 On Thu, Mar 8, 2012 at 6:18 PM, Erick Erickson erickerick...@gmail.com wrote:
 
  Your question is really unanswerable; there are about a zillion
  factors that could influence the answer. I can index 5-7K docs/second,
  so it's efficient. Others can index only a fraction of that. It all
  depends...
 
  "Try it and see" is about the only way to answer.
 
 Best
 Erick
 
 On Thu, Mar 8, 2012 at 1:35 PM, Sharath Jagannath
 shotsonclo...@gmail.com wrote:
  Is indexing around 30 million documents in a single Solr instance
  efficient?
  Has somebody experimented with it? I'm planning to use it for an autosuggest
  feature I am implementing, so I'm expecting a response in a few milliseconds.
 Should I be looking at sharding?
 
 Thanks,
 Sharath
 



Re: Geolocation in SOLR with PHP application

2012-03-09 Thread Spadez
A quick bump; I could really do with some input on this, please.



Re: Reporting tools

2012-03-09 Thread Koji Sekiguchi

(12/03/09 12:35), Donald Organ wrote:

Are there any reporting tools out there?  So I can analyze search term
frequency, filter frequency, etc.?


You may be interested in:

Free Query Log Visualizer for Apache Solr
http://soleami.com/

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/


Re: docBoost with fq search

2012-03-09 Thread Ahmet Arslan

 if you store your boost in a searchable numeric field...

You can simply sort by that field too: q=*:*&sort=your_boost_field desc



Multithreaded DIH giving Operation not allowed after ResultSet closed solr 3.5

2012-03-09 Thread Rohit Khanna
When I try running a multi-threaded DIH in Solr 3.5, I get the following
error: "Operation not allowed after ResultSet closed".

I have multiple entities mapped to fields; after the first query finishes, I
get this error for every other query that's mentioned in my
data-config.xml file.
I have set the first entity as the root entity and have given the
threads parameter as 4. I can attach the file if you need it to
understand the problem better.

Any help would be appreciated.

Regards,
Rohit K


Multicore -Create new Core request errors

2012-03-09 Thread Sujatha Arun
Hello,

When I issue this query to create a new Solr Core , I get the error message
HTTP Status 500 - Can't find resource 'solrconfig.xml' in classpath or
'/home/searchuser/searchinstances/multi_core_prototype/solr/conf/

http://server_ip:port/multi_core_prototype/admin/cores?action=CREATE&name=coreX&instanceDir=/home/searchuser/searchinstances/multi_core_prototype/solr/coreX

I believe that the schema and solrconfig are optional.

I have the default cores (core0 and core1) in Solr 1.3. What should
the path of solrconfig be? Should it refer to the path of the schema in an
existing core, and can I expect to see the conf folder in the new core?

Regards
Sujatha


Re: Multithreaded DIH giving Operation not allowed after ResultSet closed solr 3.5

2012-03-09 Thread Mikhail Khludnev
Hello,

AFAIK DIH is not multi-threaded at all; see
https://issues.apache.org/jira/browse/SOLR-3011

Regards

On Fri, Mar 9, 2012 at 4:22 PM, Rohit Khanna getafix@gmail.com wrote:

 When I try running a multi-threaded DIH in Solr 3.5, I get the following
 error: "Operation not allowed after ResultSet closed".

 I have multiple entities mapped to fields; after the first query finishes, I
 get this error for every other query that's mentioned in my
 data-config.xml file.
 I have set the first entity as the root entity and have given the
 threads parameter as 4. I can attach the file if you need it to
 understand the problem better.

 Any help would be appreciated.

 Regards,
 Rohit K




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Geolocation in SOLR with PHP application

2012-03-09 Thread Adolfo Castro Menna
Hi,

Take a look at http://wiki.apache.org/solr/SpatialSearch
Then from PHP, you need to pass the right parameters, as described in the
link above.
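
For example, a radius filter looks like this (a minimal sketch, assuming a
location-typed field named store, as in the wiki examples):

http://localhost:8983/solr/select?q=*:*&fq={!geofilt sfield=store pt=45.15,-93.85 d=5}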

On Fri, Mar 9, 2012 at 8:00 AM, Spadez james_will...@hotmail.com wrote:

 A quick bump; I could really do with some input on this, please.




Re: Stemmer Question

2012-03-09 Thread Ahmet Arslan
 I'd be very interested to see how you
 did this if it is available. Does
 this seem like something useful to the community at large?

I PMed it to you. The filter is not a big deal; it's just modified from {@link 
org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide it 
publicly too.


RE: Solr DIH and $deleteDocById

2012-03-09 Thread Dyer, James
This (almost) sounds like https://issues.apache.org/jira/browse/SOLR-2492, which 
was fixed in Solr 3.4. Are you on an earlier version?

But maybe not, because you're seeing the # of deleted documents increment, and 
prior to this bug fix (I think) the deleted counter wasn't getting incremented 
either.

Perhaps this is a related bug that only happens when the deletes are added via 
a transformer?  Try a query like this without a transformer:

select uniqueID as '$deleteDocById' from table where uniqueID = '1-devpeter-1';

Does this work?  If so, you've probably stumbled on a new bug related to 
SOLR-2492.

In any case, the workaround (probably) is to manually issue a commit after 
doing your deletes.  Or, combine your deletes with adds/updates in the same DIH 
run and it should commit automatically as configured.
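
For reference, a manual commit can be issued with a plain update request; a
minimal sketch, assuming a default local install:

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<commit/>'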

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Peter Boudreau [mailto:pe...@makeshop.jp] 
Sent: Friday, March 09, 2012 2:22 AM
To: solr-user@lucene.apache.org
Subject: Solr DIH and $deleteDocById

Hello everyone,

I've got Solr DIH up and running with no problems as far as importing data, but 
I'm now trying to add some functionality to our delta import to delete invalid 
records.

The special command $deleteDocById seems to provide what I'm looking for, and 
just for testing purposes until I get things working, I setup a simple 
transformer to delete just one document with a specific ID:

<script><![CDATA[
function deleteBadDocs(row) {
    var uniqueID = row.get('unique_id');
    if (uniqueID == '1-devpeter-1') {
        row.put('$deleteDocById', uniqueID);
    }
    return row;
}
]]></script>

When I run DIH with this, sure enough, it tells me that 1 document was deleted:

Indexing completed. Added/Updated: 4755 documents. Deleted 1 documents. 

But then when I search the index, the document is still there.  I've been 
googling this for a while now, and found a number of references saying that you 
need to commit or optimize after this in order for the deletes to take effect, 
but I was under the impression that DIH both commits and optimizes by default, 
so shouldn't it be getting committed and optimized automatically by DIH?  I 
even tried explicitly setting the commit= and optimize= flags to true, but 
still, the deleted document remained in the index when I searched.  I also 
tried restarting Solr, but the deleted document was still there.

Could anyone help me understand why this document which is being reported as 
deleted still shows up in the index?

Also, there is one thing which I'm unclear on after reading the Solr wiki:

$deleteDocById : Delete a doc from Solr with this id. The value has to be the 
uniqueKey value of the document. Note that this command can only delete docs 
already committed to the index. 

I was starting to think that maybe $deleteDocById was only preventing documents 
from entering the index, rather than deleting existing documents which were 
already in the index, but if I understand this correctly, $deleteDocById should 
be able to delete a document which was already in the index *before* running DIH, right?

Any help would be very much appreciated.

Thanks in advance,

Peter


does solr have a mechanism for intercepting requests - before they are handed off to a request handler

2012-03-09 Thread geeky2
hello all,

does solr have a mechanism that could intercept a request (before it is
handed off to a request handler)?

the intent (from the business) is to send in a generic request, then
pre-parse the URL and send it off to a specific request handler.

thank you,
mark 



Re: How to rank an exact match higher?

2012-03-09 Thread Lan
Here's one way to do it using dismax.

1. You'll have two fields:
title_text, which has a type of TextField, and
title_string, which has a type of String. This is the exact-match field.

2. Set the dismax qf=title_string^10 title_text^1


You could even improve this by also handling infix searches:
create a field title_ngram which uses an ngram type, and set dismax qf =
title_string^10 title_text^5 title_ngram^1
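
A minimal schema.xml sketch of that setup (field and type names are
illustrative; a copyField keeps the two fields in sync):

<field name="title_text" type="text" indexed="true" stored="true"/>
<field name="title_string" type="string" indexed="true" stored="false"/>
<copyField source="title_text" dest="title_string"/>

and in the dismax handler defaults:

<str name="qf">title_string^10 title_text^1</str>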









Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
Hi everybody,

Let's say we have a system with billions of small documents (an average of 2-3
fields each), where each document belongs to JUST ONE user,
and searches are user-specific, meaning that when we search
for something, we just look into the documents of that user.

On the other hand we need to see the newly added documents
as soon as they are added to the indexes.

Now I think we have two solutions:
1. Use Lucene directly and create a separate index file for each user
2. Use Solr and store all of the users' data together in one HUGE index
file

The benefit of using Lucene is that each commit() will take less time
compared to the case where we use Solr.

Is there any suggested solution for cases like this?

Thanks

-- 
Alireza Salimi
Java EE Developer


Re: Lucene vs Solr design decision

2012-03-09 Thread Lan
Solr has cores, which are independent search indexes. You could create a
separate core per user.



Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
Sorry, I didn't mention that the number of users can be in the millions,
meaning millions of cores! So I'm not sure it's a good idea.

On Fri, Mar 9, 2012 at 1:35 PM, Lan dung@gmail.com wrote:

 Solr has cores which are independent search indexes. You could create a
 separate core per user.





-- 
Alireza Salimi
Java EE Developer


DIH - FileListEntityProcessor reading from Multiple Disk Directories

2012-03-09 Thread mike.rawlins
All,

I have an application that has RDF files in multiple subdirectories under a 
root directory. I'm using the DIH with a FileListEntityProcessor to load the 
index. All worked fine when the files were in a single directory, but I can't 
seem to figure out how to make a single data-config.xml read multiple 
directories.

The baseDir attribute seems to allow only a single absolute path. I tried 
multiple <document> elements with a different baseDir for each 
FileListEntityProcessor, but it only executed the first one.

Is there an easy way to do this, short of running multiple imports and changing 
baseDir for each?

Thanks,

Mike


Mike Rawlins
Sr. Software Engineer
Chair, ASC X12 Technical Assessment Subcommittee
18111 Preston Road, Suite 600
Dallas, TX 75252
+1 972.643.3101 direct
mike.rawl...@gxs.com
www.gxs.com
GXS Blog: http://blogs.inovis.com/



Re: Lucene vs Solr design decision

2012-03-09 Thread Lan
Solr has no limitation on the number of cores. It's limited by your hardware,
inodes, and how many files you can keep open.

I think even if you went the Lucene route you would run into same hardware
limits.



Re: Lucene vs Solr design decision

2012-03-09 Thread Glen Newton
millions of cores will not work...
...yet.

-glen

On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote:
 Solr has no limitation on the number of cores. It's limited by your hardware,
 inodes and how many files you could keep open.

 I think even if you went the Lucene route you would run into same hardware
 limits.




-- 
-
http://zzzoot.blogspot.com/
-


Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
Probably. And besides that, how can I use the features that SolrCloud
provides (e.g. high availability and distribution)?

The other solution would be to use SolrCloud, keep all of the users'
information in a single collection, and use NRT. But on the other hand,
the frequency of updates on that big collection would be high.

Do you think it makes sense?

On Fri, Mar 9, 2012 at 2:02 PM, Glen Newton glen.new...@gmail.com wrote:

 millions of cores will not work...
 ...yet.

 -glen

 On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote:
  Solr has no limitation on the number of cores. It's limited by your
 hardware,
  inodes and how many files you could keep open.
 
  I think even if you went the Lucene route you would run into same
 hardware
  limits.
 



 --
 -
 http://zzzoot.blogspot.com/
 -




-- 
Alireza Salimi
Java EE Developer


Re: Lucene vs Solr design decision

2012-03-09 Thread Robert Stewart
Split the index up into, say, 100 cores, and then route each search to a specific 
core by some mod operation on the user id:

core_number = userid % num_cores

core_name = "core" + core_number

That way each index core is relatively small (maybe 100 million docs or less).
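
A minimal SolrJ sketch of that routing (the class and core-name scheme are
hypothetical; CommonsHttpSolrServer is the 3.x client):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class UserCoreRouter {
    private static final int NUM_CORES = 100;

    // Map a user id to its core and return a client pointed at that core.
    public static SolrServer serverFor(long userId) throws Exception {
        String coreName = "core" + (userId % NUM_CORES);
        return new CommonsHttpSolrServer("http://localhost:8983/solr/" + coreName);
    }
}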


On Mar 9, 2012, at 2:02 PM, Glen Newton wrote:

 millions of cores will not work...
 ...yet.
 
 -glen
 
 On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote:
 Solr has no limitation on the number of cores. It's limited by your hardware,
 inodes and how many files you could keep open.
 
 I think even if you went the Lucene route you would run into same hardware
 limits.
 
 
 
 
 -- 
 -
 http://zzzoot.blogspot.com/
 -



Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
This solution makes sense, but I still don't know whether I can use SolrCloud
with this configuration or not.

On Fri, Mar 9, 2012 at 2:06 PM, Robert Stewart bstewart...@gmail.com wrote:

 Split the index up into, say, 100 cores, and then route each search to a
 specific core by some mod operation on the user id:

 core_number = userid % num_cores

 core_name = "core" + core_number

 That way each index core is relatively small (maybe 100 million docs or
 less).


 On Mar 9, 2012, at 2:02 PM, Glen Newton wrote:

  millions of cores will not work...
  ...yet.
 
  -glen
 
  On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote:
  Solr has no limitation on the number of cores. It's limited by your
 hardware,
  inodes and how many files you could keep open.
 
  I think even if you went the Lucene route you would run into same
 hardware
  limits.
 
 
 
 
  --
  -
  http://zzzoot.blogspot.com/
  -




-- 
Alireza Salimi
Java EE Developer


RE: DIH - FileListEntityProcessor reading from Multiple Disk Directories

2012-03-09 Thread Dyer, James
Did you try setting baseDir to the root directory and recursive to "true"?  
(See http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor for 
more information.)
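
A minimal data-config.xml sketch (the paths and the fileName regex are
illustrative):

<entity name="files"
        processor="FileListEntityProcessor"
        baseDir="/data/rdf-root"
        fileName=".*\.rdf"
        recursive="true"
        rootEntity="false"
        dataSource="null">
  <!-- nested entity that parses each file goes here -->
</entity>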

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

From: mike.rawl...@gxs.com [mailto:mike.rawl...@gxs.com]
Sent: Friday, March 09, 2012 12:44 PM
To: solr-user@lucene.apache.org
Subject: DIH - FileListEntityProcessor reading from Multiple Disk Directories

All,

I have an application that has RDF files in multiple subdirectories under a 
root directory. I'm using the DIH with a FileListEntityProcessor to load the 
index. All worked fine when the files were in a single directory, but I can't 
seem to figure out how to make a single data-config.xml read multiple 
directories.

The baseDir attribute seems to allow only a single absolute path. I tried 
multiple <document> elements with a different baseDir for each 
FileListEntityProcessor, but it only executed the first one.

Is there an easy way to do this, short of running multiple imports and changing 
baseDir for each?

Thanks,

Mike


Mike Rawlins
Sr. Software Engineer
Chair, ASC X12 Technical Assessment Subcommittee
18111 Preston Road, Suite 600
Dallas, TX 75252
+1 972.643.3101 direct
mike.rawl...@gxs.com
www.gxs.com
GXS Blog: http://blogs.inovis.com/



Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
On the other hand, I'm aware of the fact that if I go with the Lucene approach,
failover is something that I will have to support manually, which is a
nightmare!

On Fri, Mar 9, 2012 at 2:13 PM, Alireza Salimi alireza.sal...@gmail.com wrote:

 This solution makes sense, but I still don't know whether I can use SolrCloud
 with this configuration or not.

 On Fri, Mar 9, 2012 at 2:06 PM, Robert Stewart bstewart...@gmail.com wrote:

  Split the index up into, say, 100 cores, and then route each search to a
  specific core by some mod operation on the user id:

  core_number = userid % num_cores

  core_name = "core" + core_number

 That way each index core is relatively small (maybe 100 million docs or
 less).


 On Mar 9, 2012, at 2:02 PM, Glen Newton wrote:

  millions of cores will not work...
  ...yet.
 
  -glen
 
  On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote:
  Solr has no limitation on the number of cores. It's limited by your
 hardware,
  inodes and how many files you could keep open.
 
  I think even if you went the Lucene route you would run into same
 hardware
  limits.
 
 
 
 
  --
  -
  http://zzzoot.blogspot.com/
  -




 --
 Alireza Salimi
 Java EE Developer





-- 
Alireza Salimi
Java EE Developer


Re: Upgrade solr

2012-03-09 Thread Erick Erickson
Take a look at the solr/CHANGES.txt file. Each release has
an "Upgrading from ..." section; the one you're interested in
is "Upgrading from Solr 1.4" in the 3.1.0 section, and then the
ones in the subsequent sections.

Of course I'd try it on a copy of my index first...

If at all possible, the easiest way is to re-index your data.

Best
Erick

On Fri, Mar 9, 2012 at 4:10 AM, Abhishek tiwari
abhishek.tiwari@gmail.com wrote:
 Can someone help me with how to upgrade my Solr from 1.4? What steps do we
 need to take?


Re: Stemmer Question

2012-03-09 Thread Jamie Johnson
Ok, so I'm digging through the code and I noticed in
org.apache.lucene.analysis.synonym.SynonymFilter there are mentions of
a keepOrig attribute.  Doing some googling led me to
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters which
speaks of an attribute preserveOriginal=1 on
solr.WordDelimiterFilterFactory.  So it seems like I can get the
functionality I am looking for by setting preserveOriginal, is that
correct?
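
For reference, that attribute is configured in the field type's analyzer chain
in schema.xml; a minimal sketch (the type name is illustrative):

<fieldType name="text_wd" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>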


On Fri, Mar 9, 2012 at 9:53 AM, Ahmet Arslan iori...@yahoo.com wrote:
 I'd be very interested to see how you
 did this if it is available. Does
 this seem like something useful to the community at large?

 I PMed it to you. The filter is not a big deal; it's just modified from {@link 
 org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide 
 it publicly too.


Re: Stemmer Question

2012-03-09 Thread Jamie Johnson
Further digging leads me to believe this is not the case.  The Synonym
Filter supports this, but the Stemming Filter does not.

Ahmet,

Would you be willing to provide your filter as well?  I wonder if we
can make it aware of the preserveOriginal attribute on
WordDelimiterFilterFactory?


On Fri, Mar 9, 2012 at 2:27 PM, Jamie Johnson jej2...@gmail.com wrote:
 Ok, so I'm digging through the code and I noticed in
 org.apache.lucene.analysis.synonym.SynonymFilter there are mentions of
 a keepOrig attribute.  Doing some googling led me to
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters which
 speaks of an attribute preserveOriginal=1 on
 solr.WordDelimiterFilterFactory.  So it seems like I can get the
 functionality I am looking for by setting preserveOriginal, is that
 correct?


 On Fri, Mar 9, 2012 at 9:53 AM, Ahmet Arslan iori...@yahoo.com wrote:
 I'd be very interested to see how you
 did this if it is available. Does
 this seem like something useful to the community at large?

 I PMed it to you. The filter is not a big deal; it's just modified from {@link 
 org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide 
 it publicly too.


Re: Time Stats

2012-03-09 Thread Raimon Bosch
The answer is quite easy: just index each visit. That way I can use
faceted date search to create time statistics.

flats for rent new york at 1/12/2011 = bounce_rate=48.6%
flats for rent new york at 1/1/2012 = bounce_rate=49.7%
flats for rent new york at 1/2/2012 = bounce_rate=46.4%

date:[1/12/2011 - 1/1/2012]
flats for rent new york at 1/12/2011 = bounce_rate=48.6%
flats for rent new york at 1/1/2012 = bounce_rate=49.7%
mean=49.15%

date:[1/1/2012 - 1/2/2012]
flats for rent new york at 1/1/2012 = bounce_rate=49.7%
flats for rent new york at 1/2/2012 = bounce_rate=46.4%
mean=49.05%

With my initial approach I would save some disk and memory space. I'm still
wondering if it is possible.
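
In query form, the per-range mean could come from the StatsComponent over a
date filter; a minimal sketch, assuming hypothetical visit_date and bounce_rate
fields:

http://localhost:8983/solr/select?q=*:*&fq=visit_date:[2011-12-01T00:00:00Z TO 2012-01-01T00:00:00Z]&stats=true&stats.field=bounce_rate&rows=0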

2012/2/27 Raimon Bosch raimon.bo...@gmail.com


 Anyone up to provide an answer?

 The idea is to have a kind of CustomInteger composed of an array of
 timestamps. The value shown in this field would be based on the date range
 that you're sending.

 The biggest problem is that this field would be in all the documents in
 your Solr index, so you would need to calculate this number in real time.


 2012/2/26 Raimon Bosch raimon.bo...@gmail.com


 Hi,

  Today I was playing with StatsComponent just to extract some statistics
  from my index. I'm using a Solr index to store user searches; basically,
  what I did was aggregate data from the access log into my Solr index. So now I
  can see the average bounce rate for a group of user searches and see which ones
  are performing better in Google.

  Now I would like to see the evolution of these stats through time. For
  that I would need a field with different values through time, i.e.:

 flats for rent new york at 1/12/2011 = bounce_rate=48.6%
 flats for rent new york at 1/1/2012 = bounce_rate=49.7%
 flats for rent new york at 1/2/2012 = bounce_rate=46.4%

  Is there any Solr field type that could fit to solve this?

 Thanks in advance,
 Raimon Bosch.





Re: Time Stats

2012-03-09 Thread Raimon Bosch
Correction: the second mean is 48.05%...

2012/3/9 Raimon Bosch raimon.bo...@gmail.com

  The answer is quite easy: just index each visit. That way I can use
  faceted date search to create time statistics.

 flats for rent new york at 1/12/2011 = bounce_rate=48.6%
 flats for rent new york at 1/1/2012 = bounce_rate=49.7%
 flats for rent new york at 1/2/2012 = bounce_rate=46.4%

 date:[1/12/2011 - 1/1/2012]
 flats for rent new york at 1/12/2011 = bounce_rate=48.6%
 flats for rent new york at 1/1/2012 = bounce_rate=49.7%
 mean=49.15%

 date:[1/1/2012 - 1/2/2012]
 flats for rent new york at 1/1/2012 = bounce_rate=49.7%
 flats for rent new york at 1/2/2012 = bounce_rate=46.4%
 mean=49.05%

 With my initial approach I would save some disk and memory space. I'm
 still wondering if it is possible.

 2012/2/27 Raimon Bosch raimon.bo...@gmail.com


 Anyone up to provide an answer?

 The idea is to have a kind of CustomInteger composed of an array of
 timestamps. The value shown in this field would be based on the date range
 that you're sending.

 The biggest problem is that this field would be in all the documents in
 your Solr index, so you would need to calculate this number in real time.


 2012/2/26 Raimon Bosch raimon.bo...@gmail.com


 Hi,

  Today I was playing with StatsComponent just to extract some statistics
  from my index. I'm using a Solr index to store user searches; basically,
  what I did was aggregate data from the access log into my Solr index. So now I
  can see the average bounce rate for a group of user searches and see which ones
  are performing better in Google.

  Now I would like to see the evolution of these stats through time. For
  that I would need a field with different values through time, i.e.:

 flats for rent new york at 1/12/2011 = bounce_rate=48.6%
 flats for rent new york at 1/1/2012 = bounce_rate=49.7%
 flats for rent new york at 1/2/2012 = bounce_rate=46.4%

  Is there any Solr field type that could fit to solve this?

 Thanks in advance,
 Raimon Bosch.






Knowing which fields matched a search

2012-03-09 Thread Russell Black
When searching across multiple fields, is there a way to identify which 
field(s) resulted in a match without using highlighting or stored fields?

Re: does solr have a mechanism for intercepting requests - before they are handed off to a request handler

2012-03-09 Thread Mikhail Khludnev
I'm doing something like that by hacking SolrRequestParsers. I tried to
find a more legal way but haven't found one:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201202.mbox/%3CCAF=Pa597RpLjVWZbM=0aktjhpnea4m931j0s1s4bda4qe+t...@mail.gmail.com%3E

I added solrRequestParsers into solrconfig.xml
https://github.com/m-khl/solr-patches/commit/f92018818b20d79b01d795f2c52446b499023dd8#diff-4

Also, have you considered J2EE webapp servlet filters?
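
A minimal sketch of such a filter (the routing rule and handler path are
hypothetical), mapped in web.xml ahead of SolrDispatchFilter:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

public class HandlerRoutingFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) req;
        String type = http.getParameter("type"); // hypothetical routing key
        if ("parts".equals(type)) {
            // Re-dispatch the generic request to a specific request handler path.
            req.getRequestDispatcher("/parts").forward(req, res);
            return;
        }
        chain.doFilter(req, res);
    }
}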

On Fri, Mar 9, 2012 at 9:11 PM, geeky2 gee...@hotmail.com wrote:

 hello all,

 does solr have a mechanism that could intercept a request (before it is
 handed off to a request handler)?

 the intent (from the business) is to send in a generic request, then
 pre-parse the URL and send it off to a specific request handler.

 thank you,
 mark





-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


RE: DIH - FileListEntityProcessor reading from Multiple Disk Directories

2012-03-09 Thread mike.rawlins
I knew there had to be an easy way. That was it. Thanks for the tip!


Mike Rawlins
Sr. Software Engineer
Chair, ASC X12 Technical Assessment Subcommittee
18111 Preston Road, Suite 600
Dallas, TX 75252
+1 972.643.3101 direct
mike.rawl...@gxs.com
www.gxs.com 
GXS Blog


-Original Message-
From: Dyer, James [mailto:james.d...@ingrambook.com] 
Sent: Friday, March 09, 2012 1:14 PM
To: solr-user@lucene.apache.org
Subject: RE: DIH - FileListEntityProcessor reading from Multiple Disk 
Directories

Did you try setting baseDir to the root directory and recursive to "true"?  
(See http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor for 
more information.)

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

From: mike.rawl...@gxs.com [mailto:mike.rawl...@gxs.com]
Sent: Friday, March 09, 2012 12:44 PM
To: solr-user@lucene.apache.org
Subject: DIH - FileListEntityProcessor reading from Multiple Disk Directories

All,

I have an application that has RDF files in multiple subdirectories under a 
root directory. I'm using the DIH with a FileListEntityProcessor to load the 
index. All worked fine when the files were in a single directory, but I can't 
seem to figure out how to make a single data-config.xml read multiple 
directories.

The baseDir attribute seems to allow only a single absolute path. I tried 
multiple <document> elements with a different baseDir for each 
FileListEntityProcessor, but it only executed the first one.

Is there an easy way to do this, short of running multiple imports and changing 
baseDir for each?

Thanks,

Mike


Mike Rawlins
Sr. Software Engineer
Chair, ASC X12 Technical Assessment Subcommittee
18111 Preston Road, Suite 600
Dallas, TX 75252
+1 972.643.3101 direct
mike.rawl...@gxs.com
www.gxs.com
GXS Blog: http://blogs.inovis.com/



Re: Stemmer Question

2012-03-09 Thread Jamie Johnson
So I've thrown something together fairly quickly, based on
what Ahmet had sent, that I believe will preserve the original token as
well as the stemmed version.  I didn't go as far as weighting them
differently using payloads, however.  I am not sure how to use the
preserveOriginal attribute from WordDelimiterFilterFactory; can anyone
provide guidance on that?

On Fri, Mar 9, 2012 at 2:53 PM, Jamie Johnson jej2...@gmail.com wrote:
 Further digging leads me to believe this is not the case.  The Synonym
 Filter supports this, but the Stemming Filter does not.

 Ahmet,

 Would you be willing to provide your filter as well?  I wonder if we
 can make it aware of the preserveOriginal attribute on
 WordDelimiterFilterFactory?


 On Fri, Mar 9, 2012 at 2:27 PM, Jamie Johnson jej2...@gmail.com wrote:
 Ok, so I'm digging through the code and I noticed in
 org.apache.lucene.analysis.synonym.SynonymFilter there are mentions of
 a keepOrig attribute.  Doing some googling led me to
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters which
 speaks of an attribute preserveOriginal=1 on
 solr.WordDelimiterFilterFactory.  So it seems like I can get the
 functionality I am looking for by setting preserveOriginal, is that
 correct?


 On Fri, Mar 9, 2012 at 9:53 AM, Ahmet Arslan iori...@yahoo.com wrote:
 I'd be very interested to see how you
 did this if it is available. Does
 this seem like something useful to the community at large?

 I PMed it to you. The filter is not a big deal; it's just modified from {@link 
 org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide 
 it publicly too.


Re: Highlighting text field when query is for string field

2012-03-09 Thread solrdude
Or is it because the query is on a keyword field and I expect matching keywords
to be highlighted in the excerpts field? Any insights would help a lot.

Thanks



Re: How to Index Custom XML structure

2012-03-09 Thread Jan Høydahl
You could set up a ManifoldCF job to fetch the XMLs and then set up a new 
SolrOutputConnection for /solr/update/xslt?tr=myStyleSheet.xsl, where 
myStyleSheet.xsl is the stylesheet to use for that kind of XML. See 
http://wiki.apache.org/solr/XsltUpdateRequestHandler
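
A minimal stylesheet sketch for a custom format like the RECORD sample quoted
below (the Solr field names are illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/RECORD">
    <add>
      <doc>
        <field name="id"><xsl:value-of select="doc_id"/></field>
        <field name="subject"><xsl:value-of select="subject"/></field>
        <field name="abstract"><xsl:value-of select="abstract"/></field>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>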

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 7. mars 2012, at 14:04, Erick Erickson wrote:

 Well, I'm ManifoldCF ignorant, so I'll have to defer on this one
 
 Best
 Erick
 
 On Tue, Mar 6, 2012 at 12:24 PM, Anupam Bhattacharya
 anupam...@gmail.com wrote:
 Thanks Erick, for the prompt response,
 
 Both of the suggestions will be useful for a one-time indexing activity. Since
 DIH will be a one-time process of indexing the repository, it is of no
 use in my case. Writing a standalone Java program utilizing SolrJ will again
 be a one-time indexing process.
 
 I want to write a separate handler which will be called by the ManifoldCF job
 to create indexes in Solr. In my case the repository is Documentum Content
 Server. I found a relevant link at https://community.emc.com/docs/DOC-6520,
 which is quite similar to my requirement.
 
 I modified the code to parse the XML and added the values to the document
 properties. Although this works fine when I test it with my cURL
 program with parameters, when the same handler is called from a ManifoldCF
 job, the job gets terminated within a few minutes. I'm not sure of the
 reason for that. The handler is written similarly to /update/extract, which
 is ExtractingRequestHandler.
 
 Is ExtractingRequestHandler capable of extracting tag names and values using
 some of its defined attributes like capture, captureAttr, extractOnly, etc.,
 which can be added into the document indexes?
 
 
 On Tue, Feb 28, 2012 at 8:26 AM, Erick Erickson 
  erickerick...@gmail.com wrote:
 
 You might be able to do something with the XSL Transformer step in DIH.
 
 It might also be easier to just write a SolrJ program to parse the XML and
 construct a SolrInputDocument to send to Solr. It's really pretty
 straightforward.
 
 Best
 Erick
 
 On Sun, Feb 26, 2012 at 11:31 PM, Anupam Bhattacharya
 anupam...@gmail.com wrote:
 Hi,
 
  I am using ManifoldCF to crawl data from a Documentum repository. I am able
  to successfully read the metadata/properties for the defined document types
  in Documentum using the out-of-the-box Documentum Connector in ManifoldCF.
  Unfortunately, there is also one XML file present which consists of a
  custom XML structure which I need to read, fetch the element values,
  and add them for indexing in Lucene through Solr.

  Is there any mechanism to index an arbitrary XML structure document in Solr?
 
 I checked the SOLR CELL framework, which supports the structure below:
 
  <add>
    <doc>
      <field name="id">9885A004</field>
      <field name="name">Canon PowerShot SD500</field>
      <field name="category">camera</field>
      <field name="features">3x optical zoom</field>
      <field name="features">aluminum case</field>
      <field name="weight">6.4</field>
      <field name="price">329.95</field>
    </doc>
    <doc>
      <field name="id">9885A003</field>
      <field name="name">Canon PowerShot SD504</field>
      <field name="category">camera1</field>
      <field name="features">3x optical zoom1</field>
      <field name="features">aluminum case1</field>
      <field name="weight">6.41</field>
      <field name="price">329.956</field>
    </doc>
  </add>
 
 My custom XML structure is of the following format, from which I need to
 read the *subject* and *abstract* fields for indexing. I checked the TIKA
 project but I couldn't find any useful stuff there.
 
  <?xml version="1.0" encoding="UTF-8"?>
  <RECORD>
  <doc_id>1</doc_id>
  <abstract>This is an abstract.</abstract>
  <subject>Text Subject</subject>
  <availability />
  <indexing>
  <index_group></index_group>
  <keyterms></keyterms>
  <keyterms></keyterms>
  </indexing>
  <publication_date></publication_date>
  <physical_storage />
  <log_entry />
  <legal_category />
  <legal_category_notes />
  <citation_only></citation_only>
  <citation_only_desc />
  <export_control />
  <export_control_desc />
  </RECORD>
 
 Appreciate any help on this.
 
 Regards
 Anupam
 
 
 
 
 --
 Thanks  Regards
 Anupam Bhattacharya



Xml representation of indexed document

2012-03-09 Thread Chamnap Chhorn
Hi all,

I'm doing a data import using DIH in Solr 3.5. I'm curious to know whether it
is possible to see the XML representation of the indexed data from the browser.
I just want to make sure the data is correctly indexed with the correct values,
for debugging purposes.

-- 
Chamnap


Re: Xml representation of indexed document

2012-03-09 Thread Anupam Bhattacharya
You can use Luke to view Lucene Indexes.

Anupam

On Sat, Mar 10, 2012 at 12:27 PM, Chamnap Chhorn chamnapchh...@gmail.com wrote:

 Hi all,

 I'm doing a data import using DIH in Solr 3.5. I'm curious to know whether it
 is possible to see the XML representation of the indexed data from the browser.
 I just want to make sure the data is correctly indexed with the correct values,
 for debugging purposes.

 --
 Chamnap




-- 
Thanks  Regards
Anupam Bhattacharya