RE: XML data in solr field

2010-03-17 Thread Nair, Manas
Thank you, Tommy. But the real problem here is that the XML is dynamic and the 
element names will be different in different docs, which means that there would 
be a lot of field names to add to the schema if I were to index those XML 
nodes separately.
Is it possible to have nested indexing (XML within XML) in Solr without the 
overhead of adding all those inner XML nodes as actual fields in the Solr schema?
 
Manas



From: Tommy Chheng [mailto:tommy.chh...@gmail.com]
Sent: Tue 3/16/2010 5:05 PM
To: solr-user@lucene.apache.org
Subject: Re: XML data in solr field




  Do you have the option of just importing each xml node as a
field/value when you add the document?

That'll let you do the search easily. If you need to store the raw XML,
you can use an extra field.
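
For illustration, a document posted in that style might look roughly like this 
(the field names are only placeholders based on the sample XML in this thread):

<add>
  <doc>
    <field name="id">event-1</field>
    <field name="Venue">Radio City Music Hall</field>
    <field name="Link">http://bit.ly/Rndab</field>
    <field name="LinkText">En savoir +</field>
    <field name="Address">New-York, USA</field>
    <!-- raw XML kept in a separate stored field for retrieval -->
    <field name="inputxml"><![CDATA[<root><Venue value="Radio City Music Hall" />...</root>]]></field>
  </doc>
</add>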

Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com/


On 3/16/10 12:59 PM, Nair, Manas wrote:
 Hello Experts,

 I need help on this issue of mine. I am unsure if this scenario is possible.
 I have a field in my Solr document named inputxml, the value of which is an 
 XML string as below. This XML structure is within the inputxml field value. I 
 need help searching this XML structure, i.e. if I search for Venue, I 
 should get Radio City Music Hall as the result and not the complete tag 
 like <Venue value="Radio City Music Hall" />. Is this supported in Solr? If 
 it is, how can this be implemented?

 <root>
   <Venue value="Radio City Music Hall" />
   <Link value="http://bit.ly/Rndab" />
   <LinkText value="En savoir +" />
   <Address value="New-York, USA" />
 </root>

 Any help is appreciated. I do not need the tag name in the result; instead I 
 need the tag value.

 Thanks in advance,
 Manas Nair





RE: Issue in search

2010-03-17 Thread Nair, Manas
You could write your query like:
q=fieldName1:searchValue AND fieldName2:value OR fieldName3:value
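
For example, a query combining all three operators might look like this (field 
names are placeholders; grouping with parentheses keeps the precedence explicit):

q=(field1:value1 AND field2:value2) OR (field3:value3 AND NOT field4:value4)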
 
Regards,
Manas



From: Suram [mailto:reactive...@yahoo.com]
Sent: Wed 3/17/2010 12:44 AM
To: solr-user@lucene.apache.org
Subject: Issue in search




In Solr, how can I perform AND, OR, NOT searches while querying the data?
--
View this message in context: 
http://old.nabble.com/Issue-in-search-tp27927828p27927828.html
Sent from the Solr - User mailing list archive at Nabble.com.





Weired behaviour for certain search terms

2010-03-17 Thread Akash Sahu

Solr is behaving a bit weirdly for some of the search terms. EG:
co-ownership, co ownership.
It works fine with terms like quasi-delict, non-interference etc.

The issue is, it's not returning any excerpts in the highlighting key of the
result dictionary. My search query is something like this:
http://192.168.1.50:8080/solr/core_SFS/select?q=content:(co-ownership)+AND+permauid:(AAAE1292-rw)&hl=true&hl.fl=content&hl.requireFieldMatch=true&hl.fragsize=600&hl.usePhraseHighlighter=true&facet=true&facet.field=permauid&facet.field=info_owner&facet.sort=true&facet.mincount=1&facet.limit=-1&wt=python&sort=promulgation_date_igprs_date+asc&start=0&rows=200&fl=uid,permauid

but when I search for terms like quasi-delict, non-interference, it gives me
proper excerpts.

I am using Solr 1.4 with Python.

Any help will be highly appreciated. Thanks




-- 
View this message in context: 
http://old.nabble.com/Weired-behaviour-for-certain-search-terms-tp27927995p27927995.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr query parser doesn't invoke analyzer for simple term query?

2010-03-17 Thread Marco Martinez
Hello,

You can see what happens (which analyzers are used for this field and what
their output is) for this search using the analysis page of the Solr admin
web interface. I assume you are using the same analyzers and
tokenizers for indexing and searching on this field in your schema.

Regards,


Marco Martínez Bautista



2010/3/17 Teruhiko Kurosaka k...@basistech.com

 It seems that Solr's query parser doesn't pass a single term query
 to the Analyzer for the field. For example, if I give it
 2001年 (year 2001 in Japanese), the searcher returns 0 hits
 but if I quote them with double-quotes, it returns hits.
 In this experiment, I configured schema.xml so that
 the field in question will use the morphological Analyzer
 my company makes that is capable of splitting 2001年
 into two tokens 2001 and 年.  I am guessing that this
 Analyzer is called ONLY IF the term is a phrase.
 Is my observation correct?

 If so, is there any configuration parameter that I can tweak
 to force any query for the text fields be processed by
 the Analyzer?

 One might ask why users won't put a space between 2001 and 年.
 Well, if they are clearly two separate words, people do that.
 But 年 works more like a suffix in this case, and in many
 Japanese speakers' minds, 2001年 seems like one token, so
 many people won't.  (Remember Japanese don't use spaces
 in normal writing.)  Forcing queries through the Analyzer would also
 be useful for compound-word handling, which is often desirable
 for languages like German.

 
 Teruhiko Kuro Kurosaka
 RLP + Lucene & Solr = powerful search for global contents




Re: APR setup

2010-03-17 Thread Paul Libbrecht
I think I know many sites that ignore this warning... using mod_proxy
is quite a bit easier by comparison. Maybe if you are aiming
at millions of queries per second you should consider it; I
wonder if it makes sense before that.



paul


Le 17-mars-10 à 04:36, blargy a écrit :



[java] INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the java.library.path:
.:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java


What the heck is this and why is it recommended for production  
settings?

Anyone?




Will Solr fit our needs?

2010-03-17 Thread Moritz Mädler
Hi List,

we are running a marketplace with functionality roughly comparable to eBay
(auctions, fixed-price items, etc.).
The items are placed on the market by users who want to sell their goods.

Currently we are using Sphinx as an indexing engine, but, as Sphinx returns
only document ids, we have to make a
database query to fetch the data to display. This massively decreases
performance, as we have to do two requests to
display data.

I heard that Solr is able to return a complete dataset, and we hope a switch to
Solr can boost performance.
One critical question remains, and I was not able to find a solution for it in
the docs: is it possible to update attributes directly in the
index?
An example for better illustration:
We have an index which holds all the auctions (containing auctionid, auction
title) with their current prices (field: current_price). When a user places a new
bid,
is it possible to update the attribute 'current_price' directly in the index so
that we can fetch the current_price from Solr and not from the database?

I hope you understand my problem. It would be kind if someone could point me in
the right direction.

Thanks a lot!

Moritz

Solr 1.4 - Stemmer expansion

2010-03-17 Thread Saïd Radhouani
I'm using the SnowballPorterFilterFactory for stemming French words. Some
words are not recognized by this stemmer; I wonder whether, like synonym
processing, the stemmers have an expansion option.

Thanks.
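
For reference, a French stemming field type in Solr 1.4 typically looks something
like the sketch below; the protected-words file (words listed there are left
unstemmed) is the closest built-in knob to a per-word override, and the file name
here is only an example:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="frenchprotwords.txt"/>
  </analyzer>
</fieldType>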


Re: Will Solr fit our needs?

2010-03-17 Thread Lukáš Vlček
Hi,

Solr runs on top of Lucene, and as far as I know Lucene knows only one
approach to updating document field content: delete first and
then (re)index with new values.
However, that does not mean you cannot implement what you need.
Take a look at the ParallelReader API:
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/index/ParallelReader.html

I am not sure if this functionality is directly exposed via the Solr API. Try
digging through the mailing lists (search-lucene.com or lucidimagination.com can be of good
help, and you can narrow the search to Solr only:
http://search-lucene.com/?q=ParallelReader&fc_project=Solr or
http://www.lucidimagination.com/search/?q=ParallelReader#/p:solr). For
example the following mail thread seems to be relevant:
http://search-lucene.com/m/iT2hMvtDt5 (though it is a bit dated).

Do you really use only one physical index for all auctions? If yes, then you
might consider using ParallelReader, but if the index is large then I am not
sure about the performance. If you are planning to partition your index, then
it can get more complex but faster.

Regards,
Lukas

On Wed, Mar 17, 2010 at 9:49 AM, Moritz Mädler m...@moritz-maedler.dewrote:

 Hi List,

 we are running a marketplace which has about a comparable functionality
 like ebay (auctions, fixed-price items etc).
 The items are placed on the market by users who want to sell their goods.

 Currently we are using Sphinx as an indexing engine, but, as Sphinx returns
 only document ids we have to make a
 database-query to fetch the data to display. This massively decreases
 performance as we have to do two requests to
 display data.

 I heard that Solr is able to return a complete dataset and we hope a switch
 to Solr can boost perfomance.
 A critical question is left and i was not able to find a solution for it in
 the docs: Is it possible to update attributes directly in the
 index?
 An example for better illustration:
 We have an index which holds all the auctions (containing auctionid,
 auction title) with its current prices(field: current_price). When a user
 places a new bid,
 is it possible to update the attribute 'current_price' directly in the
 index so that we can fetch the current_price from Solr and not from the
 database?

 I hope you understood my problem. It would be kind if someone can point me
 to the right direction.

 Thanks alot!

 Moritz


Re: Will Solr fit our needs?

2010-03-17 Thread Lukáš Vlček
I have been thinking about your question again, and if you expect the price
value to change a lot, especially when talking about auctions, then you should
consider storing the actual price not in the full-text index but in some fast
datastore. Some kind of scalable in-memory hash map with journal-based backup
would do this job better, I think.

Just my 2 cents.

Regards,
Lukas

On Wed, Mar 17, 2010 at 10:36 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:

 Hi,

 Solr is running on top of Lucene and as far as I know Lucene knows only one
 approach how to update the document field content: that is delete first and
 then (re)index with new values.
 However, saying this it does not mean you can not implement what you need.
 Take a look at ParallelReader API
 http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/index/ParallelReader.html

 I am not sure if this functionality is directly exposed via Solr API. Try
 digging mail lists (search-lucene.com or lucidimagination.com can be of
 good help while you can narrow search to Solr only:
 http://search-lucene.com/?q=ParallelReaderfc_project=Solr or
 http://www.lucidimagination.com/search/?q=ParallelReader#/p:solr). For
 example the following mail thread seems to be relevant:
 http://search-lucene.com/m/iT2hMvtDt5 (though it is bit dated)

 Do you really use only one physical index for all auctions? If yes then you
 might consider using ParallelReader but if the index is large then I am not
 sure about the performance. If you are planning to partition your index then
 it can get more complex but faster.

 Regards,
 Lukas


 On Wed, Mar 17, 2010 at 9:49 AM, Moritz Mädler m...@moritz-maedler.dewrote:

 Hi List,

 we are running a marketplace which has about a comparable functionality
 like ebay (auctions, fixed-price items etc).
 The items are placed on the market by users who want to sell their goods.

 Currently we are using Sphinx as an indexing engine, but, as Sphinx
 returns only document ids we have to make a
 database-query to fetch the data to display. This massively decreases
 performance as we have to do two requests to
 display data.

 I heard that Solr is able to return a complete dataset and we hope a
 switch to Solr can boost perfomance.
 A critical question is left and i was not able to find a solution for it
 in the docs: Is it possible to update attributes directly in the
 index?
 An example for better illustration:
 We have an index which holds all the auctions (containing auctionid,
 auction title) with its current prices(field: current_price). When a user
 places a new bid,
 is it possible to update the attribute 'current_price' directly in the
 index so that we can fetch the current_price from Solr and not from the
 database?

 I hope you understood my problem. It would be kind if someone can point me
 to the right direction.

 Thanks alot!

 Moritz





Re: Stopwords

2010-03-17 Thread Ahmet Arslan

 I was reading Scaling Lucene and Solr
 (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
 and I came across the section on StopWords.

 In there it mentioned that it's not recommended to remove stop words at index
 time. Why is this the case? Don't all the extraneous stopwords bloat the
 index and lead to less relevant results? Can someone please explain this to
 me. Thanks

There was a discussion about stopwords (removing them, not removing them, or 
indexing them with CommonGramsFilterFactory) and good references in this thread:

http://search-lucene.com/m/QvJtF1mIPP22/When+Stopword+Lists+Make+the+Difference 


  


Re: Weired behaviour for certain search terms

2010-03-17 Thread Ahmet Arslan

 Solr is behaving a bit weirdly for some of the search terms. E.g.:
 co-ownership, co ownership.
 It works fine with terms like quasi-delict, non-interference etc.

 The issue is, it's not returning any excerpts in the highlighting key of the
 result dictionary. My search query is something like this:
 http://192.168.1.50:8080/solr/core_SFS/select?q=content:(co-ownership)+AND+permauid:(AAAE1292-rw)&hl=true&hl.fl=content&hl.requireFieldMatch=true&hl.fragsize=600&hl.usePhraseHighlighter=true&facet=true&facet.field=permauid&facet.field=info_owner&facet.sort=true&facet.mincount=1&facet.limit=-1&wt=python&sort=promulgation_date_igprs_date+asc&start=0&rows=200&fl=uid,permauid

 but when I search for terms like quasi-delict, non-interference, it gives me
 proper excerpts.


If the problem is only empty snippets (numFound > 0) then adding 
hl.maxAnalyzedChars=-1 can help.
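
For example, appended to the request from the original message (other parameters
left as they were):

...&hl=true&hl.fl=content&hl.requireFieldMatch=true&hl.fragsize=600&hl.usePhraseHighlighter=true&hl.maxAnalyzedChars=-1&...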



  


Re: SQL and $deleteDocById

2010-03-17 Thread Lukas Kahwe Smith

On 16.03.2010, at 15:42, Lukas Kahwe Smith wrote:

 Hi,
 
 I am trying to use $deleteDocById to delete rows based on an SQL query in my 
 db-data-config.xml. The following tag is a top-level tag inside the <document> 
 tag.
 
    <entity name="company_del" query="SELECT e.id AS `$deleteDocById` ROM deletedentity AS e"/>

thats obviously a typo from trying to simplify the example .. should be FROM

 However it seems like it's only fetching the rows; it's not actually issuing 
 any index deletes.


I can see that the special-case handler is triggered when looking at the 
console, but no actual deletes are happening, as I can verify via Luke or just 
by trying a query:

INFO: [core1] webapp=/solr path=/dataimport 
params={command=full-import&clean=false} status=0 QTime=7 
Mar 17, 2010 11:29:15 AM org.apache.solr.handler.dataimport.SolrWriter 
readIndexerProperties
INFO: Read dataimport.properties
Mar 17, 2010 11:29:15 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 
call
INFO: Creating a connection for entity company_del with URL: 
jdbc:mysql://localhost/xxx
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 
call
INFO: Time taken for getConnection(): 809
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 1
Mar 17, 2010 11:29:16 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1

commit{dir=/Users/lsmith/htdocs/liip/xxx/trunk/jetty/solr/core1/data_test/index,segFN=segments_9,version=1268742459863,generation=9,filenames=[_8.nrm,
 segments_9, _8.tis, _8.prx, _8.fnm, _8.tii, _8.frq, _8.fdx, _8.fdt]
Mar 17, 2010 11:29:16 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1268742459863
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 2
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 3
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 4
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 5
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 6
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 7
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 8
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 9
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 10
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 11
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 12
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 13
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 14
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 15
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 16
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 17
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 18
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 19
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 20
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 21
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 22
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 23
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 24
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 25
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 26
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 27
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 28
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 29
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 30
Mar 17, 2010 11:29:16 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc

Re: Will Solr fit our needs?

2010-03-17 Thread Krzysztof Grodzicki
Hi Moritz,

You can take a look at the project ZOIE -
http://code.google.com/p/zoie/. I think it's what you are looking
for.

br
Krzysztof

On Wed, Mar 17, 2010 at 9:49 AM, Moritz Mädler m...@moritz-maedler.de wrote:
 Hi List,

 we are running a marketplace which has about a comparable functionality like 
 ebay (auctions, fixed-price items etc).
 The items are placed on the market by users who want to sell their goods.

 Currently we are using Sphinx as an indexing engine, but, as Sphinx returns 
 only document ids we have to make a
 database-query to fetch the data to display. This massively decreases 
 performance as we have to do two requests to
 display data.

 I heard that Solr is able to return a complete dataset and we hope a switch 
 to Solr can boost perfomance.
 A critical question is left and i was not able to find a solution for it in 
 the docs: Is it possible to update attributes directly in the
 index?
 An example for better illustration:
 We have an index which holds all the auctions (containing auctionid, auction 
 title) with its current prices(field: current_price). When a user places a 
 new bid,
 is it possible to update the attribute 'current_price' directly in the index 
 so that we can fetch the current_price from Solr and not from the database?

 I hope you understood my problem. It would be kind if someone can point me to 
 the right direction.

 Thanks alot!

 Moritz


Re: Will Solr fit our needs?

2010-03-17 Thread Geert-Jan Brits
If you don't plan on filtering/sorting and/or faceting on fast-changing
fields, it would be better to store them outside of Solr/Lucene in my
opinion.

If you must: for indexing-performance reasons you will probably end up
maintaining separate indices (one for slow-changing/static fields and one for
fast-changing fields).
You frequently commit the fast-changing index to incorporate the changes
in current_price. Afterwards you have 2 options I believe:

1. use ParallelReader to query the separate indices directly. AFAIK, this is
not (completely) integrated in Solr... I wouldn't recommend it.
2. after you commit the fast-changing index, merge it with the static index.
You're left with 1 fresh index, which you can push to your slave servers.
(all this at regular intervals)

Disadvantages:
- In any case, you must be very careful with maintaining multiple parallel
indexes with the purpose of treating them as one. For instance, document
inserts must be done in exactly the same order, otherwise the indices go
'out-of-sync' and are unusable.
- higher maintenance
- there is always a time window in which the current_price values are stale.
If that's within requirements, that's OK.

The other path, which I recommend, would be to store the current_price
outside of solr (like you're currently doing) but instead of using a
relational db, try looking into persistent key-value stores. Many of them
exist and a lot of progress has been made in the last couple of years. For
simple key-lookups (what you need as far as I can tell) they really blow
every relational db out of the water (considering the same hardware of
course)

We're currently using Tokyo Cabinet with the server frontend Tokyo Tyrant
and seeing almost a 5x increase in lookup performance compared to our
previous kv-store, MemcacheDB, which is based on BerkeleyDB. MemcacheDB was
already several times faster than our MySQL setup (although that was not
optimally tuned).

To sum things up: use the best tools for what they were meant to do.

- index/search -- Solr/Lucene without a doubt.

- kv-lookup -- consensus is still forming, and there are a lot of players (with a lot
of different types of functionality), but if all you need is simple
key-value lookup, I would go for Tokyo Cabinet (TC) / Tyrant at the moment.
Please note that TC and its competitors aren't just some code/hobby projects
but are usually born out of a real need at huge websites / social networks;
TC, for instance, was born from mixi (a big social network in Japan). So at
least you're in good company.

For kv-stores I would suggest beginning your research at:
http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
(early 2009)
http://randomfoo.net/2009/04/20/some-notes-on-distributed-key-stores (mid
2009)
and getting a feel for the kv playing field.

Hope this (pretty long) post helps,
Geert-Jan


2010/3/17 Krzysztof Grodzicki krzysztof.grodzi...@iterate.pl

 Hi Mortiz,

 You can take a look on the project ZOIE -
 http://code.google.com/p/zoie/. I think it's that what are you looking
 for.

 br
 Krzysztof

 On Wed, Mar 17, 2010 at 9:49 AM, Moritz Mädler m...@moritz-maedler.de
 wrote:
  Hi List,
 
  we are running a marketplace which has about a comparable functionality
 like ebay (auctions, fixed-price items etc).
  The items are placed on the market by users who want to sell their goods.
 
  Currently we are using Sphinx as an indexing engine, but, as Sphinx
 returns only document ids we have to make a
  database-query to fetch the data to display. This massively decreases
 performance as we have to do two requests to
  display data.
 
  I heard that Solr is able to return a complete dataset and we hope a
 switch to Solr can boost perfomance.
  A critical question is left and i was not able to find a solution for it
 in the docs: Is it possible to update attributes directly in the
  index?
  An example for better illustration:
  We have an index which holds all the auctions (containing auctionid,
 auction title) with its current prices(field: current_price). When a user
 places a new bid,
  is it possible to update the attribute 'current_price' directly in the
 index so that we can fetch the current_price from Solr and not from the
 database?
 
  I hope you understood my problem. It would be kind if someone can point
 me to the right direction.
 
  Thanks alot!
 
  Moritz



Re: SQL and $deleteDocById

2010-03-17 Thread Lukas Kahwe Smith

On 17.03.2010, at 11:36, Lukas Kahwe Smith wrote:

 
 On 16.03.2010, at 15:42, Lukas Kahwe Smith wrote:
 
 Hi,
 
 I am trying to use $deleteDocById to delete rows based on an SQL query in my 
 db-data-config.xml. The following tag is a top level tag in the document 
 tag.
 
    <entity name="company_del" query="SELECT e.id AS `$deleteDocById` ROM deletedentity AS e"/>


I have managed to get things working with a different approach now:

<entity name="entity" pk="id" query="SELECT e.id, e.name FROM entity AS e"
        deletedPkQuery="SELECT e.id FROM deletedentity AS e"/>

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





London open-source search social - 6th April

2010-03-17 Thread Richard Marr
Hi all,

We're meeting up at the Elgin just by Ladbroke Grove on the 6th for a
bit of relaxed chat about search, and related technology. Come along,
we're nice.
http://www.meetup.com/london-search-social/calendar/12781861/

It's a regular event, so if you want prior warning about future
meetups you can sign up here:
http://www.meetup.com/london-search-social/

Cheers,

Rich


Re: XML data in solr field

2010-03-17 Thread Walter Underwood
Have you considered an XML database? Because this is exactly what they are 
designed to do.

eXist is open source, or you can use Mark Logic (my employer), which is much 
faster and more scalable. We do give out free academic and community licenses 
for Mark Logic.

wunder

On Mar 16, 2010, at 11:04 PM, Nair, Manas wrote:

 Thankyou Tommy. But the real problem here is that the xml is dynamic and the 
 element names will be different in different docs which means that there will 
 be a lot of field names to be added in schema if I were to index those xml 
 nodes separately.
 Is it possible to have nested indexing (xml within xml) in solr without the 
 overhead of adding all those inner xml nodes as actual fields in solr schema?
 
 Manas
 
 
 
 From: Tommy Chheng [mailto:tommy.chh...@gmail.com]
 Sent: Tue 3/16/2010 5:05 PM
 To: solr-user@lucene.apache.org
 Subject: Re: XML data in solr field
 
 
 
 
  Do you have the option of just importing each xml node as a
 field/value when you add the document?
 
 That'll let you do the search easily. If you need to store the raw XML,
 you can use an extra field.
 
 Tommy Chheng
 Programmer and UC Irvine Graduate Student
 Twitter @tommychheng
  http://tommy.chheng.com/
 
 
 On 3/16/10 12:59 PM, Nair, Manas wrote:
 Hello Experts,
 
 I need help on this issue of mine. I am unsure if this scenario is possible.
  I have a field in my Solr document named inputxml, the value of which is an 
  XML string as below. This XML structure is within the inputxml field value. 
  I need help searching this XML structure, i.e. if I search for Venue, I 
  should get Radio City Music Hall as the result and not the complete tag 
  like <Venue value="Radio City Music Hall" />. Is this supported in Solr? If 
  it is, how can it be implemented?
 
  <root>
    <Venue value="Radio City Music Hall" />
    <Link value="http://bit.ly/Rndab" />
    <LinkText value="En savoir +" />
    <Address value="New-York, USA" />
  </root>
 
 Any help is appreciated. I donot need the tag name in the result, instead I 
 need the tag value.
 
 Thanks in advance,
 Manas Nair
 







Re: Solr 1.4 - Stemmer expansion

2010-03-17 Thread Saïd Radhouani
The configuration is correct and it works perfectly for French. So far, all
the French words I tried got stemmed correctly, except the word studios.
This is why I thought about expansion; perhaps I might need it for other
words.

Thanks,
-Saïd


2010/3/17 Erick Erickson erickerick...@gmail.com

 Did you specify language=French? Did you re-index
 after specifying this? Can you give some examples of
 unrecognized words? Did you look in your index to see what
 was actually indexed via the admin pages and/or Luke?
 Did you use debugQuery=on to see how your search
 was parsed? Could you post your schema definitions for
 the field in question so folks can look at it?

 We need some details in order to actually be helpful G...

 Best
 Erick

 On Wed, Mar 17, 2010 at 5:05 AM, Saïd Radhouani r.steve@gmail.com
 wrote:

  I'm using the SnowballPorterFilterFactory for stemming French words. Some
  words are not reconginized by this stemmer; I wonder wether, like
 synonyms
  processing, the stemmers have the option of expansion.
 
  Thanks.
 



RE: PDFBox/Tika Performance Issues

2010-03-17 Thread Giovanni Fernandez-Kincade
Hmm. Unfortunately that didn't work. Same problem - Solr doesn't report an 
error, but the data doesn't get extracted. Using the same PDF with my previous 
/Lib contents works fine.

Any other ideas? 

These are the jar files I have in my /Lib

apache-solr-cell-1.4-dev.jar
asm-3.1.jar
bcmail-jdk15-1.45.jar
bcprov-jdk15-1.45.jar
commons-codec-1.3.jar
commons-compress-1.0.jar
commons-io-1.4.jar
commons-lang-2.1.jar
commons-logging-1.1.1.jar
dom4j-1.6.1.jar
fontbox-1.0.0.jar
geronimo-stax-api_1.0_spec-1.0.1.jar
hamcrest-core-1.1.jar
icu4j-3.8.jar
jempbox-1.0.0.jar
junit-3.8.1.jar
log4j-1.2.14.jar
lucene-core-2.9.1-dev.jar
lucene-misc-2.9.1-dev.jar
metadata-extractor-2.4.0-beta-1.jar
mockito-core-1.7.jar
nekohtml-1.9.9.jar
objenesis-1.0.jar
ooxml-schemas-1.0.jar
pdfbox-1.0.0.jar
poi-3.6.jar
poi-ooxml-3.6.jar
poi-ooxml-schemas-3.6.jar
poi-scratchpad-3.6.jar
tagsoup-1.2.jar
tika-core-0.7-SNAPSHOT.jar
tika-parsers-0.7-SNAPSHOT.jar
xercesImpl-2.8.1.jar
xml-apis-1.0.b2.jar
xmlbeans-2.3.0.jar

-Original Message-
From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Tuesday, March 16, 2010 11:50 PM
To: solr-user@lucene.apache.org
Subject: Re: PDFBox/Tika Performance Issues

Hi Giovanni,

Comments below:

 I'm pretty unclear on how to patch the Tika 0.7-trunk on our Solr instance.
 This is what I've tried so far (which was really just me guessing):
 
 
 
 1. Got the latest version of the trunk code from
 http://svn.apache.org/repos/asf/lucene/tika/trunk
 
 2. Built this using Maven (mvn install)
 

On track so far.

 3. I took the resulting tika-app-0.7-SNAPSHOT.jar, copied it to the /Lib
 folder for my Solr Core, and renamed it to the name of the existing Tika Jar
 (tika-0.3.jar).

I don't think you need to do this (w.r.t. the renaming). I think what you
need to do is to drop:

tika-core-0.7-SNAPSHOT.jar
tika-parsers-0.7-SNAPSHOT.jar

Into your Solr core /lib folder. Also you should make sure to take the
updated PDFBox 1.0.0 jar (you can get this by running mvn dependency:copy-dependencies
in the tika-parsers project, see here:
http://maven.apache.org/plugins/maven-dependency-plugin/copy-dependencies-mojo.html),
along with the rest of the jar deps for tika-parsers and drop them
in there as well. Then, make sure to remove the existing tika-0.3.jar, as
well as any of the existing parser lib jar files and replace them with the
new deps.
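
Concretely, that amounts to something like the following (run from a Tika trunk
checkout; paths are illustrative):

cd tika-parsers
mvn dependency:copy-dependencies
# the dependency jars (including pdfbox-1.0.0.jar) land in target/dependency/
# and can then be copied into the Solr core's lib directory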

A bunch of manual labor yes, but you're on the bleeding edge, so c'est la
vie, right? :) The alternative is to wait for Tika 0.7 to be released and
then for Solr to upgrade to it.

 
 4. Then I bounced my servlet server and tried indexing a document. The
 document was successfully indexed, and there were no errors logged as a
 result, but the PDF data does not appear to have been extracted (the field I
 used for map.content had an empty-string as a value).

I think probably has to do with the lib deps. Try what I mentioned above and
let's go from there.

Cheers,
Chris

 -Original Message-
 From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
 Sent: Tuesday, March 16, 2010 5:41 PM
 To: solr-user@lucene.apache.org
 Subject: RE: PDFBox/Tika Performance Issues
 
 
 
 Thanks Chris!
 
 
 
 I'll try the patch.
 
 
 
 -Original Message-
 
 From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov]
 
 Sent: Tuesday, March 16, 2010 5:37 PM
 
 To: solr-user@lucene.apache.org
 
 Subject: Re: PDFBox/Tika Performance Issues
 
 
 
 Guys, I think this is an issue with PDFBOX and the version that Tika 0.6
 depends on. Tika 0.7-trunk upgraded to PDFBox 1.0.0 (see [1]), so it may
 include a fix for the problem you're seeing.
 
 
 
 See this discussion [2] on how to patch Tika to use the new PDFBox if you
 can't wait for the 0.7 release which should happen soon (hopefully next few
 weeks).
 
 
 
 Cheers,
 
 Chris
 
 
 
 [1] http://issues.apache.org/jira/browse/TIKA-380
 
 [2] http://www.mail-archive.com/tika-u...@lucene.apache.org/msg00302.html
 
 
 
 
 
 On 3/16/10 2:31 PM, Giovanni Fernandez-Kincade
 gfernandez-kinc...@capitaliq.com wrote:
 
 
 
 Originally 16 (the number of CPUs on the machine), but even with 5 threads
 it's not looking so hot.
 
 
 
 -Original Message-
 
 From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
 
 Sent: Tuesday, March 16, 2010 5:15 PM
 
 To: solr-user@lucene.apache.org
 
 Subject: Re: PDFBox/Tika Performance Issues
 
 
 
 Hmm, that is an ugly thing in PDFBox.  We should probably take this over to
 the PDFBox project.  How many threads are you indexing with?
 
 
 
 FWIW, for that many documents, I might consider using Tika on the client side
 to save on a lot of network traffic.
 
 
 
 -Grant
 
 
 
 On Mar 16, 2010, at 4:37 PM, Giovanni Fernandez-Kincade wrote:
 
 
 
 I've been trying to bulk index about 11 million PDFs, and while profiling our
 Solr instance, I noticed that all of the threads that are processing indexing
 requests are constantly blocking 

Re: Stopwords

2010-03-17 Thread Glen Newton
That discussion cites a paper via a URL:
http://doc.rero.ch/lm.php?url#16;00,43,4,20091218142456-GY/Dolamic_Ljiljana__When_Stopword_Lists_Make_the_Difference_20091218.pdf

Unfortunately when I go to this URL I get:
 "L'accès à ce document est limité" (access to this document is restricted).

But I tracked down the paper. Here is its reference (which may require
a subscription: sorry):
US: http://dx.doi.org/10.1002/asi.21186
AU: Ljiljana Dolamic
AU: Jacques Savoy
TI: When stopword lists make the difference
SO: Journal of the American Society for Information Science and Technology
VL: 61
NO: 1
PG: 200-203
YR: 2010
CP: © 2009 ASIST
ON: 1532-2890
PN: 1532-2882
AD: Computer Science Department, University of Neuchâtel, 2009
Neuchâtel, Switzerland
DOI: 10.1002/asi.21186

-Glen

On 17 March 2010 06:02, Ahmet Arslan iori...@yahoo.com wrote:

 I was reading Scaling Lucen and Solr
 (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
 and I came across the section StopWords.

 In there it mentioned that its not recommended to remove
 stop words at index
 time. Why is this the case? Don't all the extraneous
 stopwords bloat the
 index and lead to less relevant results? Can someone please
 explain this to
 me. Thanks

 There were a discussion about stopwords (remove them, not to remove them, or 
 index them with CommonGramsFilterFactory) and good references in this thread.

 http://search-lucene.com/m/QvJtF1mIPP22/When+Stopword+Lists+Make+the+Difference







-- 

-


RE: Indexing CLOB Column in Oracle

2010-03-17 Thread Craig Christman
To convert an XMLTYPE to CLOB use the getClobVal() method like this:

SELECT d.XML.getClobVal() FROM DOC d WHERE d.ARCHIVE_ID = '${doc.ARCHIVE_ID}'
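
In a db-data-config.xml entity this would slot in roughly as follows (column and
field names follow the earlier messages in this thread; adjust them to the actual
schema):

<entity name="doc" transformer="ClobTransformer"
        query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d">
  <field column="EFFECTIVE_DT" name="effectiveDate" />
  <field column="ARCHIVE_ID" name="id" />
  <field column="TEXT" name="text" clob="true" />
</entity>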


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Tuesday, March 16, 2010 7:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing CLOB Column in Oracle

Disclaimer:  My Oracle experience is miniscule at best.  I am also a
beginner at Solr, so grab yourself the proverbial grain of salt.

I googled a bit on CLOB.  One page I found mentioned setting up a view
to return the data type you want.  Can you use the functions described
on these pages in either the Solr query or a view?

http://www.oradev.com/dbms_lob.jsp
http://www.dba-oracle.com/t_dbms_lob.htm
http://www.praetoriate.com/dbms_packages/ddp_dbms_lob.htm

I also was trying to find a way to convert from xmltype directly to a
string in a query, but that quickly got way over my level of
understanding.  I saw hints that it is possible, though.

Shawn

On 3/16/2010 4:59 PM, Neil Chaudhuri wrote:
 Since my original thread was straying to a new topic, I thought it made sense 
 to create a new thread of discussion.

 I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
 and the text of a document. This is an Oracle database, and the document is 
 an XML document stored as Oracle's xmltype data type, which is an instance of 
 oracle.sql.OPAQUE. Still, it is nothing more than a fancy clob.




Re: spanish solr tutorial

2010-03-17 Thread Grant Ingersoll
Very nice.  I'd suggest adding a link to the wiki near the tutorial link.

-Grant

On Mar 16, 2010, at 11:44 PM, Juan Pedro Danculovic wrote:

 Hi all, we translated the Solr tutorial to Spanish due to a client's
 request. For all you Spanish speakers/readers out there, you can have a look
 at it:
 
 http://www.linebee.com/?p=155
 
 We hope this can expand the usage of the project and lower the language
 barrier to non-english speakers.
 
 Thanks
 
 Juan Danculovic
 CTO - www.linebee.com




Re: Stopwords

2010-03-17 Thread Anthony Serfes

They apparently moved it .. it's here now:
http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf


--
From: Glen Newton glen.new...@gmail.com
Sent: Wednesday, March 17, 2010 11:13 AM
To: solr-user@lucene.apache.org
Subject: Re: Stopwords


That discussion cites a paper via a URL:
http://doc.rero.ch/lm.php?url#16;00,43,4,20091218142456-GY/Dolamic_Ljiljana__When_Stopword_Lists_Make_the_Difference_20091218.pdf

Unfortunately when I go to this URL I get:
L'accès à ce document est limité.

But I tracked down the paper. Here is its reference (which may require
a subscription: sorry):
US: http://dx.doi.org/10.1002/asi.21186
AU: Ljiljana Dolamic
AU: Jacques Savoy
TI: When stopword lists make the difference
SO: Journal of the American Society for Information Science and Technology
VL: 61
NO: 1
PG: 200-203
YR: 2010
CP: © 2009 ASIST
ON: 1532-2890
PN: 1532-2882
AD: Computer Science Department, University of Neuchâtel, 2009
Neuchâtel, Switzerland
DOI: 10.1002/asi.21186

-Glen

On 17 March 2010 06:02, Ahmet Arslan iori...@yahoo.com wrote:



I was reading Scaling Lucen and Solr
(http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
and I came across the section StopWords.

In there it mentioned that its not recommended to remove
stop words at index
time. Why is this the case? Don't all the extraneous
stopwords bloat the
index and lead to less relevant results? Can someone please
explain this to
me. Thanks


There were a discussion about stopwords (remove them, not to remove them, 
or index them with CommonGramsFilterFactory) and good references in this 
thread.


http://search-lucene.com/m/QvJtF1mIPP22/When+Stopword+Lists+Make+the+Difference








--

- 




Re: Stopwords

2010-03-17 Thread Grant Ingersoll

On Mar 16, 2010, at 9:51 PM, blargy wrote:

 
 I was reading Scaling Lucene and Solr
 (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
 and I came across the section on StopWords.
 
 In there it mentioned that it's not recommended to remove stop words at index
 time. Why is this the case? Don't all the extraneous stopwords bloat the
 index and lead to less relevant results? Can someone please explain this to
 me. Thanks

Yes and no.  Putting our historian hat on, stop words were often seen as 
contributing very little to scores and also taking up a lot of room on disk 
back in the days when disk was very precious.  Times, as they say, have 
changed.  Disk is cheap, so that is no longer a concern.  

Think about stop words a little bit from a language perspective, while it is 
true that they are of little value in search, they are not of no value (if 
they are of no value in a language, one could argue that the word shouldn't 
even exist, right?).  This is especially true when the user enters a query that 
is entirely stop words (for instance, there is a band called The THE).  Thus, 
the trick becomes knowing when to use stop words and when not to.  If you 
remove them at indexing time, you have no choice, as the information is lost, 
so that is why more and more people keep them during indexing and then deal 
with them at query time.  Turns out, stop words are often also useful as part 
of phrases.  Consider the following two documents:

1. The President of the United States went to China last week.
2. Joe is the President.  The United States is investigating him for corruption.

If the user enters the query The President of the United States and stop 
words are removed at indexing and search time, then both documents will match, 
whereas with stop words, the first is the only (and correct) match at least 
based on my intent.

To deal with them at query time, you need an intelligent query parser that:
1. Recognizes when the query is all stop words
2. Keeps stop words as part of phrases

Unfortunately, none of the existing Solr Query Parsers address these two things.

HTH,
Grant


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Stopwords

2010-03-17 Thread Robert Muir
On Wed, Mar 17, 2010 at 11:48 AM, Grant Ingersoll gsing...@apache.org wrote:

 Yes and no.  Putting our historian hat on, stop words were often seen as 
 contributing very little to scores and also taking up a lot of room on disk 
 back in the days when disk was very precious.  Times, as they say, have 
 changed.  Disk is cheap, so that is no longer a concern.


Yes, and the take-away from the Dolamic and Savoy paper is that,
performance-aside, removing stopwords is still a necessary evil for
good relevance, at least for some languages.

Ideally we wouldn't have to remove information to have good relevance,
and a good step forward would be to support relevance-ranking
algorithms such as the BM25* mentioned in the paper, that provide good
relevance without the need to remove stopwords.

For now, at least the CommonGrams solution is available in Solr that
provides an alternative which can address both concerns (performance
and relevance) to some degree.
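
For reference, the CommonGrams setup mentioned above is configured per field type,
roughly like the sketch below (the words file is whatever stopword list would
otherwise have been removed outright):

<fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>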

-- 
Robert Muir
rcm...@gmail.com


Re: Stopwords

2010-03-17 Thread Mark Miller

On 03/17/2010 12:03 PM, Robert Muir wrote:

On Wed, Mar 17, 2010 at 11:48 AM, Grant Ingersollgsing...@apache.org  wrote:

   

Yes and no.  Putting our historian hat on, stop words were often seen as 
contributing very little to scores and also taking up a lot of room on disk 
back in the days when disk was very precious.  Times, as they say, have 
changed.  Disk is cheap, so that is no longer a concern.

 

Yes, and the take-away from the Dolamic and Savoy paper is that,
performance-aside, removing stopwords is still a necessary evil for
good relevance, at least for some languages.

Ideally we wouldn't have to remove information to have good relevance,
and a good step forward would be to support relevance-ranking
algorithms such as the BM25* mentioned in the paper, that provide good
relevance without the need to remove stopwords.

For now, at least the CommonGrams solution is available in Solr that
provides an alternative which can address both concerns (performance
and relevance) to some degree.

   


In general I prefer to have the option of removing stopwords at query 
time (common grams solution aside).


Too many times have I removed stopwords and had user complaints about 
phrase and proximity queries, and no server downtime to reindex and fix 
the issue.


It was never fun supporting Librarians.

--
- Mark

http://www.lucidimagination.com





Re: Exception encountered during replication on slave....Any clues?

2010-03-17 Thread JavaGuy84

Hi William,

We are facing the same issue as you... just thought of checking whether you
had already resolved this issue?

Thanks,
Barani


William Pierce-3 wrote:
 
 Folks:
 
 I am seeing this exception in my logs that is causing my replication to
 fail. I start with a clean slate (empty data directory). I index the
 data on the postingsmaster using the dataimport handler and it succeeds.
 When the replication slave attempts to replicate, it encounters this error.
 
 Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 SEVERE: Master at: http://localhost/postingsmaster/replication is not
 available. Index fetch failed. Exception: Invalid version or the data in
 not in 'javabin' format
 
 Any clues as to what I should look for to debug this further?  
 
 Replication is enabled as follows:
 
 The postingsmaster solrconfig.xml looks as follows:
 
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <!-- Replicate on 'optimize'; it can also be 'commit' -->
     <str name="replicateAfter">commit</str>
     <!-- If configuration files need to be replicated give the names here, comma separated -->
     <str name="confFiles"></str>
   </lst>
 </requestHandler>
 
 The postings slave solrconfig.xml looks as follows:
 
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">
     <!-- fully qualified url for the replication handler of master -->
     <str name="masterUrl">http://localhost/postingsmaster/replication</str>
     <!-- Interval in which the slave should poll master. Format is HH:mm:ss.
          If this is absent the slave does not poll automatically.
          But a snappull can be triggered from the admin or the http API -->
     <str name="pollInterval">00:05:00</str>
   </lst>
 </requestHandler>
 
 
 Thanks,
 
 - Bill
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Exception-encountered-during-replication-on-slaveAny-clues--tp26684769p27933575.html
Sent from the Solr - User mailing list archive at Nabble.com.



Replication failed due to HTTP PROXY?

2010-03-17 Thread JavaGuy84

Hi,

One of my colleagues back in India is not able to replicate the index present
on the servers (USA).

I am now wondering if this is due to a proxy-related issue. He is getting
the below-mentioned error message.

Is there a way to configure a proxy in the Solr config files?

Server logs
INFO: [] Registered new searcher searc...@edf730 main
Mar 17, 2010 8:38:06 PM org.apache.solr.handler.ReplicationHandler getReplicationDetails
WARNING: Exception while invoking 'details' method for replication on master
org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms
at
org.apache.commons.httpclient.protocol.ReflectionSocketFactory.create
Socket(ReflectionSocketFactory.java:155)
at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.c
reateSocket(DefaultProtocolSocketFactory.java:125)
at
org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java
:707)
at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Http
ConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
at
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Htt
pMethodDirector.java:387)
at
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMe
thodDirector.java:171)
at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.jav
a:397)
at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.jav
a:323)
at
org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.ja
va:193)
at
org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java
:188)
at
org.apache.solr.handler.ReplicationHandler.getReplicationDetails(Repl
icationHandler.java:581)
at
org.apache.solr.handler.ReplicationHandler.handleRequestBody(Replicat
ionHandler.java:180)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandl
erBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.jsp.admin.replication.index_jsp.executeCommand(org.apache.
jsp.admin.replication.index_jsp:50)
at
org.apache.jsp.admin.replication.index_jsp._jspService(org.apache.jsp
.admin.replication.index_jsp:231)
at
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
.java:373)
at
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:4
64)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487
)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
67)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
a:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
81)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
12)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)

at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilte
r.java:264)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(Servlet
Handler.java:1089)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
65)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
a:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
81)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
12)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)

at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHand
lerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.
java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:1
39)
at org.mortbay.jetty.Server.handle(Server.java:285)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:50
2)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpCo
nnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.
java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool
.java:442)
Caused by: 

related search

2010-03-17 Thread Suram

How can I make a related search in Solr? If I search for ipod I need to get answers
like ipod shuffle, ipod nano, iphone without using the MoreLikeThis option.
-- 
View this message in context: 
http://old.nabble.com/related-search-tp27933778p27933778.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr RAM Requirements

2010-03-17 Thread Tom Burton-West

Hi Chak

Rather than comparing the overall size of your index to the RAM available
for the OS disk cache, you might want to look at particular files. For
example, if you allow phrase queries, then the size of the *.prx files is
relevant; if you don't, you can look at the size of your *.frq files. You
also might want to take a look at the free memory when you start up Solr and
then watch as it fills up as you get more queries (or send cache-warming
queries).

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search





KaktuChakarabati wrote:
 
 My question was mainly about the fact there seems to be two different
 aspects to the solr RAM usage: in-process and out-process. 
 By that I mean, yes i know the many different parameters/caches to do with
 solr in-process memory usage and related culprits, however I also
 understand that as for actual index access (posting list, positional index
 etc), solr mostly delegates the access/caching of this to the OS/disk
 cache. 
 So I guess my question is more about that: namely, what would be a good
 way to calculate an overall ram requirement profile for a server running
 solr? 
 
-- 
View this message in context: 
http://old.nabble.com/Solr-RAM-Requirements-tp27924551p27933779.html
Sent from the Solr - User mailing list archive at Nabble.com.



Querying multiple fields with the MoreLikeThis handler and mlt.fl

2010-03-17 Thread Alf Eaton
I'm wondering if there's been any progress on an issue described a
year or so ago in More details on my MoreLikeThis mlt.qf boosting
problem http://markmail.org/thread/nmabm5ly3wk2nqyy,
where it was pointed out that the MoreLikeThis handler only queries
one field for each of the interesting terms that it finds in the
input text.

I was hoping that using
/mlt?mlt.fl=title+text&mlt.qf=title^2+text^0.5&mlt.interestingTerms=details&stream.body=tony+blair
would produce
title:tony^2 title:blair^2 text:tony^0.5 text:blair^0.5
but it actually produces just
text:tony^0.5 text:blair^0.5

i.e. despite including the title field in both mlt.qf and mlt.fl, it
only searches the text field.

If I set mlt.fl=title, it produces
title:tony^2 title:blair^2
so it is having an effect, just not the one I'm hoping for...

As it stands, in Solr 1.4, the MoreLikeThis result set from the query
above doesn't produce the document with title Tony Blair as the
first result, which would seem appropriate given the input text tony
blair and a boost on the title field.

alf


XPath Processing Applied to Clob

2010-03-17 Thread Neil Chaudhuri
I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
and the text of a document. This is an Oracle database, and the document is an 
XML document stored as Oracle's xmltype data type. Since this is nothing more 
than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. 
However, I don't want to index/store all the XML but instead just the XML 
within a set of tags. The XPath itself is trivial, but it seems like the 
XPathEntityProcessor only works for XML file content rather than the output of 
a Transformer.

Here is what I currently have that fails:


<document>

  <entity name="doc" query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d"
          transformer="ClobTransformer">

    <field column="EFFECTIVE_DT" name="effectiveDate" />

    <field column="ARCHIVE_ID" name="id" />

    <field column="TEXT" name="text" clob="true"/>

    <entity name="text" processor="XPathEntityProcessor"
            forEach="/MESSAGE" url="${doc.text}">
      <field column="body" xpath="//BODY"/>
    </entity>

  </entity>

</document>


Is there an easy way to do this without writing my own custom transformer?

Thanks.


Trouble getting results from Dismax query

2010-03-17 Thread Alex Thurlow
I'm trying to use the Dismax request handler, and thanks to the list, I 
fixed one problem, which was the existing configs in solrconfig.xml.  
I'm now just not getting any result from the query though.  I changed 
the dismax section in solrconfig.xml to this:


<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>

    <int name="ps">100</int>
    <str name="q.alt">*:*</str>

    <str name="hl.fl">artist title</str>

    <str name="f.name.hl.fragsize">0</str>

    <str name="f.name.hl.alternateField">title</str>
    <str name="f.text.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

I'm using solr-php-client for my query, and the code looks like this:
$params['qt'] = 'dismax';
$params['qf'] = 'title^100 artist^150 description^5 tags^'.$tagboost.' 
artist_title^500 featuring_artist^20 collaborators^50';

$params['pf'] = 'artist_title';

$query = "title:$search artist:$search description:$search tags:$search +type:$type artist_title:$search featuring_artist:$search collaborators:$search";

$response = $solr->search( $query, 0, 30, $params );

The raw query ends up as this:
/solr/select?qt=dismax&qf=title%5E100+artist%5E150+description%5E5+tags%5E10+artist_title%5E500+featuring_artist%5E20+collaborators%5E50&pf=artist_title&q=title%3Aakon+artist%3Aakon+description%3Aakon+tags%3Aakon+%2Btype%3Avideo+artist_title%3Aakon+featuring_artist%3Aakon+collaborators%3Aakon&version=1.2&wt=json&json.nl=map&start=0&rows=30

Responseheader is this:
 {responseHeader:{status:0,QTime:9,params:{pf:artist_title,start:0,q:title:akon artist:akon description:akon tags:akon +type:video artist_title:akon featuring_artist:akon 
collaborators:akon,qf:title^100 artist^150 description^5 tags^10 artist_title^500 featuring_artist^20 
collaborators^50,json.nl:map,qt:dismax,wt:json,version:1.2,rows:30}},response:{numFound:0,start:0,docs:[]}}

If I remove the qt=dismax, I get results like I should.  Can anyone shed 
some light?


Thanks,
Alex




Re: Trouble getting results from Dismax query

2010-03-17 Thread Erik Hatcher


On Mar 17, 2010, at 3:38 PM, Alex Thurlow wrote:

I'm trying to use the Dismax request handler, and thanks to the  
list, I fixed one problem, which was the existing configs in  
solrconfig.xml.  I'm now just not getting any result from the query  
though.  I changed the dismax section in solrconfig.xml to this:


$query = title:$search artist:$search description:$search tags: 
$search +type:$type artist_title:$search featuring_artist:$search  
collaborators:$search;

$response = $solr-search( $query, 0, 30 ,$params);

The raw query ends up as this:
/solr/select?qt=dismaxqf=title%5E100+artist%5E150+description 
%5E5+tags%5E10+artist_title%5E500+featuring_artist%5E20+collaborators 
%5E50pf=artist_titleq=title%3Aakon+artist%3Aakon+description%3Aakon 
+tags%3Aakon+%2Btype%3Avideo+artist_title%3Aakon+featuring_artist 
%3Aakon+collaborators 
%3Aakonversion=1.2wt=jsonjson.nl=mapstart=0rows=30


The dismax parser does not support fielded queries, so title:$search...  
etc. is not parsing as you expect.  qf/pf control the fields searched.   
If you need fielded searches like you're attempting, you'll need to  
overhaul how you're doing the parsing.


You'll likely also want to tune the mm parameter.
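
For example, a rough sketch of what the request could look like with the raw
terms in q and the weighting left to qf/pf (the type constraint moved into an
fq filter; the mm value below is only illustrative):

/solr/select?qt=dismax&q=akon&qf=title^100+artist^150+description^5+tags^10+artist_title^500+featuring_artist^20+collaborators^50&pf=artist_title&fq=type:video&mm=1&wt=json&rows=30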

If I remove the qt=dismax, I get results like I should.  Can anyone  
shed some light?


Right, because the default is to use the SolrQueryParser, which  
supports fielded syntax.


Erik



RE: XPath Processing Applied to Clob

2010-03-17 Thread Neil Chaudhuri
Incidentally, I tried adding this:

<datasource name="f" type="FieldReaderDataSource" />
<document>
    <entity dataSource="f" processor="XPathEntityProcessor" dataField="d.text" forEach="/MESSAGE">
        <field column="body" xpath="//BODY" />
    </entity>
</document>

But this didn't seem to change anything.

Any insight is appreciated.

Thanks.



From: Neil Chaudhuri
Sent: Wednesday, March 17, 2010 3:24 PM
To: solr-user@lucene.apache.org
Subject: XPath Processing Applied to Clob

I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
and the text of a document. This is an Oracle database, and the document is an 
XML document stored as Oracle's xmltype data type. Since this is nothing more 
than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. 
However, I don't want to index/store all the XML but instead just the XML 
within a set of tags. The XPath itself is trivial, but it seems like the 
XPathEntityProcessor only works for XML file content rather than the output of 
a Transformer.

Here is what I currently have that fails:


document

entity name=doc query=SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, 
d.XML.getClobVal() AS TEXT FROM DOC d transformer=ClobTransformer

field column=EFFECTIVE_DT name=effectiveDate /

field column=ARCHIVE_ID name=id /

field column=TEXT name=text clob=true
entity name=text processor=XPathEntityProcessor 
forEach=/MESSAGE url=${doc.text}
field column=body xpath=//BODY/

/entity

/entity

/document


Is there an easy way to do this without writing my own custom transformer?

Thanks.


Re: XPath Processing Applied to Clob

2010-03-17 Thread Lance Norskog
The XPath parser in the DIH is a limited implementation. The unit test
program is the only enumeration (that I can find) of what it handles:

http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/dataimporthandler/src/test/java/org/apache/solr/handler/dataimport/TestXPathRecordReader.java

//BODY in fact is not allowed, and should throw an Exception. Or at
least some kind of error message. Perhaps there is one in the logs?


On Wed, Mar 17, 2010 at 2:45 PM, Neil Chaudhuri
nchaudh...@potomacfusion.com wrote:
 Incidentally, I tried adding this:

 datasource name=f type=FieldReaderDataSource /
 document
        entity dataSource=f processor=XPathEntityProcessor 
 dataField=d.text forEach=/MESSAGE
                  field column=body xpath=//BODY/
        /entity
 /document

 But this didn't seem to change anything.

 Any insight is appreciated.

 Thanks.



 From: Neil Chaudhuri
 Sent: Wednesday, March 17, 2010 3:24 PM
 To: solr-user@lucene.apache.org
 Subject: XPath Processing Applied to Clob

 I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
 and the text of a document. This is an Oracle database, and the document is 
 an XML document stored as Oracle's xmltype data type. Since this is nothing 
 more than a fancy CLOB, I am using the ClobTransformer to extract the actual 
 XML. However, I don't want to index/store all the XML but instead just the 
 XML within a set of tags. The XPath itself is trivial, but it seems like the 
 XPathEntityProcessor only works for XML file content rather than the output 
 of a Transformer.

 Here is what I currently have that fails:


 document

        entity name=doc query=SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, 
 d.XML.getClobVal() AS TEXT FROM DOC d transformer=ClobTransformer

            field column=EFFECTIVE_DT name=effectiveDate /

            field column=ARCHIVE_ID name=id /

            field column=TEXT name=text clob=true
            entity name=text processor=XPathEntityProcessor 
 forEach=/MESSAGE url=${doc.text}
                field column=body xpath=//BODY/

            /entity

        /entity

 /document


 Is there an easy way to do this without writing my own custom transformer?

 Thanks.




-- 
Lance Norskog
goks...@gmail.com


Re: Will Solr fit our needs?

2010-03-17 Thread Lance Norskog
Another option is the ExternalFileField:

http://www.lucidimagination.com/search/document/CDRG_ch04_4.4.4?q=ExternalFileField

This lets you store the current prices for all items in a separate
file. You can only use it in a function query, that is. But it does
allow you to maintain one Solr index, which is very very worthwhile.
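
For completeness, a rough sketch of what that can look like in Solr 1.4 (the
field, type, and key names are illustrative, and assume the unique key field
is called id). In schema.xml:

  <fieldType name="priceFile" class="solr.ExternalFileField"
             keyField="id" defVal="0" valType="pfloat"/>
  <field name="current_price" type="priceFile" indexed="false" stored="false"/>

Then keep a file named external_current_price in the index data directory,
one key=value line per document, e.g.:

  AUCTION-1001=12.50
  AUCTION-1002=3.99

You can rewrite that file as bids come in; the new values should be picked up
the next time a searcher is opened (e.g. after a commit), and the field can
then be used in a function query or boost without reindexing the documents.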

On Wed, Mar 17, 2010 at 4:19 AM, Geert-Jan Brits gbr...@gmail.com wrote:
 If you don't plan on filtering/sorting and/or faceting on fast-changing
 fields, it would be better to store them outside of Solr/Lucene, in my
 opinion.

 If you must: for indexing-performance reasons you will probably end up
 maintaining separate indices (1 for slow-changing/static fields and 1 for
 fast-changing fields).
 You frequently commit the fast-changing -index to incorporate the changes
 in current_price. Afterwards you have 2 options I believe:

 1. use parallelreader to query the seperate indices directly. Afaik, this is
 not (completely) integrated in Solr... I wouldn't recommend it.
 2. after you commit the fast-changing-index, merge with the static-index.
 You're left with 1 fresh index, which you can push to your slave-servers.
 (all this in regular interverals)

 Disadvantages:
 - In any way, you must be very careful with maintaining multiple parallel
 indexes with the purpose of treating them as one. For instance document
 inserts must be done exactly in the same order, otherwise the indices go
 'out-of-sync' and are unusable.
 - higher maintenance
 - there is always a time-window in which the current_price values are stale.
 If that's within reqs that's ok.

 The other path, which I recommend, would be to store the current_price
 outside of solr (like you're currently doing) but instead of using a
 relational db, try looking into persistent key-value stores. Many of them
 exist and a lot of progress has been made in the last couple of years. For
 simple key-lookups (what you need as far as I can tell) they really blow
 every relational db out of the water (considering the same hardware of
 course)

 We're currently using Tokyo Cabinet with the server frontend Tokyo Tyrant
 and seeing almost a 5x increase in lookup performance compared to our
 previous kv-store, MemcacheDB, which is based on BerkeleyDB. MemcacheDB was
 already several times faster than our MySQL setup (although not optimally
 tuned).

 to sum things up: use the best tools for what they were meant to do.

 - index/search -- solr/ lucene without a doubt.

 - kv-lookup -- consensus is still forming, and a lot of players (with a lot
 of different types of functionality) but if all you need is simple
 key-value-lookup, I would go for Tokyo Cabinet (TC) / Tyrant at the moment.
  Please note that TC and competitors aren't just some code/ hobby projects
 but are usually born out of a real need at huge websites / social networks
 such as TC which is born from mixi  (big social network in Japan) . So at
 least you're in good company..

 for kv-stores I would suggest to begin your research at:
 http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
 (beginning
 2009)
 http://randomfoo.net/2009/04/20/some-notes-on-distributed-key-stores (half
 2009)
 and get a feel of the kv-playing field.

 Hope this (pretty long) post helps,
 Geert-Jan


 2010/3/17 Krzysztof Grodzicki krzysztof.grodzi...@iterate.pl

 Hi Moritz,

 You can take a look on the project ZOIE -
 http://code.google.com/p/zoie/. I think it's that what are you looking
 for.

 br
 Krzysztof

 On Wed, Mar 17, 2010 at 9:49 AM, Moritz Mädler m...@moritz-maedler.de
 wrote:
  Hi List,
 
  we are running a marketplace with functionality roughly comparable to
 eBay's (auctions, fixed-price items, etc.).
  The items are placed on the market by users who want to sell their goods.
 
  Currently we are using Sphinx as an indexing engine, but, as Sphinx
 returns only document ids we have to make a
  database-query to fetch the data to display. This massively decreases
 performance as we have to do two requests to
  display data.
 
  I heard that Solr is able to return a complete dataset and we hope a
 switch to Solr can boost performance.
  A critical question is left and i was not able to find a solution for it
 in the docs: Is it possible to update attributes directly in the
  index?
  An example for better illustration:
  We have an index which holds all the auctions (containing auctionid,
 auction title) with its current prices(field: current_price). When a user
 places a new bid,
  is it possible to update the attribute 'current_price' directly in the
 index so that we can fetch the current_price from Solr and not from the
 database?
 
  I hope you understood my problem. It would be kind if someone can point
 me to the right direction.
 
  Thanks alot!
 
  Moritz





-- 
Lance Norskog
goks...@gmail.com


Re: XML data in solr field

2010-03-17 Thread Lance Norskog
You can use dynamic fields (wildcard field names) to add any and all
element names. You would have to add a suffix to every element name in
your preparation, but you will not have to add all of the element
names to your schema.
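
For example, the stock example schema already ships with string wildcards such
as *_s; a sketch of how the document from this thread might then be posted
(the id value is made up, and the field names are just the element names with
an _s suffix):

<add>
  <doc>
    <field name="id">event-1</field>
    <field name="Venue_s">Radio City Music Hall</field>
    <field name="Link_s">http://bit.ly/Rndab</field>
    <field name="LinkText_s">En savoir +</field>
    <field name="Address_s">New-York, USA</field>
  </doc>
</add>

A query like Venue_s:"Radio City Music Hall" then matches on the value alone,
without the surrounding markup.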

On Wed, Mar 17, 2010 at 7:04 AM, Walter Underwood wun...@wunderwood.org wrote:
 Have you considered an XML database? Because this is exactly what they are 
 designed to do.

 eXist is open source, or you can use Mark Logic (my employer), which is much 
 faster and more scalable. We do give out free academic and community licenses 
 for Mark Logic.

 wunder

 On Mar 16, 2010, at 11:04 PM, Nair, Manas wrote:

 Thankyou Tommy. But the real problem here is that the xml is dynamic and the 
 element names will be different in different docs which means that there 
 will be a lot of field names to be added in schema if I were to index those 
 xml nodes separately.
 Is it possible to have nested indexing (xml within xml) in solr without the 
 overhead of adding all those inner xml nodes as actual fields in solr schema?

 Manas

 

 From: Tommy Chheng [mailto:tommy.chh...@gmail.com]
 Sent: Tue 3/16/2010 5:05 PM
 To: solr-user@lucene.apache.org
 Subject: Re: XML data in solr field




  Do you have the option of just importing each xml node as a
 field/value when you add the document?

 That'll let you do the search easily. If you need to store the raw XML,
 you can use an extra field.

 Tommy Chheng
 Programmer and UC Irvine Graduate Student
 Twitter @tommychheng
 http://tommy.chheng.com http://tommy.chheng.com/


 On 3/16/10 12:59 PM, Nair, Manas wrote:
 Hello Experts,

 I need help on this issue of mine. I am unsure if this scenario is possible.
 I have a field in my solr document namedinputxml, the value of which is a 
 xml string as below. This xml structure is within the inputxml field value. 
 I needed help on searching this xml structure i.e. if I search  for Venue, 
 I should get Radio City Music Hall as the result and not the complete tag 
 likeVenue value=Radio City Music Hall /. Is this supported in solr?? If 
 it is, how can this be implemented??

 root
 Venue value=Radio City Music Hall /
 Link value=http://bit.ly/Rndab; /
 LinkText value=En savoir + /
 Address value=New-York, USA /
 /root

 Any help is appreciated. I donot need the tag name in the result, instead I 
 need the tag value.

 Thanks in advance,
 Manas Nair










-- 
Lance Norskog
goks...@gmail.com


Re: Exception encountered during replication on slave....Any clues?

2010-03-17 Thread Lance Norskog
The localhost URLs have no port numbers.

Is there a more complete error in the logs?
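
For comparison, a sketch of a masterUrl that carries an explicit port (8080
here is only a placeholder for whatever port the master container actually
listens on):

<str name="masterUrl">http://localhost:8080/postingsmaster/replication</str>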

On Wed, Mar 17, 2010 at 9:15 AM, JavaGuy84 bbar...@gmail.com wrote:

 Hi William,

 We are facing the same issue as yourself.. just thought of checking if you
 had already resolve this issue?

 Thanks,
 Barani


 William Pierce-3 wrote:

 Folks:

 I am seeing this exception in my logs that is causing my replication to
 fail.    I start with  a clean slate (empty data directory).  I index the
 data on the postingsmaster using the dataimport handler and it succeeds.
 When the replication slave attempts to replicate it encounters this error.

 Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 SEVERE: Master at: http://localhost/postingsmaster/replication is not
 available. Index fetch failed. Exception: Invalid version or the data in
 not in 'javabin' format

 Any clues as to what I should look for to debug this further?

 Replication is enabled as follows:

 The postingsmaster solrconfig.xml looks as follows:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="master">
       <!-- Replicate on 'optimize'; it can also be 'commit' -->
       <str name="replicateAfter">commit</str>
       <!-- If configuration files need to be replicated, give the names here, comma separated -->
       <str name="confFiles"></str>
     </lst>
   </requestHandler>

 The postings slave solrconfig.xml looks as follows:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="slave">
         <!-- fully qualified url for the replication handler of master -->
         <str name="masterUrl">http://localhost/postingsmaster/replication</str>
         <!-- Interval in which the slave should poll master. Format is HH:mm:ss.
              If this is absent the slave does not poll automatically,
              but a snappull can be triggered from the admin or the http API -->
         <str name="pollInterval">00:05:00</str>
      </lst>
   </requestHandler>


 Thanks,

 - Bill





 --
 View this message in context: 
 http://old.nabble.com/Exception-encountered-during-replication-on-slaveAny-clues--tp26684769p27933575.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
Lance Norskog
goks...@gmail.com


Re: Replication failed due to HTTP PROXY?

2010-03-17 Thread Lance Norskog
A 5-second connection timeout is not going to work trans-globally. The
replication engine is generally tested in local sites.

If it is possible to set defaults for the Apache Commons http classes
via system properties, that might let this work. This doc does not
seem promising:

http://www.jdocs.com/httpclient/3.0.1/api-index.html?m=package&p=org.apache.commons.httpclient&render=classic

On Wed, Mar 17, 2010 at 9:22 AM, JavaGuy84 bbar...@gmail.com wrote:

 Hi,

 One of my collegue back in India is not able to replicate the index present
 in the Servers (USA).

 I am now thinking if this is due to any proxy related issue? He is getting
 the below metioned error message

 Is there a way to configure PROXY in SOLR config files?

 Server logs
 INFO: [] Registered new searcher searc...@edf730 main
 Mar 17, 2010 8:38:06 PM org.apache.solr.handler.ReplicationHandler
 getReplicatio
 nDetails
 WARNING: Exception while invoking 'details' method for replication on master
 org.apache.commons.httpclient.ConnectTimeoutException: The host did not
 accept t
 he connection within timeout of 5000 ms
        at
 org.apache.commons.httpclient.protocol.ReflectionSocketFactory.create
 Socket(ReflectionSocketFactory.java:155)
        at
 org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.c
 reateSocket(DefaultProtocolSocketFactory.java:125)
        at
 org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java
 :707)
        at
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Http
 ConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
        at
 org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Htt
 pMethodDirector.java:387)
        at
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMe
 thodDirector.java:171)
        at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.jav
 a:397)
        at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.jav
 a:323)
        at
 org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.ja
 va:193)
        at
 org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java
 :188)
        at
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(Repl
 icationHandler.java:581)
        at
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(Replicat
 ionHandler.java:180)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandl
 erBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at
 org.apache.jsp.admin.replication.index_jsp.executeCommand(org.apache.
 jsp.admin.replication.index_jsp:50)
        at
 org.apache.jsp.admin.replication.index_jsp._jspService(org.apache.jsp
 .admin.replication.index_jsp:231)
        at
 org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
 .java:373)
        at
 org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:4
 64)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487
 )
        at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
 67)
        at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
 a:216)
        at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
 81)
        at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
 12)
        at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)

        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilte
 r.java:264)
        at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(Servlet
 Handler.java:1089)
        at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
 65)
        at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
 a:216)
        at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
 81)
        at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
 12)
        at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)

        at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHand
 lerCollection.java:211)
        at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.
 java:114)
        at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:1
 39)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:50
 2)

Re: Solr Performance Issues

2010-03-17 Thread Lance Norskog
Try cutting back Solr's memory - the OS knows how to manage disk
caches better than Solr does.

Another approach is to raise and lower the queryResultCache and see if
the hitratio changes.
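
For reference, that cache is sized in solrconfig.xml; a sketch of the stanza
to experiment with (the numbers below are purely illustrative, not
recommendations):

<queryResultCache
    class="solr.LRUCache"
    size="16384"
    initialSize="4096"
    autowarmCount="1024"/>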

On Wed, Mar 17, 2010 at 9:44 AM, Siddhant Goel siddhantg...@gmail.com wrote:
 Hi,

 Apparently the bottleneck seems to be the time periods when the CPU is waiting
 to do some I/O. Out of all the numbers I can see, the CPU wait times for I/O
 seem to be the highest. I've allotted 4GB to Solr out of the total 8GB
 available. There's only 47MB free on the machine, so I assume the rest of
 the memory is being used for OS disk caches. In addition, the hit ratios for
 queryResultCache aren't going beyond 20%. So the problem, I think, is not at
 Solr's end. Are there any pointers available on how I can resolve such
 issues related to disk I/O? Does this mean I need more overall memory? Or
 would reducing the amount of memory allocated to Solr, so that the disk cache
 has more memory, help?

 Thanks,

 On Fri, Mar 12, 2010 at 11:21 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Sounds like you're pretty well on your way then. This is pretty typical
 of multi-threaded situations... Threads 1-n wait around on I/O and
 increasing the number of threads increases throughput without
 changing (much) the individual response time.

 Threads n+1 - p don't change throughput much, but increase
 the response time for each request. On aggregate, though, the
 throughput doesn't change (much).

 Adding threads after p+1 *decreases* throughput while
 *increasing* individual response time as your processors start
 spending way too much time context-switching and/or memory
 swapping.

 The trick is finding out what n and p are G.

 Best
 Erick

 On Fri, Mar 12, 2010 at 12:06 PM, Siddhant Goel siddhantg...@gmail.com
 wrote:

  Hi,
 
  Thanks for your responses. It actually feels good to be able to locate
  where
  the bottlenecks are.
 
  I've created two sets of data - in the first one I'm measuring the time
  took
  purely on Solr's end, and in the other one I'm including network latency
  (just for reference). The data that I'm posting below contains the time
  took
  purely by Solr.
 
  I'm running 10 threads simultaneously and the average response time (for
  each query in each thread) remains close to 40 to 50 ms. But as soon as I
  increase the number of threads to something like 100, the response time
  goes
  up to ~600ms, and further up when the number of threads is close to 500.
  Yes
  the average time definitely depends on the number of concurrent requests.
 
  Going from memory, debugQuery=on will let you know how much time
   was spent in various operations in SOLR. It's important to know
   whether it was the searching, assembling the response, or
   transmitting the data back to the client.
 
 
  I just tried this. The information that it gives me for a query that took
  7165ms is - http://pastebin.ca/1835644
 
  So out of the total time 7165ms, QueryComponent took most of the time.
 Plus
  I can see the load average going up when the number of threads is really
  high. So it actually makes sense. (I didn't add any other component while
  searching; it was a plain /select?q=query call).
  Like I mentioned earlier in this mail, I'm maintaining separate sets for
  data with/without network latency, and I don't think its the bottleneck.
 
 
   How many threads does it take to peg the CPU? And what
   response times are you getting when your number of threads is
   around 10?
  
 
  If the number of threads is greater than 100, that really takes its toll
 on
  the CPU. So probably thats the number.
 
  When the number of threads is around 10, the response times average to
  something like 60ms (and 95% of the queries fall within 100ms of that
  value).
 
  Thanks,
 
 
 
 
  
   Erick
  
   On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel siddhantg...@gmail.com
   wrote:
  
I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS
   disk
caching.
   
I think that at any point of time, there can be a maximum of number
 of
threads concurrent requests, which happens to make sense btw (does
  it?).
   
As I increase the number of threads, the load average shown by top
 goes
   up
to as high as 80%. But if I keep the number of threads low (~10), the
   load
average never goes beyond ~8). So probably thats the number of
 requests
  I
can expect Solr to serve concurrently on this index size with this
hardware.
   
Can anyone give a general opinion as to how much hardware should be
sufficient for a Solr deployment with an index size of ~43GB,
  containing
around 2.5 million documents? I'm expecting it to serve at least 20
requests
per second. Any experiences?
   
Thanks
   
On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West 
  tburtonw...@gmail.com
wrote:
   

 How much of your memory are you allocating to the JVM and how much
  are
you
 leaving free?

  

Re: Indexing CLOB Column in Oracle

2010-03-17 Thread Lance Norskog
This could be the problem: the text field in the example schema is
indexed, but not stored. If you query the index with text:monkeys it
will find records with monkeys, but the text field will not appear
in the returned XML because it was not stored.
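
To illustrate, a sketch against the example schema (adjust the type to
whatever the schema actually uses); the difference is only the stored
attribute, and changing it requires reindexing:

<!-- searchable, but the raw text never comes back in results -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- searchable and returned in the response -->
<field name="text" type="text" indexed="true" stored="true" multiValued="true"/>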

On Wed, Mar 17, 2010 at 11:17 AM, Neil Chaudhuri
nchaudh...@potomacfusion.com wrote:
 For those who might encounter a similar issue, merging what I had into a 
 single entity and using getClobVal() did the trick.

 In other words:

 <document>
        <entity name="doc" query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d" transformer="ClobTransformer">
            <field column="EFFECTIVE_DT" name="effectiveDate" />
            <field column="ARCHIVE_ID" name="id" />
            <field column="TEXT" name="text" clob="true" />
        </entity>
 </document>

 Thanks.



 -Original Message-
 From: Craig Christman [mailto:cchrist...@caci.com]
 Sent: Wednesday, March 17, 2010 11:23 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Indexing CLOB Column in Oracle

 To convert an XMLTYPE to CLOB use the getClobVal() method like this:

 SELECT d.XML.getClobVal() FROM DOC d WHERE d.ARCHIVE_ID = '${doc.ARCHIVE_ID}'


 -Original Message-
 From: Shawn Heisey [mailto:s...@elyograg.org]
 Sent: Tuesday, March 16, 2010 7:37 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Indexing CLOB Column in Oracle

 Disclaimer:  My Oracle experience is miniscule at best.  I am also a
 beginner at Solr, so grab yourself the proverbial grain of salt.

 I googled a bit on CLOB.  One page I found mentioned setting up a view
 to return the data type you want.  Can you use the functions described
 on these pages in either the Solr query or a view?

 http://www.oradev.com/dbms_lob.jsp
 http://www.dba-oracle.com/t_dbms_lob.htm
 http://www.praetoriate.com/dbms_packages/ddp_dbms_lob.htm

 I also was trying to find a way to convert from xmltype directly to a
 string in a query, but that quickly got way over my level of
 understanding.  I saw hints that it is possible, though.

 Shawn

 On 3/16/2010 4:59 PM, Neil Chaudhuri wrote:
 Since my original thread was straying to a new topic, I thought it made 
 sense to create a new thread of discussion.

 I am using the DataImportHandler to index 3 fields in a table: an id, a 
 date, and the text of a document. This is an Oracle database, and the 
 document is an XML document stored as Oracle's xmltype data type, which is 
 an instance of oracle.sql.OPAQUE. Still, it is nothing more than a fancy 
 clob.






-- 
Lance Norskog
goks...@gmail.com


Re: Dummy boost question

2010-03-17 Thread Chris Hostetter

: I want to *search* on title and content, and then, within these results 
*boost* by keyword.
...
:   <str name="bq">keyword:(*.*)^1.0</str>
: 
: But I'm fairly sure that this is boosting on all keywords (not just ones 
matching my search term)

correct.

: Does anyone know how to achieve what I want (I'm using the DisMax query 
request handler btw.)

H... 

you could use the pf param to specify your keywords field (if you aren't 
already) so that queries where the entire search string matches the keyword 
field are boosted (it's not clear to me if that's what you want or not)

Alternately, you could specify the bq at query time and copy your search 
terms into it ... actually, I've never tried it, but something like 
this might work...

   <str name="bq">{!lucene df=keyword v=$q}</str>

...that should use the Local Params dereferencing feature to use the q 
param as the value of the bq param (typically it's used to bake 
localparams into the config while pulling some other param from the request 
-- but I can't think of any reason why you can't use it to access q 
directly).
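
If it helps, a sketch of where such a default could sit in a dismax handler
definition (the handler name and the other params here are only illustrative):

   <requestHandler name="dismax" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="defType">dismax</str>
       <str name="qf">title content</str>
       <str name="bq">{!lucene df=keyword v=$q}</str>
     </lst>
   </requestHandler>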



-Hoss



Re: indexing key/value field type

2010-03-17 Thread Chris Hostetter

: tags<key,value>, where key is String and value is Int.
: key is a given tag and value is a count of how many users used this tag for
: a given document.
: 
: How can I index and store a key/value type of field? such that one can
: search on the values as well as keys of this field. 

It depends on what types of searches you want to do.

Some people only care about searching on the tag string and just want 
the numeric value to boost the score -- in which case Payloads work 
really well (and there's already a Tokenizer that makes it easy to index 
the pairs, but i think you still need a custom QParser to query them)

If you actually want to be able to apply arbitrary numeric constraints 
(ie: find all docs where more than 13 and less than 34 people applied the 
tag 'food') then things get a lot more complicated ... you can do it with 
parallel fields (ie: the tags in one multiValued string field, and the 
numbers in another multiValued int field) but then you really have to 
write a lot of custom query code to pay attention to the position info 
when evaluating matches.

: I have looked at FAQs, where one mailing-list suggests using the dynamic
: field type such as: 
: 
: <dynamicField name="tags_*" type="string" indexed="true" stored="true"
: omitNorms="true" />
: 
: but how would we search on the dynamic field names?

tags_food:[13 TO 34]

...if you want to know if a document has a tag at all, you could use 
something like tags_food:[* TO *] or lump all the tag strings into a 
tags field as well (tags:food)
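
To make that concrete, a sketch of how one tagged document could be posted
with such a dynamicField in place (ids and counts are made up; note that for
the range query above to behave numerically, the tags_* wildcard would
probably need a numeric type such as tint rather than string):

   <add>
     <doc>
       <field name="id">42</field>
       <field name="tags">food</field>
       <field name="tags">travel</field>
       <field name="tags_food">17</field>
       <field name="tags_travel">3</field>
     </doc>
   </add>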


-Hoss



Re: XPath Processing Applied to Clob

2010-03-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Keep in mind that the xpath is case-sensitive. Please paste a sample xml.

What is dataField="d.text"? It does not seem to refer to anything.
Where is the enclosing entity?
Did you mean dataField="doc.text"?

xpath="//BODY" is a supported syntax as long as you are using Solr 1.4 or higher.
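
Something along these lines might be closer to what you want -- the
XPathEntityProcessor entity nested inside the SQL entity, with dataField
pointing at the enclosing entity's column (the attribute values below are
assumptions on my part; whether the column key is TEXT or text depends on how
the driver reports it):

<dataSource name="f" type="FieldReaderDataSource" />
<document>
  <entity name="doc" query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d">
    <field column="EFFECTIVE_DT" name="effectiveDate" />
    <field column="ARCHIVE_ID" name="id" />
    <entity name="message" dataSource="f" processor="XPathEntityProcessor" dataField="doc.TEXT" forEach="/MESSAGE">
      <field column="body" xpath="//BODY" />
    </entity>
  </entity>
</document>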




On Thu, Mar 18, 2010 at 3:15 AM, Neil Chaudhuri
nchaudh...@potomacfusion.com wrote:
 Incidentally, I tried adding this:

 datasource name=f type=FieldReaderDataSource /
 document
        entity dataSource=f processor=XPathEntityProcessor 
 dataField=d.text forEach=/MESSAGE
                  field column=body xpath=//BODY/
        /entity
 /document

 But this didn't seem to change anything.

 Any insight is appreciated.

 Thanks.



 From: Neil Chaudhuri
 Sent: Wednesday, March 17, 2010 3:24 PM
 To: solr-user@lucene.apache.org
 Subject: XPath Processing Applied to Clob

 I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
 and the text of a document. This is an Oracle database, and the document is 
 an XML document stored as Oracle's xmltype data type. Since this is nothing 
 more than a fancy CLOB, I am using the ClobTransformer to extract the actual 
 XML. However, I don't want to index/store all the XML but instead just the 
 XML within a set of tags. The XPath itself is trivial, but it seems like the 
 XPathEntityProcessor only works for XML file content rather than the output 
 of a Transformer.

 Here is what I currently have that fails:


 document

        entity name=doc query=SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, 
 d.XML.getClobVal() AS TEXT FROM DOC d transformer=ClobTransformer

            field column=EFFECTIVE_DT name=effectiveDate /

            field column=ARCHIVE_ID name=id /

            field column=TEXT name=text clob=true
            entity name=text processor=XPathEntityProcessor 
 forEach=/MESSAGE url=${doc.text}
                field column=body xpath=//BODY/

            /entity

        /entity

 /document


 Is there an easy way to do this without writing my own custom transformer?

 Thanks.




-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


What is the use of Solr configuration in Katta master and nodes after integrating katta into Solr

2010-03-17 Thread V SudershanReddy
Hi All,

  Can somebody please explain what the use of the Solr configuration in the
Katta master and nodes is after integrating Katta into Solr (the 1395 patch)?

 

Thanks,

vsreddy



Re: What is the use of Solr configuration in Katta master and nodes after integrating katta into Solr

2010-03-17 Thread Jason Venner
The Katta master is set up to act as a Solr master server.
The config there is set up to distribute requests to the individual shards.

The Solr config in the nodes is the default config used to start the Solr
instance in each node.


On 3/17/10 9:05 PM, V SudershanReddy vsre...@huawei.com wrote:

Hi All,

  Can some body please explain, What is the use of Solr configuration in
Katta master and nodes after integrating katta into Solr (1395 Patch).



Thanks,

vsreddy