Re: Solr and UIMA

2010-03-02 Thread JCodina

You can test our UIMA to Solr CAS consumer.
It is based on JulieLab's Lucas and uses their CAS,
but transformed to generate XML which can be saved to a file or posted
directly to Solr.
In the map file you can define which information is generated for each
token, and how it is concatenated, allowing the generation of things like
the|AD car|NC which can then be processed using payloads.

now you can get it from my page
http://www.barcelonamedia.org/personal/joan.codina/en


-- 
View this message in context: 
http://old.nabble.com/Solr-and-UIMA-tp24567504p27753399.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr Version

2010-03-02 Thread Marc Wilson
Hi,

This is probably a really dumb question, but how can I find out which version 
of Solr is currently running on my (Windows) system? I can't seem to find 
anything in the Solr Admin interface nor the Tomcat Manager.

Thanks,

Marc


AW: Solr Version

2010-03-02 Thread Markus.Rietzler
go to the Solr admin page and then click on Info; right in the first line you see the
Solr version

 -----Original Message-----
 From: Marc Wilson [mailto:wo...@fancydressoutfitters.co.uk] 
 Sent: Tuesday, 2 March 2010 09:55
 To: Solr
 Subject: Solr Version
 
 Hi,
 
 This is probably a really dumb question, but how can I find 
 out which version of Solr is currently running on my 
 (Windows) system? I can't seem to find anything in the Solr 
 Admin interface nor the Tomcat Manager.
 
 Thanks,
 
 Marc
 


AW: Query from User Session to Documents with Must-Have Permissions

2010-03-02 Thread Markus.Rietzler
little question: what's the difference between a MustHavePermission and a
protected document?

at the moment we are developing a new search for our intranet, using solr.
we also have some protected documents and implemented this kind of filter like
you.

i would suggest using a true filter (fq=xxx) instead of adding conditions to
the query.

filters are cached, which improves performance; much more importantly, they do
not affect the scoring of matched documents!
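
For example, a sketch of that approach (field and group names are taken from
the question quoted below; the actual groups come from the user session):

  q=collection:collection1 AND textbody:text
  &fq=unprotected:true OR userGroups:group1 OR userGroups:group2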

markus


 -----Original Message-----
 From: _jochen [mailto:jgai...@kbs.kaba.com] 
 Sent: Monday, 1 March 2010 14:09
 To: solr-user@lucene.apache.org
 Subject: Query from User Session to Documents with Must-Have Permissions
 
 
 Hi @ all,
 
 i am trying to create a query out of a web-based content management
 system. In the CMS there are some protected documents. While feeding the
 documents to Solr I have the information: a document is not protected, or
 someone with userGroup:group1 has access. So the query can look like:
 
 collection:collection1 AND textbody:text AND (unprotected:true OR
 userGroups:group1 OR userGroups:group2 ... OR all other userGroups from the
 user session)
 
 How does the query look if a document contains must-have permissions? I
 have this information while feeding, so I have the possibility to feed
 mustHaveGroups. I need to get all results matching the user session and
 mustHaveUserGroup.
 
 Thanks for posting ideas!
  
 -- 
 View this message in context: 
 http://old.nabble.com/Query-from-User-Session-to-Documents-wit
 h-Must-Have-Permissions-tp27743114p27743114.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 


Re: Cyrillic problem

2010-03-02 Thread michaelnazaruk

Thank you very much! But I have a problem with the URL :) If I send a request
using the GET method I get:
http://localhost/russian/result.php?search=%EF%F0%E8%E2%B3%F2
I use the PHP function urldecode()! If I print the result, I get привіт! But if
I send the request to Solr, my q param = пїЅпїЅпїЅпїЅпїЅ!
-- 
View this message in context: 
http://old.nabble.com/Cyrillic-problem-tp27744106p27753656.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr for reporting purposes

2010-03-02 Thread Ron Chan
doesn't sound like you need to add the complexity of breaking it up into 500 
record chunks 

plenty of memory and a quad-core+ system and you should be fine with the kind 
of load you are talking about 



after all, you should load test it first before you try any optimization
tricks like this, right?


- Original Message - 
From: adeelmahmood adeelmahm...@gmail.com 
To: solr-user@lucene.apache.org 
Sent: Monday, 1 March, 2010 2:05:44 PM 
Subject: Re: solr for reporting purposes 


well thanks for your reply .. as far as the load goes, again I think most of the 
reports will be for 1000-4000 records and we don't have that many users .. 
it's an internal system so we have about 400 users per day and we are opening 
this up for only half of those people (a specific role of people) .. so 
close to 200 people could potentially use it .. so practically speaking i 
think we can have up to 50 requests at a given time .. but again since it's 
reports they are gonna be needed every day .. once you get a report you have 
it for a while .. so overall i don't think it's that much of a user load that we 
have .. what do you think 

also i was thinking about handling requests in a 500-records-at-a-time fashion 
.. so a request for 2000 records will be handled as 5 separate (refreshed by a 
5 sec timeout) requests .. do you think it's a good idea to ask solr to 
return 500 rows at a time but make that request 5 times .. or is it better to 
just ask for 2000 rows altogether 
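
(chunking like that would just be standard Solr paging with the start and
rows parameters, e.g.:

  q=...&rows=500&start=0
  q=...&rows=500&start=500
  q=...&rows=500&start=1000  and so on)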



Ron Chan wrote: 
 
 we've done it successfully for similar requirements 
 
 the resource requirements depends on how many concurrent people will be 
 running those types of reports 
 
 up to 4000 records is not a problem at all, one report at a time, but if 
 you had concurrent requests running into the thousands as well then you may 
 have a problem, although you will probably run into memory problems at the 
 rendering end before you have problems with Solr, i.e. not a Solr problem 
 as such, but a problem generally of unrestricted ad hoc reporting 
 
 
 
 
 - Original Message - 
 From: adeelmahmood adeelmahm...@gmail.com 
 To: solr-user@lucene.apache.org 
 Sent: Saturday, 27 February, 2010 5:57:00 AM 
 Subject: Re: solr for reporting purposes 
 
 
 I just want to clarify if it's not obvious .. that the reason I am concerned 
 about the performance of solr is because for reporting requests I will 
 probably have to request all result rows at the same time .. instead of 10 
 or 20 
 
 
 adeelmahmood wrote: 
 
 we are trying to use solr for somewhat of a reporting system too (along 
 with search) .. since it provides such amazing control over queries and 
 basically over the data that user wants .. they might as well be able to 
 dump that data in an excel file too if needed .. our data isn't too much, 
 close to 25K docs with 15-20 fields in each doc .. and mostly these 
 reports will be for close to 500 - 4000 records .. i am thinking about 
 setting up a simple servlet that grabs all this data that submits the 
 user 
 query to solr over http .. grabs all that results data and dumps it in an 
 excel file .. i was just hoping to get some idea of whether this is going 
 to cause any performance impact on solr search .. especially since its 
 all 
 on the same server and some users will be doing reports while others will 
 be searching .. right now search is working GREAT .. it's blazing fast .. i 
 don't wanna lose this but at the same time reporting is an important 
 requirement as well .. 
 
 also i would appreciate any hints towards some creative ways of doing it 
 .. something like getting 500 some records in a single request and then 
 using some timer task repeat the process .. 
 
 thanks for ur help 
 
 
 -- 
 View this message in context: 
 http://old.nabble.com/solr-for-reporting-purposes-tp27725967p27726016.html 
 Sent from the Solr - User mailing list archive at Nabble.com. 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/solr-for-reporting-purposes-tp27725967p27743896.html 
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: AW: Query from User Session to Documents with Must-Have Permissions

2010-03-02 Thread _jochen

We have 2 different options in our ACL:
someone has access using group1 OR group2, ...
or someone has access using role1: group1 AND group2, ...

I could solve this problem by resolving the roles when the user logs in,
so the session knows which roles (group1 AND group2, ...) the user has:

queryString.append(" AND (unprotected:true");
if (user != null) {
    Collection<String> groups = user.getGroups();
    for (String group : groups) {
        queryString.append(" OR groups:");
        queryString.append("\"" + group + "\"");
    }
    Collection<String> andRoles = user.getAndRoles();
    if (!andRoles.isEmpty()) {
        for (String role : andRoles) {
            queryString.append(" OR roles:");
            queryString.append("\"" + role + "\"");
        }
    }
}
queryString.append(")");
-- 
View this message in context: 
http://old.nabble.com/Query-from-User-Session-to-Documents-with-Must-Have-Permissions-tp27743114p27754276.html
Sent from the Solr - User mailing list archive at Nabble.com.



Simultaneous Writes to Index

2010-03-02 Thread Kranti™ K K Parisa
Hi,

I am planning to develop an application on which users can update
their account data after login; this is on top of the search facility users
have. The basic workflow is
1) user logs in
2) searches for some data
3) gets the results from the solr index
4) saves some of the search results into their repository
5) later on they may view their repository

For this, at step 4 I am planning to write that into a separate solr index, as
a user may search within his repository and get the results, facets, etc.
So I am thinking to write such data/info to a separate solr index.

In this plan, how do simultaneous writes to the user history index work? What
are the best practices in such scenarios, with the index updated at the same
time by different users?

The other alternative is to store such user info in a DB and schedule an
indexing process at regular intervals. But that won't make the system live
with user actions, as there would be some delay; users can't see the data
they saved in their repository until it is indexed.

That is the reason I am planning to use the SOLR XML post request to update the
index silently. But how about multiple users writing to the same index?

Best Regards,
Kranti K K Parisa


Issue on stopword list

2010-03-02 Thread Suram

Hi,

 How can I search using stopwords? My query is like this

This -> 0 results because it is a stopword
is   -> 0 results because it is a stopword
that -> 0 results because it is a stopword

if I search like "This is that" - it must give the result

for that, what do I need to change in my schema file to get the result "This is
that"?
-- 
View this message in context: 
http://old.nabble.com/Issue-on-stopword-list-tp27754434p27754434.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Implementing hierarchical facet

2010-03-02 Thread Peter S

Hi Andy,

 

It sounds like you may want to have a look at tree faceting:

  https://issues.apache.org/jira/browse/SOLR-792

 


 
 Date: Mon, 1 Mar 2010 18:23:51 -0800
 From: angelf...@yahoo.com
 Subject: Implementing hierarchical facet
 To: solr-user@lucene.apache.org
 
 I read that a simple way to implement hierarchical facets is to concatenate 
 strings with a separator. Something like level1>level2>level3 with > as 
 the separator.
 
 A problem with this approach is that the number of facet values will greatly 
 increase.
 
 For example I have a facet Location with the hierarchy country>state>city. 
 Using the above approach every single city will lead to a separate facet 
 value. With tens of thousands of cities in the world the response from Solr 
 will be huge. And then on the client side I'd have to loop through all the 
 facet values and combine those with the same country into a single value.
 
 Ideally Solr would be aware of the hierarchy structure and send back 
 responses accordingly. So at level 1 Solr will send back facet values based 
 on country (100 or so values). Level 2 the facet values will be based on the 
 states within the selected country (a few dozen values). Next level will be 
 cities within that state. and so on.
 
 Is it possible to implement hierarchical facet this way using Solr?
 
 
 
 
  

Re: Simultaneous Writes to Index

2010-03-02 Thread Ron Chan
as long as the document id is unique, concurrent writes is fine 

if for some reason the same doc id is used then it is overwritten, so the last 
one in will be the one that is in the index 
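
A minimal SolrJ sketch of that last-in-wins behaviour (the core name
userhistory, the URL, and the field names are made up for illustration;
CommonsHttpSolrServer is the SolrJ client class in Solr 1.4):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class ConcurrentWriteSketch {
      public static void main(String[] args) throws Exception {
          SolrServer server =
              new CommonsHttpSolrServer("http://localhost:8983/solr/userhistory");

          // two adds with the same uniqueKey: the second one overwrites the first
          SolrInputDocument first = new SolrInputDocument();
          first.addField("id", "user42-doc7");
          first.addField("note", "saved from search A");
          server.add(first);

          SolrInputDocument second = new SolrInputDocument();
          second.addField("id", "user42-doc7");
          second.addField("note", "saved from search B"); // this version wins
          server.add(second);

          // only the second version is visible after the commit
          server.commit();
      }
  }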

Ron 

- Original Message - 
From: Kranti™ K K Parisa kranti.par...@gmail.com 
To: solr-user@lucene.apache.org 
Sent: Tuesday, 2 March, 2010 10:40:37 AM 
Subject: Simultaneous Writes to Index 

Hi, 

I am planning to development some application on which users could update 
their account data after login, this is on top of the search facility users 
have. the basic work flow is 
1) user logs in 
2) searches for some data 
3) gets the results from solr index 
4) save some of the search results into their repository 
5) later on they may view their repository 

for this, at step4 I am planning to write that into a separate solr index as 
user may search within his repository and get the results, facets..etc. 
So thinking to write such data/info to a separate solr index. 

in this plan, how simultaneous writes to the user history index works. what 
are the best practices in such scenarios of updating index at a time by 
different users. 

the other alternative is to store such user info into DB, and schedule 
indexing process at regular intervals. But that wont make the system live 
with user actions, as there would be some delay, users cant see the data 
they saved in their repository until its indexed. 

that is the reason I am planning to use SOLR xml post request to update the 
index silently but how about multiple users writing on same index? 

Best Regards, 
Kranti K K Parisa 


Re: Simultaneous Writes to Index

2010-03-02 Thread Kranti™ K K Parisa
Hi Ron,

Thanks for the reply. So does this mean that the writer lock has nothing to do
with concurrent writes?

Best Regards,
Kranti K K Parisa



On Tue, Mar 2, 2010 at 4:19 PM, Ron Chan rc...@i-tao.com wrote:

 as long as the document id is unique, concurrent writes is fine

 if for some reason the same doc id is used then it is overwritten, so the last
 one in will be the one that is in the index

 Ron

 - Original Message -
 From: Kranti™ K K Parisa kranti.par...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, 2 March, 2010 10:40:37 AM
 Subject: Simultaneous Writes to Index

 Hi,

 I am planning to development some application on which users could update
 their account data after login, this is on top of the search facility users
 have. the basic work flow is
 1) user logs in
 2) searches for some data
 3) gets the results from solr index
 4) save some of the search results into their repository
 5) later on they may view their repository

 for this, at step4 I am planning to write that into a separate solr index
 as
 user may search within his repository and get the results, facets..etc.
 So thinking to write such data/info to a separate solr index.

 in this plan, how simultaneous writes to the user history index works. what
 are the best practices in such scenarios of updating index at a time by
 different users.

 the other alternative is to store such user info into DB, and schedule
 indexing process at regular intervals. But that wont make the system live
 with user actions, as there would be some delay, users cant see the data
 they saved in their repository until its indexed.

 that is the reason I am planning to use SOLR xml post request to update the
 index silently but how about multiple users writing on same index?

 Best Regards,
 Kranti K K Parisa



Optimize Index

2010-03-02 Thread Lee Smith
Hi All

Is there a post request method to clean the index?  

I have removed my index folder and restarted solr and it's still showing 
documents in the stats.

I have run this post request: 
http://localhost:8983/solr/core1/update?optimize=true

I get no errors but the stats still show my 4 documents

Hope you can advise.

Thanks

fieldType text

2010-03-02 Thread Frederico Azeiteiro
Hi,

I'm using the default text  field type that comes with the example.

 

When searching for simple words such as 'HP' or 'TCS', Solr is returning
results that contain 'HP1' or 'TCS'.

Is there a solution to avoid this?

 

Thanks,

Frederico



search and count ocurrences

2010-03-02 Thread Frederico Azeiteiro
Hi,

I need to implement a search where I should count the number of times
a string appears in the searched field,

i.e. only return articles that mention the word 'HP' at least 2x.

 

I'm currently doing this after the SOLR search with my own methods. 

Is there a way that SOLR can do this type of operation for me?

 

Thanks,

Frederico

 



Re: Solr Cell and Deduplication - Get ID of doc

2010-03-02 Thread Bill Engle
Thanks for the responses.  This is exactly what I had to resort to.  I will
definitely put in a feature request to get the generated ID back from the
extract request.

I am doing this with PHP cURL for extraction and pecl php solr for
querying.  I am then saving the unique id and dupe hash in a MySQL table
which I check against after the doc is indexed in Solr.  If it is a dupe I
delete the Solr record and discard the file.  My problem now is the dupe
hash sometimes comes back NULL from Solr although when I check it through
Solr Admin it is there.  I am working through this now to isolate.

I had to set Solr to ALLOW duplicates because I have to somehow know that
the file is a dupe and then remove the duplicate files on my filesystem.
Based on the extract response I have no way of knowing this if duplicates
are disallowed.

-Bill


On Tue, Mar 2, 2010 at 2:11 AM, Chris Hostetter hossman_luc...@fucit.org wrote:



 : To quote from the wiki,
...
 That's all true ... but Bill explicitly said he wanted to use
 SignatureUpdateProcessorFactory to generate a uniqueKey from the content
 field post-extraction so he could dedup documents with the same content
 ... his question was how to get that key after adding a doc.

 Using a unique literal.field value will work -- but only as the value of
 a secondary field that he can then query on to get the uniqueKeyField
 value.


 :  : You could create your own unique ID and pass it in with the
 :  : literal.field=value feature.
 : 
 :  By which Lance means you could specify a unique value in a different
 :  field from your uniqueKey field, and then query on that field:value pair
 :  to get the doc after it's been added -- but that query will only work
 :  until some other version of the doc (with some other value) overwrites
 it.
 :  so you'd essentially have to query for the field:value to look up the
 :  uniqueKey.
 : 
 :  it seems like it should definitely be feasible for the
 :  Update RequestHandlers to return the uniqueKeyField values for all the
 :  added docs (regardless of whether the key was included in the request, or
 :  added by an UpdateProcessor) -- but i'm not sure how that would fit in
 :  with the SolrJ API.
 : 
 :  would you mind opening a feature request in Jira?
 : 
 : 
 : 
 :  -Hoss
 : 
 : 
 :
 :
 :
 : --
 : Lance Norskog
 : goks...@gmail.com
 :



 -Hoss




Re: Issue on stopword list

2010-03-02 Thread Erick Erickson
This is a classic problem with Stopword removal. Have you tried
just removing stopwords from the indexing definition and the
query definition and reindexing?

You can't search on them no matter what you do if they've
been removed, they just aren't there

HTH
Erick

On Tue, Mar 2, 2010 at 5:47 AM, Suram reactive...@yahoo.com wrote:


 Hi,

  How can I search using stopwords? My query is like this

 This -> 0 results because it is a stopword
 is   -> 0 results because it is a stopword
 that -> 0 results because it is a stopword

 if I search like "This is that" - it must give the result

 for that, what do I need to change in my schema file to get the result "This is
 that"?
 --
 View this message in context:
 http://old.nabble.com/Issue-on-stopword-list-tp27754434p27754434.html
 Sent from the Solr - User mailing list archive at Nabble.com.




get Server Status, TotalDocCount .... PHP !

2010-03-02 Thread stocki

hello

I use Solr in my cakePHP Framework.

How can I get status information about my Solr cores?

I don't want to analyze the response XML every time.

Does anybody know a nice way to get status messages from Solr?

thx ;) Jonas
-- 
View this message in context: 
http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756118.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: fieldType text

2010-03-02 Thread Siddhant Goel
I think that's because of the internal tokenization that Solr does. If a
document contains HP1, and you're using the default text field type, Solr
would tokenize that to HP and 1, so that document figures in the list of
documents containing HP, and hence that document appears in the search
results for HP. Creating a separate text field which does not tokenize like
that might be what you want.

The various filter/tokenizer types are listed here -
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
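
For example, a minimal fieldType sketch (the name text_nosplit is made up)
that would keep HP1 as a single token:

  <fieldType name="text_nosplit" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- no WordDelimiterFilterFactory here, so hp1 is not split into hp + 1 -->
    </analyzer>
  </fieldType>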

On Tue, Mar 2, 2010 at 6:07 PM, Frederico Azeiteiro 
frederico.azeite...@cision.com wrote:

 Hi,

 I'm using the default text  field type that comes with the example.



 When searching for simple words as 'HP' or 'TCS' solr is returning
 results that contains 'HP1' or 'TCS'

 Is there a solution for to avoid this?



 Thanks,

 Frederico




-- 
- Siddhant


Re: Optimize Index

2010-03-02 Thread Erick Erickson
My very first guess would be that you're removing an index that isn't
the one your SOLR configuration points at.

Second guess would be that your browser is caching the results of
your first query and not going to SOLR at all. Stranger things have
happened <G>.

Third guess is you've mis-identified the core in your URL.

Can you check those three things and let us know if you still
have the problem?

Erick

On Tue, Mar 2, 2010 at 7:36 AM, Lee Smith l...@weblee.co.uk wrote:

 Hi All

 Is there a post request method to clean the index?

 I have removed my index folder and restarted solr and its still showing
 documents in the stats.

 I have run this post request:
 http://localhost:8983/solr/core1/update?optimize=true

 I get no errors but the stats are still show my 4 documents

 Hope you can advise.

 Thanks


Re: fieldType text

2010-03-02 Thread Erick Erickson
Expanding on Siddhant's comment, look carefully at
WordDelimiterFilterFactory, as I remember it's in the default
schema definition.
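
In the Solr 1.4 example schema it is configured roughly like this (a sketch
from memory; double-check your own schema.xml):

  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" splitOnCaseChange="1"/>

generateWordParts/generateNumberParts are what split HP1 into HP and 1 at
index time.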

This page helps:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Erick

On Tue, Mar 2, 2010 at 8:51 AM, Siddhant Goel siddhantg...@gmail.com wrote:

 I think that's because of the internal tokenization that Solr does. If a
 document contains HP1, and you're using the default text field type, Solr
 would tokenize that to HP and 1, so that document figures in the list of
 documents containing HP, and hence that documents appears in the search
 results for HP. Creating a separate text field which does not tokenize like
 that might be what you want.

 The various filter/tokenizer types are listed here -
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

 On Tue, Mar 2, 2010 at 6:07 PM, Frederico Azeiteiro 
 frederico.azeite...@cision.com wrote:

  Hi,
 
  I'm using the default text  field type that comes with the example.
 
 
 
  When searching for simple words as 'HP' or 'TCS' solr is returning
  results that contains 'HP1' or 'TCS'
 
  Is there a solution for to avoid this?
 
 
 
  Thanks,
 
  Frederico
 
 


 --
 - Siddhant



exact search

2010-03-02 Thread Suram

Hi,

 How do I search for an exact match like "The Books of Three"? If I give
this it should find the exact result +
some results related to Books. In my schema.xml file I changed the field type to
String instead of Text but I am not getting any change.


-- 
View this message in context: 
http://old.nabble.com/exact-search-tp27756351p27756351.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing hierarchical facet

2010-03-02 Thread Koji Sekiguchi

 Ideally Solr would be aware of the hierarchy structure and
 send back responses accordingly.

If I understand it correctly, SOLR-64 supports them I think?

 So at level 1 Solr will send back facet values based on country (100 
or so values).


facet=on&facet.depth=1 ?

 Level 2 the facet values will be based on the states within the selected
 country (a few dozen values).

facet=on&facet.prefix=selected-country&facet.depth=2 ?

 Next level will be cities within that state. and so on.

facet=on&facet.prefix=selected-country/selected-state&facet.depth=3 ?
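
With plain (unpatched) Solr, the same drill-down can be approximated with the
standard facet.prefix parameter alone, e.g. (the field name location and the
prefix are made up):

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=location&facet.prefix=France/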

Koji

--
http://www.rondhuit.com/en/



Re: get Server Status, TotalDocCount .... PHP !

2010-03-02 Thread Guillaume Rossolini
Hi

Have you tried the php_solr extension from PECL?  It has a handy
SolrPingResponse class.
Or you could just call the CORENAME/admin/ping?wt=phps URL and unserialize
it.

Regards,

--
I N S T A N T  |  L U X E - 44 rue de Montmorency | 75003 Paris | France
Tél. : 01 80 50 52 51 | Mob. : 06 09 96 10 29 | web : www.instantluxe.com


On Tue, Mar 2, 2010 at 2:50 PM, stocki st...@shopgate.com wrote:


 hello

 I use Solr in my cakePHP Framework.

 How can i get status information of my solr cores ??

 I dont want analyze everytime the responseXML.

 do anybody know a nice way to get status messages from solr ?

 thx ;) Jonas
 --
 View this message in context:
 http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756118.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Issue on stopword list

2010-03-02 Thread Walter Underwood
Don't remove stopwords if you want to search on them. --wunder

On Mar 2, 2010, at 5:43 AM, Erick Erickson wrote:

 This is a classic problem with Stopword removal. Have you tried
 just removing stopwords from the indexing definition and the
 query definition and reindexing?
 
 You can't search on them no matter what you do if they've
 been removed, they just aren't there
 
 HTH
 Erick
 
 On Tue, Mar 2, 2010 at 5:47 AM, Suram reactive...@yahoo.com wrote:
 
 
 Hi,
 
  How can I search using stopwords? My query is like this
  
  This -> 0 results because it is a stopword
  is   -> 0 results because it is a stopword
  that -> 0 results because it is a stopword
  
  if I search like "This is that" - it must give the result
  
  for that, what do I need to change in my schema file to get the result "This is
  that"?
 --
 View this message in context:
 http://old.nabble.com/Issue-on-stopword-list-tp27754434p27754434.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 



Re: get Server Status, TotalDocCount .... PHP !

2010-03-02 Thread stocki

Hey-

No, I use the SolrPHPClient http://code.google.com/p/solr-php-client/
and I don't really want to use two different PHP libs. ^^

What do you mean with unserialize? XD





Guillaume Rossolini-2 wrote:
 
 Hi
 
 Have you tried the php_solr extension from PECL?  It has a handy
 SolrPingResponse class.
 Or you could just call the CORENAME/admin/ping?wt=phps URL and unserialize
 it.
 
 Regards,
 
 --
 I N S T A N T  |  L U X E - 44 rue de Montmorency | 75003 Paris | France
 Tél. : 01 80 50 52 51 | Mob. : 06 09 96 10 29 | web : www.instantluxe.com
 
 
 On Tue, Mar 2, 2010 at 2:50 PM, stocki st...@shopgate.com wrote:
 

 hello

 I use Solr in my cakePHP Framework.

 How can i get status information of my solr cores ??

 I dont want analyze everytime the responseXML.

 do anybody know a nice way to get status messages from solr ?

 thx ;) Jonas
 --
 View this message in context:
 http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756118.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756852.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Optimize Index

2010-03-02 Thread Lee Smith
Ha

Now I feel stupid !!

I had a misspelling in the data path and you were correct.

Can I ask, Erick, was the command correct though?

Thank you

Lee

On 2 Mar 2010, at 13:54, Erick Erickson wrote:

 My very first guess would be that you're removing an index that isn't
 the one your SOLR configuration points at.
 
 Second guess would be that your browser is caching the results of
 your first query and not going to SOLR at all. Stranger things have
 happened G.
 
 Third guess is you've mis-identified the core in your URL.
 
 Can you check those three things and let us know if you still
 have the problem?
 
 Erick
 
 On Tue, Mar 2, 2010 at 7:36 AM, Lee Smith l...@weblee.co.uk wrote:
 
 Hi All
 
 Is there a post request method to clean the index?
 
 I have removed my index folder and restarted solr and its still showing
 documents in the stats.
 
 I have run this post request:
 http://localhost:8983/solr/core1/update?optimize=true
 
 I get no errors but the stats are still show my 4 documents
 
 Hope you can advise.
 
 Thanks



Indexing HTML document

2010-03-02 Thread György Frivolt
Hi, how do I index HTML documents properly? All the documents are HTML, some
containing characters encoded like &#x17E;&#xED; ... Is there a character
filter for filtering these codes? Is there a way to strip the HTML tags out?
Does Solr weight the terms in the document based on where they appear?
Words in headers (H1, H2, ...) would be supposed to describe the document more
than words in paragraphs.

Thanks for help,

   Georg


Re: Indexing HTML document

2010-03-02 Thread Siddhant Goel
There is an HTML filter documented here, which might be of some help -
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

Control characters can be eliminated using code like this -
http://bitbucket.org/cogtree/python-solr/src/tip/pythonsolr/pysolr.py#cl-449
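
For example, a fieldType sketch (the name html_text is made up) that strips
tags and decodes character entities before tokenizing:

  <fieldType name="html_text" class="solr.TextField">
    <analyzer>
      <!-- strips HTML tags and decodes entities such as &#x17E; -->
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>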

On Tue, Mar 2, 2010 at 9:37 PM, György Frivolt gyorgy.friv...@gmail.com wrote:

 Hi, How to index properly HTML documents? All the documents are HTML, some
 containing charaters encodid like #x17E;#xED; ... Is there a character
 filter for filtering these codes? Is there a way to strip the HTML tags
 out?
 Does solr weight the terms in the document based on where they appear?..
 words in headers (H1, H2,..) would be supposed to describe the document
 more
 then words in paragraphs.

 Thanks for help,

   Georg




-- 
- Siddhant


Re: Issue on stopword list

2010-03-02 Thread Joe Calderon
Or you can try the CommonGrams filter, which combines tokens next to a stopword.
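
A schema.xml sketch of that (the fieldType name text_cg is made up; both
filter factories ship with Solr 1.4):

  <fieldType name="text_cg" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- pairs each stopword with its neighbour, e.g. "this is" -> "this_is" -->
      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>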

On Tue, Mar 2, 2010 at 6:56 AM, Walter Underwood wun...@wunderwood.org wrote:
 Don't remove stopwords if you want to search on them. --wunder

 On Mar 2, 2010, at 5:43 AM, Erick Erickson wrote:

 This is a classic problem with Stopword removal. Have you tried
 just removing stopwords from the indexing definition and the
 query definition and reindexing?

 You can't search on them no matter what you do if they've
 been removed, they just aren't there

 HTH
 Erick

 On Tue, Mar 2, 2010 at 5:47 AM, Suram reactive...@yahoo.com wrote:


 Hi,

  How can I search using stopwords? My query is like this

  This -> 0 results because it is a stopword
  is   -> 0 results because it is a stopword
  that -> 0 results because it is a stopword

  if I search like "This is that" - it must give the result

  for that, what do I need to change in my schema file to get the result "This is
  that"?
 --
 View this message in context:
 http://old.nabble.com/Issue-on-stopword-list-tp27754434p27754434.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: replication issue

2010-03-02 Thread Matthieu Labour
Hi Paul
Thank you for your answer
I did put all the directory structure on /raid ... /raid/solr_env/solr ... , 
/raid/solr_env/jetty ...
And it still didn't work even after I applied patch  SOLR-1736
I am investigating if this is because tempDir and data dir are not on the same 
partition
matt

--- On Mon, 3/1/10, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com wrote:

From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com
Subject: Re: replication issue
To: solr-user@lucene.apache.org
Date: Monday, March 1, 2010, 10:30 PM

The data/index.20100226063400 dir is a temporary dir and is created in
the same dir where the index dir is located.

I'm wondering if the symlink is causing the problem. Why don't you set
the data dir as /raid/data instead of /solr/data

On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour
matthieu_lab...@yahoo.com wrote:
 Hi

 I am still having issues with the replication and wonder if things are 
 working properly

 So I have 1 master and 1 slave

 On the slave, I deleted the data/index directory and 
 data/replication.properties file and restarted solr.

 When slave is pulling data from master, I can see that the size of data 
 directory is growing

 r...@slr8:/raid/data# du -sh
 3.7M    .
 r...@slr8:/raid/data# du -sh
 4.7M    .

 and I can see that data/replication.properties  file got created and also a 
 directory data/index.20100226063400

 soon after, index.20100226063400 disappears and the size of data/index is back 
 to 12K

 r...@slr8:/raid/data/index# du -sh
 12K    .

 And when I look for the number of documents via the admin interface, I still 
 see 0 documents so I feel something is wrong

 One more thing, I have a symlink for /solr/data ---> /raid/data

 Thank you for your help !

 matt










-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com



  

Re: Warning : no lockType configured for...

2010-03-02 Thread Tom Hill.

Hi Mani,

Mani EZZAT wrote:
 I'm dynamically creating cores with a new index, using the same schema 
 and solrconfig.xml

Does the problem occur if you use the same configuration in a single, static
core?

Tom

-- 
View this message in context: 
http://old.nabble.com/Re%3A-Warning-%3A-no-lockType-configured-for...-tp27740724p27758951.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: get Server Status, TotalDocCount .... PHP !

2010-03-02 Thread Israel Ekpo
The last time I tried using SolrPHPClient for this stuff, it did not really
handle the response very well because of the JSON response generated on the
server side.

I am not sure if anything has changed since then.

The JSON code generated could not be parsed properly.

If you do not want to analyze the XML response each time, and you are not
using the PECL extension, you will need to send a request manually to the
Solr server using cURL, specifying the response format as phps.


On Tue, Mar 2, 2010 at 9:59 AM, stocki st...@shopgate.com wrote:


 Hey-

 No i use the SolrPHPClient http://code.google.com/p/solr-php-client/
 i not really want tu use two different php-libs. ^^

 what do you mean with unserialize ? XD





 Guillaume Rossolini-2 wrote:
 
  Hi
 
  Have you tried the php_solr extension from PECL?  It has a handy
  SolrPingResponse class.
  Or you could just call the CORENAME/admin/ping?wt=phps URL and
 unserialize
  it.
 
  Regards,
 
  --
  I N S T A N T  |  L U X E - 44 rue de Montmorency | 75003 Paris | France
  Tél. : 01 80 50 52 51 | Mob. : 06 09 96 10 29 | web :
 www.instantluxe.com
 
 
  On Tue, Mar 2, 2010 at 2:50 PM, stocki st...@shopgate.com wrote:
 
 
  hello
 
  I use Solr in my cakePHP Framework.
 
  How can i get status information of my solr cores ??
 
  I dont want analyze everytime the responseXML.
 
  do anybody know a nice way to get status messages from solr ?
 
  thx ;) Jonas
  --
  View this message in context:
 
 http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756118.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756852.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: replication issue

2010-03-02 Thread Matthieu Labour
I think this issue is not related to patch SOLR-1736

Here is the error I get ... Thank you for any help


[2010-03-02 19:07:26] [pool-3-thread-1] ERROR(ReplicationHandler.java:266) - 
SnapPull failed
org.apache.solr.common.SolrException: Unable to download _7bre.fdt completely. 
Downloaded 0!=15591
    at 
org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1036)
    at 
org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:916)
    at 
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:541)
    at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:294)
    at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
    at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
    at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:135)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:65)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:146)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:170)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:595)


--- On Tue, 3/2/10, Matthieu Labour matthieu_lab...@yahoo.com wrote:

From: Matthieu Labour matthieu_lab...@yahoo.com
Subject: Re: replication issue
To: solr-user@lucene.apache.org
Date: Tuesday, March 2, 2010, 11:23 AM

Hi Paul
Thank you for your answer
I did put all the directory structure on /raid ... /raid/solr_env/solr ... , 
/raid/solr_env/jetty ...
And it still didn't work even after I applied patch  SOLR-1736
I am investigating if this is because tempDir and data dir are not on the same 
partition
matt

--- On Mon, 3/1/10, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com wrote:

From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com
Subject: Re: replication issue
To: solr-user@lucene.apache.org
Date: Monday, March 1, 2010, 10:30 PM

The data/index.20100226063400 dir is a temporary dir and is created in
the same dir where the index dir is located.

I'm wondering if the symlink is causing the problem. Why don't you set
the data dir as /raid/data instead of /solr/data

On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour
matthieu_lab...@yahoo.com wrote:
 Hi

 I am still having issues with the replication and wonder if things are 
 working properly

 So I have 1 master and 1 slave

 On the slave, I deleted the data/index directory and 
 data/replication.properties file and restarted solr.

 When slave is pulling data from master, I can see that the size of data 
 directory is growing

 r...@slr8:/raid/data# du -sh
 3.7M    .
 r...@slr8:/raid/data# du -sh
 4.7M    .

 and I can see that data/replication.properties  file got created and also a 
 directory data/index.20100226063400

 soon after, index.20100226063400 disappears and the size of data/index is back 
 to 12K

 r...@slr8:/raid/data/index# du -sh
 12K    .

 And when I look for the number of documents via the admin interface, I still 
 see 0 documents so I feel something is wrong

 One more thing, I have a symlink for /solr/data ---> /raid/data

 Thank you for your help !

 matt










-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com



      


  

Logging in Embedded SolrServer - What a nightmare.

2010-03-02 Thread Lucas F. A. Teixeira
Hello all,

I'm having a hard time trying to change the Solr query logging level.
I've tried a lot of things I've found in the internet, this mailing list and
solr docs.

What I've found so far:

- Solr Embedded Server uses the slf4j lib for intermediating logging. Here I'm
using Log4j as my logging framework.
- Changing the .../jre/lib/logging.properties worked, but only when querying
using solr over http, and not on solr embedded.
- A log4j.xml that I've added is not being respected. (It is logging with
a totally different layout and appenders)
- I've searched for other log4j config files in the classpath, and found
nothing...
- Even tried to call Logger.getLogger(org.apache.solr) and then set its
level manually inside the app, nothing changed...

So, Embedded Solr Server keeps logging queries and other stuff in my stdout.

Most docs and guides I've found in the internet is talking about solr http,
this is ok for me, with http I got everything working, but not with solr
embedded.
Have anyone achieved this with embedded?

Thanks a lot ppl,

[]s,


Lucas Frare Teixeira .·.
- lucas...@gmail.com
- lucastex.com.br
- blog.lucastex.com
- twitter.com/lucastex


Ignore accents

2010-03-02 Thread Tommy Molto
Hi, guys,

I have a Solr index, and I need it to ignore accents and special characters,
e.g. São Paulo => Sao Paulo, cadarço => cadarco. I know we could use a
synonym, but I guess Solr already has a filter or plugin for these cases.
Does anyone know how to do it?

Att,

Paulo Marinho


Re: Ignore accents

2010-03-02 Thread Ahmet Arslan
 I have a solr index, and i need it to ignore accents and
 special characters.
 Eg: São Paulo = Sao Paulo, cadarço=cadarco. I
 know we could use a
 synonim, but i guess solr already has a filter or plugin
 for theses cases.
 Anyone knows how to do it?

ASCIIFoldingFilterFactory[1] or <charFilter 
class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

[1]http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
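
A fieldType sketch using the folding filter (the name text_folded is made up):

  <fieldType name="text_folded" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- folds accented characters to ASCII equivalents: São -> sao -->
      <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
  </fieldType>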





CoreAdminHandler question

2010-03-02 Thread Leonardo Souza
The action CREATE creates a new core based on preexisting
instanceDir/solrconfig.xml/schema.xml, and registers it.
That's what the documentation is stating.

Is there a way to instruct Solr to create the instanceDir if it does not exist?

I'm trying to create a new core based on an existing schema/config to rebuild
the index, and after that swap it with the existing old core. The problem is
that the instanceDir of the new core must exist before the core creation,
and it would be nice to programmatically create the instanceDir using the
CoreAdminHandler.

Maybe i'm missing something..

Thanks in advance.

[ ]'s
Leonardo da S. Souza
°v°   Linux user #375225
/(_)\   http://counter.li.org/
^ ^


Unindexed Fields Are Searchable?

2010-03-02 Thread Thomas Nguyen
I've noticed that fields that I define as index=false in the
schema.xml are still searchable.  Here's the definition of the field:

 

<field name="object_id" type="string" index="false" stored="true"
multiValued="false"/>

or

<field name="object_id" type="string" index="false" stored="false"
multiValued="false"/>

 

I can then add a new document with the field object_id=26 and have the
document returned when searching for +object_Id=26.  On the other hand
if I add the document using the Lucene API the Solr search does not
return the document.  Is there a bug in Solr 1.4 that allows for
searchable unindexed fields for documents added by Solr? 



Re: Unindexed Fields Are Searchable?

2010-03-02 Thread Ahmet Arslan
 I've noticed that fields that I
 define as index=false in the
 schema.xml are still searchable.  


indexed=false defined fields are neither searchable nor sortable.

Did you re-start servlet container and re-index your documents after changing 
this attribute in schema.xml?





Returning function result in results

2010-03-02 Thread Dragisa Krsmanovic
Is there a way to return a function value in search results besides using
score?


Different weights to different fields

2010-03-02 Thread Alex Thurlow

Hi everyone,
I'm new to Solr and just getting it set up and testing it out.  I'd 
like to know if there's a way to give a different weight to different 
data fields.


For an example, I'm going to be storing song information.  I have the 
fields: Artist, Title, Description, and Tags.  I'd like occurrences of 
the search term in Artist and Title to count more than the ones found in 
Description and Tags.  For instance, a search for Bruce Springsteen 
against all the fields should return the ones where artist=Bruce 
Springsteen higher than ones that just have that within the 
description.  Is this possible either in the indexing or with a query 
option?


Thanks,
Alex

--
Alex Thurlow
Blastro Networks

http://www.blastro.com
http://www.roxwel.com
http://www.yallwire.com



Setting the return query fields

2010-03-02 Thread Dhanushka Samarakoon
Hi,
I would like Solr to return the record from /exampledocs/hd.xml when I
search for the value 6H500F0 (which is the ID field for the 2nd record in
that file).
I know there is a setting that I should change to get this done, but I can't
locate it.
The field name ID is already included in the schema.xml file.
Thanks,
Dhanushka


RE: Unindexed Fields Are Searchable?

2010-03-02 Thread Thomas Nguyen
My schema has always had index=false for that field.  I only stopped and 
restarted the servlet container when I added a document to the index using the 
Lucene API instead of Solr.

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Tuesday, March 02, 2010 1:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Unindexed Fields Are Searchable?

 I've noticed that fields that I
 define as index=false in the
 schema.xml are still searchable.  


indexed=false defined fields are neither searchable nor sortable.

Did you re-start servlet container and re-index your documents after changing 
this attribute in schema.xml?


  



Re: Different weights to different fields

2010-03-02 Thread Ahmet Arslan

     I'm new to Solr and just getting it set up
 and testing it out.  I'd like to know if there's a way
 to give a different weight to different data fields.
 
 For an example, I'm going to be storing song
 information.  I have the fields: Artist, Title,
 Description, and Tags.  I'd like occurrences of the
 search term in Artist and Title to count more than the ones
 found in Description and Tags.  For instance, a search
 for Bruce Springsteen against all the fields should return
 the ones where artist=Bruce Springsteen higher than ones
 that just have that within the description.  Is this
 possible either in the indexing or with a query option?

You can do it at either query time or index time. At query time you can assign 
different boost values with the caret operator, 
e.g. Artist:(Bruce Springsteen)^10 Title:(Bruce Springsteen)^5

Also dismax[1] request handler might useful to you.

[1]http://wiki.apache.org/solr/DisMaxRequestHandler

At index time you can give different boost values to different fields. [2]
e.g. <field name="Artist" boost="10.0">

[2]http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
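
A dismax sketch for solrconfig.xml (the handler name and the weights are made
up; qf lists the fields to search with their boosts):

  <requestHandler name="/songsearch" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- matches in Artist and Title count more than Description and Tags -->
      <str name="qf">Artist^10 Title^5 Description Tags</str>
    </lst>
  </requestHandler>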





Re: Setting the return query fields

2010-03-02 Thread Ahmet Arslan

 Hi,
 I would like to solr to return to record from
 /exampledocs/hd.xml when I
 search for the value 6H500F0 (which is the ID field for
 the 2'nd record in
 that file).
 I know there is a setting that I should change to get this
 done, but I can't
 locate it.
 Field name ID is alread included in schema.xml file.

If you want to retrieve the document with ID=6H500F0, use id:6H500F0 as query. 
If you don't explicitly specify field name in your query defaultSearchField 
(which is defined in schema.xml) is used/queried.

http://localhost:8983/solr/select/?q=id%3A6H500F0&version=2.2&start=0&rows=10&indent=on


  


RE: Unindexed Fields Are Searchable?

2010-03-02 Thread Ahmet Arslan
 My schema has always had
 index=false for that field.  I only stopped and
 restarted the servlet container when I added a document to
 the index using the Lucene API instead of Solr.

Is there a special reason/use-case for to add documents using Lucene API?





Re: Setting the return query fields

2010-03-02 Thread Dhanushka Samarakoon
Thanks for the reply.
Is there a place in the config file where I can set it to explicitly search
the fields I want?

On Tue, Mar 2, 2010 at 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote:


  Hi,
  I would like to solr to return to record from
  /exampledocs/hd.xml when I
  search for the value 6H500F0 (which is the ID field for
  the 2'nd record in
  that file).
  I know there is a setting that I should change to get this
  done, but I can't
  locate it.
  Field name ID is alread included in schema.xml file.

 If you want to retrieve the document with ID=6H500F0, use id:6H500F0 as
 query. If you don't explicitly specify field name in your query
 defaultSearchField (which is defined in schema.xml) is used/queried.


 http://localhost:8983/solr/select/?q=id%3A6H500F0&version=2.2&start=0&rows=10&indent=on






RE: Unindexed Fields Are Searchable?

2010-03-02 Thread Thomas Nguyen
For testing purposes.  I just wanted to see if unindex fields in documents 
added by Lucene API were searchable by Solr.  This is after discovering that 
the unindexed fields in documents added by Solr are searchable.

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Tuesday, March 02, 2010 1:23 PM
To: solr-user@lucene.apache.org
Subject: RE: Unindexed Fields Are Searchable?

 My schema has always had
 index=false for that field.  I only stopped and
 restarted the servlet container when I added a document to
 the index using the Lucene API instead of Solr.

Is there a special reason/use-case for to add documents using Lucene API?


  



Re: Setting the return query fields

2010-03-02 Thread Ahmet Arslan

 Thanks for the reply.
 Is there a place in the config file where I can set it to
 explicitly search
 the fields I want?

If you don't want to specify your fields at query time (and you want to query 
more than one field at the same time) you can use the DisMaxRequestHandler[1]. 
There are two example configurations (name="dismax" and name="partitioned") in 
solrconfig.xml. You can invoke them by appending qt=dismax or qt=partitioned 
to your search URL.

[1]http://wiki.apache.org/solr/DisMaxRequestHandler
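
For example (assuming the stock example solrconfig.xml):

  http://localhost:8983/solr/select/?q=6H500F0&qt=dismax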


  


Re: Different weights to different fields

2010-03-02 Thread Erick Erickson
If you get the PACKT Solr 1.4 book, there are extensive examples of this
very thing.

It's *well* worth the time it'll save you...

Erick

On Tue, Mar 2, 2010 at 4:11 PM, Ahmet Arslan iori...@yahoo.com wrote:


  I'm new to Solr and just getting it set up
  and testing it out.  I'd like to know if there's a way
  to give a different weight to different data fields.
 
  For an example, I'm going to be storing song
  information.  I have the fields: Artist, Title,
  Description, and Tags.  I'd like occurrences of the
  search term in Artist and Title to count more than the ones
  found in Description and Tags.  For instance, a search
  for Bruce Springsteen against all the fields should return
  the ones where artist=Bruce Springsteen higher than ones
  that just have that within the description.  Is this
  possible either in the indexing or with a query option?

 You can do it in either query time or index time. In query time you can
 assign different boost values with carat operator.
 e.g. Artist:(Bruce Springsteen)^10 Title:(Bruce Springsteen)^5

 Also dismax[1] request handler might useful to you.

 [1]http://wiki.apache.org/solr/DisMaxRequestHandler

 At index time you can give different boost values to different fields. [2]
 e.g. field name=Artist boost=10.0

 [2]
 http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22






Re: Logging in Embedded SolrServer - What a nightmare.

2010-03-02 Thread Kevin Osborn
Not sure if it will solve your specific problem. We use Solr as a WAR as well 
as SolrJ. The main Solr distribution comes with slf4j-jdk14-1.5.5.jar. I just 
deleted that and replaced it with slf4j-log4j12-1.5.5.jar, and then it used my 
existing log4j.properties file.
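
A minimal log4j.properties sketch (assuming the slf4j-log4j12 binding is on
the classpath) that quiets the per-query logging:

  # everything to the console at WARN
  log4j.rootLogger=WARN, stdout
  log4j.appender.stdout=org.apache.log4j.ConsoleAppender
  log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
  log4j.appender.stdout.layout.ConversionPattern=%d %p %c - %m%n
  # SolrCore logs each request at INFO; raise it to WARN to hide queries
  log4j.logger.org.apache.solr.core.SolrCore=WARN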





From: Lucas F. A. Teixeira lucas...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, March 2, 2010 11:14:26 AM
Subject: Logging in Embedded SolrServer - What a nightmare.

Hello all,

I'm having a hard time trying to change Solr queries logging level.
I've tried a lot of things I've found in the internet, this mailing list and
solr docs.

What I've found so far:

- Solr Embedded Server uses the slf4j lib for intermediating logging. Here I'm
using Log4j as my logging framework.
- Changing the .../jre/lib/logging.properties worked, but only when querying
using solr over http, and not on solr embedded.
- A log4j.xml that I've added it is not being respected. (It is logging with
a totally different layout and appenders)
- I've searched for other log4j config files in the classpath, and found
nothing...
- Even tried to call Logger.getLogger(org.apache.solr) and then set its
level manually inside the app, nothing changed...

So, Embedded Solr Server keeps logging queries and other stuff in my stdout.

Most docs and guides I've found in the internet is talking about solr http,
this is ok for me, with http I got everything working, but not with solr
embedded.
Have anyone achieved this with embedded?

Thanks a lot ppl,

[]s,


Lucas Frare Teixeira .·.
- lucas...@gmail.com
- lucastex.com.br
- blog.lucastex.com
- twitter.com/lucastex



  

Re: Unindexed Fields Are Searchable?

2010-03-02 Thread Erik Hatcher
Again, note that it should be indexed="false" - the "ed" is very
important! If you're saying index="false", Solr is not reading that
attribute at all, and going with the default for the field type.
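
With that fix, the field definition from the original message would read:

  <field name="object_id" type="string" indexed="false" stored="true" multiValued="false"/>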


Erik

On Mar 2, 2010, at 4:31 PM, Thomas Nguyen wrote:

For testing purposes.  I just wanted to see if unindex fields in  
documents added by Lucene API were searchable by Solr.  This is  
after discovering that the unindexed fields in documents added by  
Solr are searchable.


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Tuesday, March 02, 2010 1:23 PM
To: solr-user@lucene.apache.org
Subject: RE: Unindexed Fields Are Searchable?


My schema has always had
index=false for that field.  I only stopped and
restarted the servlet container when I added a document to
the index using the Lucene API instead of Solr.


Is there a special reason/use-case for to add documents using Lucene  
API?









Re: replication issue

2010-03-02 Thread Matthieu Labour
The replication does not work for me


I have a big master solr and I want to start replicating it. I can see that the 
slave is downloading data from the master... I see a directory 
index.20100302093000 gets created in data/ next to index... I can see its size 
growing but then the directory gets deleted

Here is the complete trace (I added a couple of LOG messages and compiled Solr)

[2010-03-02 21:24:00] [pool-3-thread-1] 
DEBUG(MultiThreadedHttpConnectionManager.java:961) - Notifying no-one, there 
are no waiting threads
[2010-03-02 21:24:00] [pool-3-thread-1] INFO (SnapPuller.java:278) - Number of 
files in latest index in master: 163
[2010-03-02 21:24:00] [pool-3-thread-1] DEBUG(SnapPuller.java:536) - 
downloadIndexFiles(downloadCompleteIndex=false,tmpIdxDir=../solr/data/index.20100302092400,latestVersion=1266003907838)
[2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(SnapPuller.java:541) - 
--localIndexFile=/opt/solr_env/solr/data/index/_7h0y.fdx
[2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(SnapPuller.java:900) - fetchFile()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter 
PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - 
enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter 
PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - 
enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter 
PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - 
enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter 
PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - 
enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(PostMethod.java:265) - enter 
PostMethod.addParameter(String, String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:150) - 
enter EntityEnclosingMethod.clearRequestBody()
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpClient.java:321) - enter 
HttpClient.executeMethod(HttpMethod)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpClient.java:374) - enter 
HttpClient.executeMethod(HostConfiguration,HttpMethod,HttpState)
[2010-03-02 21:24:40] [pool-3-thread-1] 
TRACE(MultiThreadedHttpConnectionManager.java:405) - enter 
HttpConnectionManager.getConnectionWithTimeout(HostConfiguration, long)
[2010-03-02 21:24:40] [pool-3-thread-1] 
DEBUG(MultiThreadedHttpConnectionManager.java:412) - 
HttpConnectionManager.getConnection:  config = 
HostConfiguration[host=http://myserver.com:8983], timeout = 0
[2010-03-02 21:24:40] [pool-3-thread-1] 
TRACE(MultiThreadedHttpConnectionManager.java:805) - enter 
HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
[2010-03-02 21:24:40] [pool-3-thread-1] 
TRACE(MultiThreadedHttpConnectionManager.java:805) - enter 
HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
[2010-03-02 21:24:40] [pool-3-thread-1] 
DEBUG(MultiThreadedHttpConnectionManager.java:839) - Getting free connection, 
hostConfig=HostConfiguration[host=http://myserver.com:8983]
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodDirector.java:379) - 
Attempt number 1 to process request
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:1079) - enter 
HttpMethodBase.execute(HttpState, HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:2057) - enter 
HttpMethodBase.writeRequest(HttpState, HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:2212) - enter 
HttpMethodBase.writeRequestLine(HttpState, HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:1496) - enter 
HttpMethodBase.generateRequestLine(HttpConnection, String, String, String, 
String)
[2010-03-02 21:24:40] [pool-3-thread-1] DEBUG(Wire.java:70) -  POST 
/solr/replication HTTP/1.1[\r][\n]
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpConnection.java:1032) - enter 
HttpConnection.print(String)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpConnection.java:942) - enter 
HttpConnection.write(byte[])
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpConnection.java:963) - enter 
HttpConnection.write(byte[], int, int)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(HttpMethodBase.java:2175) - enter 
HttpMethodBase.writeRequestHeaders(HttpState,HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(EntityEnclosingMethod.java:370) - 
enter EntityEnclosingMethod.addRequestHeaders(HttpState, HttpConnection)
[2010-03-02 21:24:40] [pool-3-thread-1] TRACE(ExpectContinueMethod.java:183) - 
enter ExpectContinueMethod.addRequestHeaders(HttpState, HttpConnection)

Re: Different weights to different fields

2010-03-02 Thread Alex Thurlow

That's great information.  Thanks!

-Alex

Alex Thurlow
Blastro Networks

http://www.blastro.com
http://www.roxwel.com
http://www.yallwire.com


On 3/2/2010 3:11 PM, Ahmet Arslan wrote:
   

 I'm new to Solr and just getting it set up
and testing it out.  I'd like to know if there's a way
to give a different weight to different data fields.

For an example, I'm going to be storing song
information.  I have the fields: Artist, Title,
Description, and Tags.  I'd like occurrences of the
search term in Artist and Title to count more than the ones
found in Description and Tags.  For instance, a search
for Bruce Springsteen against all the fields should return
the ones where artist=Bruce Springsteen higher than ones
that just have that within the description.  Is this
possible either in the indexing or with a query option?
 

You can do it at either query time or index time. At query time you can assign 
different boost values with the caret operator,
e.g. Artist:(Bruce Springsteen)^10 Title:(Bruce Springsteen)^5

Also, the dismax[1] request handler might be useful to you.

[1]http://wiki.apache.org/solr/DisMaxRequestHandler

At index time you can give different boost values to different fields. [2]
e.g. <field name="Artist" boost="10.0">

[2]http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
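
For reference, a minimal SolrJ sketch of the query-time variant (the
server URL and boost values are illustrative; the field names are taken
from the example above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BoostedSearch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("Bruce Springsteen");
        q.set("defType", "dismax");                        // use the dismax parser
        q.set("qf", "Artist^10 Title^5 Description Tags"); // per-field boosts
        QueryResponse rsp = server.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}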



   


Re: Unindexed Fields Are Searchable?

2010-03-02 Thread Ahmet Arslan

 Again, note that it should be
 index_ed_=false.  ed - very
 important!   If you're saying index=false,
 Solr is not reading that attribute at all, and going with
 the default for the field type.

Perfect catch :)





Re: replication issue

2010-03-02 Thread Matthieu Labour
One More information

I deleted the index on the master, restarted the master and the slave, and now 
the replication works.

Could it be that the replication doesn't work well when started against 
an already existing big index?

Thank you

--- On Tue, 3/2/10, Matthieu Labour matthieu_lab...@yahoo.com wrote:

From: Matthieu Labour matthieu_lab...@yahoo.com
Subject: Re: replication issue
To: solr-user@lucene.apache.org
Date: Tuesday, March 2, 2010, 3:35 PM

[...]

Re: replication issue

2010-03-02 Thread Otis Gospodnetic
Hi Matthieu,

Does this happen over and over?
Is this with Solr 1.4 or some other version?
Is there anything unusual about _7h0y.fdx?
Does _7h0y.fdx still exist on the master when the replication fails?
...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: Matthieu Labour matthieu_lab...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Tue, March 2, 2010 4:35:46 PM
 Subject: Re: replication issue
 
 [...]

RE: Unindexed Fields Are Searchable?

2010-03-02 Thread Thomas Nguyen
Great catch!  Thanks for spotting my error :)

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Tuesday, March 02, 2010 2:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Unindexed Fields Are Searchable?


 Again, note that it should be
 index_ed_=false.  ed - very
 important!   If you're saying index=false,
 Solr is not reading that attribute at all, and going with
 the default for the field type.

Perfect catch :)


  



Re: replication issue

2010-03-02 Thread Matthieu Labour
Otis
Thank you for your response. I apologize for not being specific enough.
-- Yes, it happened over & over.
-- apache-solr-1.4.0
-- I restarted the indexing+replication from scratch. Before I did that, I 
backed up the master index directory. I don't see _7h0y.fdx in it. 
What could have possibly happened?



--- On Tue, 3/2/10, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

From: Otis Gospodnetic otis_gospodne...@yahoo.com
Subject: Re: replication issue
To: solr-user@lucene.apache.org
Date: Tuesday, March 2, 2010, 4:40 PM

Hi Matthieu,

Does this happen over and over?
Is this with Solr 1.4 or some other version?
Is there anything unusual about _7h0y.fdx?
Does _7h0y.fdx still exist on the master when the replication fails?
...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



 [...]

Re: Implementing hierarchical facet

2010-03-02 Thread Geert-Jan Brits
If it's a requirement to let Solr handle the facet hierarchy please
disregard this post, but
an alternative would be to have your app control when to ask for which
'facet level' (e.g. country, state, city) in the hierarchy.

as follows,

each doc has 3 separate fields (indexed=true, stored=false):
- countryid
- stateid
- cityid

facet on country:
facet=on&facet.field=countryid

facet on state (country selected; functionally you probably don't want to
show states without the user having selected a country anyway):
facet=on&facet.field=stateid&fq=countryid:somecountryid

facet on city (state selected, same functional analogy as above):
facet=on&facet.field=cityid&fq=stateid:somestateid

or

facet on city (country selected, same functional analogy as above):
facet=on&facet.field=cityid&fq=countryid:somecountryid

grab the resulting facet and drop it under "Location"

pros:
- reusing fq's (good performance; I've never used hierarchical facets, but
would be surprised if they gave a (major) speed increase over this method)
- flexible (you get multiple hierarchies: country -> state -> city and
country -> city)

cons:
- a little more application logic

Hope that helps,
Geert-Jan
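
A minimal SolrJ sketch of this app-driven drill-down, using the
hypothetical countryid/stateid/cityid fields from above:

import org.apache.solr.client.solrj.SolrQuery;

public class LocationFacet {
    // selectedCountry/selectedState are null until the user picks them
    static SolrQuery build(String selectedCountry, String selectedState) {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        if (selectedCountry == null) {
            q.addFacetField("countryid");                     // level 1
        } else if (selectedState == null) {
            q.addFacetField("stateid");                       // level 2
            q.addFilterQuery("countryid:" + selectedCountry);
        } else {
            q.addFacetField("cityid");                        // level 3
            q.addFilterQuery("stateid:" + selectedState);
        }
        return q;
    }
}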





2010/3/2 Andy angelf...@yahoo.com

 I read that a simple way to implement hierarchical facet is to concatenate
 strings with a separator. Something like level1>level2>level3 with > as
 the separator.

 A problem with this approach is that the number of facet values will
 greatly increase.

 For example I have a facet Location with the hierarchy
 country>state>city. Using the above approach every single city will lead to
 a separate facet value. With tens of thousands of cities in the world the
 response from Solr will be huge. And then on the client side I'd have to
 loop through all the facet values and combine those with the same country
 into a single value.

 Ideally Solr would be aware of the hierarchy structure and send back
 responses accordingly. So at level 1 Solr will send back facet values based
 on country (100 or so values). Level 2 the facet values will be based on the
 states within the selected country (a few dozen values). Next level will be
 cities within that state. and so on.

 Is it possible to implement hierarchical facet this way using Solr?






Re: Implementing hierarchical facet

2010-03-02 Thread Geert-Jan Brits
Using Solr 1.4: even fewer changes to the frontend:

facet=on&facet.field={!key=Location}countryid
...
facet=on&facet.field={!key=Location}cityid&fq=countryid:somecountryid
etc.

will consistently render the resulting facet under the name "Location".


2010/3/3 Geert-Jan Brits gbr...@gmail.com

 [...]









Need suggestion regarding custom transformer

2010-03-02 Thread KshamaPai

Hi,
I am new to Solr.
I am trying location-aware search with spatial Lucene in a Solr 1.5 nightly
build.
My table in MySQL has just lat, lng and some text. I want to add geohash,
lat_rad (lat in radians) and lng_rad fields to the document before indexing.
I have used the DataImportHandler to get my table into Solr.
I have to use GeohashUtils.Encode() to get the geohash from the corresponding
lat, lng of each row,
and a *ToRads function to get lat in radians.

Can I use custom transformers so that, after retrieving each row, these
fields are added and then indexed during the data import?
Or do I have to migrate the data to XML and make the required changes before
indexing?
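
A minimal sketch of such a transformer (class, column and field names are
hypothetical; it assumes Lucene's spatial contrib GeoHashUtils is on the
classpath). The DIH wiki documents that any class exposing a public Object
transformRow(Map) method can be referenced via the entity's transformer
attribute:

import java.util.Map;
import org.apache.lucene.spatial.geohash.GeoHashUtils;

public class GeoFieldsTransformer {
    public Object transformRow(Map<String, Object> row) {
        double lat = Double.parseDouble(row.get("lat").toString());
        double lng = Double.parseDouble(row.get("lng").toString());
        row.put("geohash", GeoHashUtils.encode(lat, lng)); // derived field
        row.put("lat_rad", Math.toRadians(lat));
        row.put("lng_rad", Math.toRadians(lng));
        return row;
    }
}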

Thanks in advance.



-- 
View this message in context: 
http://old.nabble.com/Need-suggestion-regarding-custom-transformer-tp27763576p27763576.html
Sent from the Solr - User mailing list archive at Nabble.com.



Getting total term count

2010-03-02 Thread Akash Sahu

Hi, 
I want a way to get the total term count per document. I am using
Solr 1.4. 

My query looks something like this:
http://192.168.1.50:8080/solr1/core_SFS/select/?q=content%3Apresident%0D%0A&version=2.2&start=0&rows=10&indent=on

I tried to use TermVectorComponent, but it just gives me the number of
documents where the term was found.
(This was the option I used:
qt=tvrh&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true)

Can anyone please guide me on how to get the total term count per document.
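
One hedged way to derive it with SolrJ: sum the per-term "tf" entries that
TermVectorComponent returns for a document's field. The response layout
assumed here (termVectors -> doc -> field -> term -> tf) is from Solr 1.4
and worth verifying:

import org.apache.solr.common.util.NamedList;

public class TermCounter {
    // docVectors: the NamedList for one doc under the "termVectors" section
    static long totalTermCount(NamedList<?> docVectors, String field) {
        NamedList<?> terms = (NamedList<?>) docVectors.get(field);
        long total = 0;
        for (int i = 0; i < terms.size(); i++) {
            NamedList<?> termInfo = (NamedList<?>) terms.getVal(i); // one term
            Number tf = (Number) termInfo.get("tf");
            if (tf != null) total += tf.longValue();
        }
        return total;
    }
}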

Thanks.
-- 
View this message in context: 
http://old.nabble.com/Getting-total-term-count-tp27763844p27763844.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing hierarchical facet

2010-03-02 Thread Andy
Thanks. I didn't know about the {!key=Location} trick.

Thanks everyone for your help. From what I could gather, there're 3 approaches:

1) SOLR-64
Pros:
- can have arbitrary levels of hierarchy without modifying schema
Cons:
- each combination of values across all the levels in the hierarchy results in a 
separate filter-cache entry. The number of entries could be huge, which would lead 
to poor performance

2) SOLR-792
Pros:
- each level of the hierarchy results in its own filter-cache entries. A much 
smaller number of entries. Better performance.
Cons:
- Only 2 levels are supported

3) Separate fields for each hierarchy levels
Pros:
- same as SOLR-792. Good performance
Cons:
- can only handle a fixed number of levels in the hierarchy. Adding any levels 
beyond that requires schema modification

Does that sound right?

Option 3 is probably the best match for my use case. Is there any trick to make 
it deal with an arbitrary number of levels?
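
One hedged sketch for arbitrary depth, building on option 3: declare a
dynamic field pattern in schema.xml (hypothetical, e.g. loc_* as an
indexed string field), index loc_0=USA, loc_1=NY, loc_2=NYC, and so on,
and have the app facet one level deeper than the current selection:

import org.apache.solr.client.solrj.SolrQuery;

public class LevelFacet {
    // path holds the values already drilled into, shallowest level first
    static SolrQuery nextLevel(String[] path) {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("loc_" + path.length);       // facet one level deeper
        for (int i = 0; i < path.length; i++) {      // constrain chosen levels
            q.addFilterQuery("loc_" + i + ":\"" + path[i] + "\"");
        }
        return q;
    }
}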

Thanks.

--- On Tue, 3/2/10, Geert-Jan Brits gbr...@gmail.com wrote:

From: Geert-Jan Brits gbr...@gmail.com
Subject: Re: Implementing hierarchical facet
To: solr-user@lucene.apache.org
Date: Tuesday, March 2, 2010, 8:02 PM

[...]

Re: question regarding coord() value

2010-03-02 Thread Lance Norskog
The first 2 queries say 'electORnics' instead of 'electROnics'.

The third query shows the situation. The first clause has 1 out of 2
matches, and the second has 1 out of 3 matches. Look for the two
'coord' entries. They are 1/2 and 1/3.

  <str name="SP2514N">
0.61808145 = (MATCH) sum of:
  0.16856766 = (MATCH) product of:
0.33713531 = (MATCH) sum of:
  0.33713531 = (MATCH) weight(name:samsung in 0), product of:
0.39687544 = queryWeight(name:samsung), product of:
  3.3978953 = idf(docFreq=1, maxDocs=22)
  0.116800375 = queryNorm
0.84947383 = (MATCH) fieldWeight(name:samsung in 0), product of:
  1.0 = tf(termFreq(name:samsung)=1)
  3.3978953 = idf(docFreq=1, maxDocs=22)
  0.25 = fieldNorm(field=name, doc=0)
0.5 = coord(1/2)
   0.44951376 = (MATCH) product of:
1.3485413 = (MATCH) sum of:
  1.3485413 = (MATCH) weight(manu:electronics in 0), product of:
0.39687544 = queryWeight(manu:electronics), product of:
  3.3978953 = idf(docFreq=1, maxDocs=22)
  0.116800375 = queryNorm
3.3978953 = (MATCH) fieldWeight(manu:electronics in 0), product of:
  1.0 = tf(termFreq(manu:electronics)=1)
  3.3978953 = idf(docFreq=1, maxDocs=22)
  1.0 = fieldNorm(field=manu, doc=0)
0.3334 = coord(1/3)
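
For reference, Lucene's stock DefaultSimilarity computes that factor as
matching clauses over total clauses at the same level of the BooleanQuery;
a sketch of the formula:

public class CoordSketch {
    // overlap = clauses that matched, maxOverlap = clauses in the BooleanQuery
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;   // e.g. 1/2 and 1/3 above
    }
}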


On Tue, Mar 2, 2010 at 3:35 AM, Smith G gudumba.sm...@gmail.com wrote:
 Hello,
         I have been trying to find out what exactly the coord value is.
 I have executed different queries where I have observed strange
 behaviour.
 Leave aside the numerator of the coord fraction for the moment, as I am
 really confused about what exactly the denominator is.
 Here are the examples .

 Query 1)

  (+text:samsung +text:electron +name:samsung) (+manu:samsung
 +features:samsung (+manu:electronics +name:electronics))
 manu:electornics name:one name:two

 coord value is: 1/5 [consider only the denominator]. I guess, as there are
 5 clauses (combinations), it could be five.
 
 Query 2)

 ((+text:samsung +(text:electron name:samsung)) (+manu:samsung
 +features:samsung (+manu:electronics +name:electronics)))
 (manu:electornics name:one) name:two

 coord value is: 1/3. The same logic works here [for the denominator value 3].
 
 Query 3)

 (name:samsung features:abc) (features:name name:electronics manu:electronics)

 But here, the coord value is: 1/3. I have been trying to work out how it
 could be 3, but I could not.
 -

 I have tried to correlate this with the info present in the Java documentation,
 but again I was not successful.
 Please clarify.

 Thanks.




-- 
Lance Norskog
goks...@gmail.com


Re: Simultaneous Writes to Index

2010-03-02 Thread Lance Norskog
Locking is at a lower level than indexing and queries. Solr
coordinates multi-threaded indexing and query operations in memory, and
a separate thread writes data to disk. There are no performance
problems with multiple searches and index updates happening at the same
time.
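
A minimal sketch of what this means in practice, with several threads
adding documents through one shared SolrServer instance (the URL and ids
are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ConcurrentAdds {
    public static void main(String[] args) throws Exception {
        final SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            final int id = i;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", "user-doc-" + id); // unique key per doc
                        server.add(doc);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        server.commit();   // one commit after all adds have finished
    }
}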

2010/3/2 Kranti™ K K Parisa kranti.par...@gmail.com:
 and also, what about when two update requests come in at the same time? Will
 whichever request comes first update the index, while the other
 requests wait until the lock timeout that we have configured?


 Best Regards,
 Kranti K K Parisa



 2010/3/2 Kranti™ K K Parisa kranti.par...@gmail.com

 Hi Ron,

 Thanks for the reply. So does this mean that the writer lock has nothing to do
 with concurrent writes?

 Best Regards,
 Kranti K K Parisa



 On Tue, Mar 2, 2010 at 4:19 PM, Ron Chan rc...@i-tao.com wrote:

 as long as the document id is unique, concurrent writes are fine

 if for some reason the same doc id is used then it is overwritten, so the last
 one in will be the one that ends up in the index

 Ron

 - Original Message -
 From: Kranti™ K K Parisa kranti.par...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, 2 March, 2010 10:40:37 AM
 Subject: Simultaneous Writes to Index

 Hi,

 I am planning to develop an application where users can update
 their account data after login; this is on top of the search facility
 users have. The basic workflow is:
 1) user logs in
 2) searches for some data
 3) gets the results from the Solr index
 4) saves some of the search results into their repository
 5) later on they may view their repository

 For this, at step 4 I am planning to write that data into a separate Solr
 index, as a user may search within his repository and get the results,
 facets, etc. So I am thinking of writing such data/info to a separate Solr
 index.

 In this plan, how do simultaneous writes to the user history index work?
 What are the best practices in such scenarios of an index being updated at
 the same time by different users?

 The other alternative is to store such user info in a DB and schedule an
 indexing process at regular intervals. But that won't make the system live
 with user actions, as there would be some delay; users can't see the data
 they saved in their repository until it is indexed.

 That is the reason I am planning to use a Solr XML post request to update
 the index silently, but how about multiple users writing to the same index?

 Best Regards,
 Kranti K K Parisa







-- 
Lance Norskog
goks...@gmail.com


DIH onError question

2010-03-02 Thread Shah, Nirmal
Hi all,

I am using Solr 1.5 from trunk.  I am getting the below error on a full
load, and it is causing the import to fail and roll back.  I am not
concerned about the error itself, but rather that I cannot seem to tell the
indexing to continue.  I have two entities, and I have tried all (4)
combinations of skip and continue for their onError attributes.

SEVERE: Exception while processing: f document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:652)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:606)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
        at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
        at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:124)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:580)
        ... 6 more
Mar 2, 2010 10:21:05 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
        [... identical stack trace as above ...]
Mar 2, 2010 10:21:05 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback


My data-config file:
<dataConfig>
  <dataSource name="binaryFile" type="BinFileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            transformer="RegexTransformer,TemplateTransformer" baseDir="C:\Docs"
            fileName=".*pdf" recursive="true" rootEntity="false" pk="id"
            dataSource="binaryFile" onError="skip">
      <field column="id" sourceColName="fileAbsolutePath" regex="\\"
             replaceWith="/" />
      <entity dataSource="binaryFile" name="x"
              processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
              onError="continue">
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>


Thanks,
Nirmal


Re: How can I get Solr-Cell to extract to multi-valued fields?

2010-03-02 Thread Lance Norskog
It is a bug. I just filed this; for now it is just a unit test that displays
the behavior.

http://issues.apache.org/jira/browse/SOLR-1803

On Tue, Mar 2, 2010 at 9:07 AM, Mark Roberts mark.robe...@red-gate.com wrote:
 Hi,

 I have a schema with a multivalued field like so:

 <field name="product" type="string" indexed="true" stored="true"
 multiValued="true"/>

 I am uploading HTML documents to the Solr extraction handler which contain 
 meta in the head, like so:

 <meta name="product" content="firstproduct" />
 <meta name="product" content="anotherproduct" />
 <meta name="product" content="andanotherproduct" />

 I want the extraction handler to map each of these pieces of meta onto the 
 product field; however, there seems to be a problem - only the last item, 
 "andanotherproduct", is mapped; the first ones seem to be ignored.

 It does work, however, if I pass the values as literals in the query string 
 (e.g. 
 literal.product=firstproduct&literal.product=anotherproduct&literal.product=andanotherproduct)

 I've tried the release version 1.4 of Solr and a recent nightly build of 1.5, 
 and neither works.

 Is this a bug in Solr-cell or am I doing something wrong?

 Many thanks,
 Mark.




-- 
Lance Norskog
goks...@gmail.com


Re: Warning : no lockType configured for...

2010-03-02 Thread Mani EZZAT
I don't know; I didn't try, because I need to create a different 
core each time.


I'll do some tests with the default config and will report back to all 
of you

Thank you for your time

Tom Hill wrote:

Hi Mani,

Mani EZZAT wrote:
  
I'm dynamically creating cores with a new index, using the same schema 
and solrconfig.xml



Does the problem occur if you use the same configuration in a single, static
core?

Tom