Re: Indexing HTML document

2010-03-03 Thread György Frivolt
Thank you! That's even more than I wanted to know. ;)

Georg


On Tue, Mar 2, 2010 at 10:05 PM, Walter Underwood wun...@wunderwood.org wrote:

 You are in luck, because Avi Rappoport has just written a tutorial about
 how to do this. It is available from Lucid Imagination:


 http://www.lucidimagination.com/solutions/whitepapers/Indexing-Text-and-HTML-Files-with-Solr

 I've just started reviewing it, but knowing Avi, I expect it to be very
 helpful.

 wunder

 On Mar 2, 2010, at 8:28 AM, Siddhant Goel wrote:

  There is an HTML filter documented here, which might be of some help -
 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
 
  Control characters can be eliminated using code like this -
 
 http://bitbucket.org/cogtree/python-solr/src/tip/pythonsolr/pysolr.py#cl-449
 
  On Tue, Mar 2, 2010 at 9:37 PM, György Frivolt gyorgy.friv...@gmail.com
 wrote:
 
  Hi, how do I properly index HTML documents? All the documents are HTML, some
  containing characters encoded like &#x17E;&#xED; ... Is there a character
  filter for decoding these entities? Is there a way to strip the HTML tags
  out?
  Does Solr weight the terms in the document based on where they appear? Words
  in headers (H1, H2, ...) would be expected to describe the document more
  than words in paragraphs.
 
  Thanks for help,
 
   Georg
 
 
 
 
  --
  - Siddhant
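
For reference, a minimal schema.xml sketch of the tag-stripping setup (the
field type name and analyzer chain are placeholders; <charFilter> needs Solr
1.4+, and the HTMLStrip char filter also decodes numeric entities like
&#x17E;):

<fieldType name="html_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Tags and entities are removed/decoded before tokenization, so only the text
content ends up in the index.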




Clustering from analyzed text instead of raw input

2010-03-03 Thread JCodina

I'm trying to use carrot2 (for now I started with the workbench) and I can
cluster any field, but the text used for clustering is the original raw
text, the one that was sent for indexing, without any of the processing
performed by the tokenizer or filters. 
So I get stop words.
I also built shingles (after filtering by POS) and I cannot cluster using
these multiwords. 
So my question is how to get the indexed text instead of the original one in
a query response, because if I set stored to false, then the search does not
return the content of the field.

Thanks in advance

Joan
-- 
View this message in context: 
http://old.nabble.com/Clustering-from-anlayzed-text-instead-of-raw-input-tp27765780p27765780.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: 2 Cores, 1 Table, 2 DataImporter --> Import at the same time ?

2010-03-03 Thread stocki

pleeease help me somebody =( :P




stocki wrote:
 
 Hello again ;)
 
 i install tomcat5.5 on my debian server ...
 
  i use 2 cores and two different DIH with separate indexes, one for the
  normal search-feature and the other core for the suggest-feature. 
  
  but i cannot start both DIH with an import command at the same time. how
  is this possible ? 
 
 
 thx 
 

-- 
View this message in context: 
http://old.nabble.com/2-Cores%2C-1-Table%2C-2-DataImporter---%3E-Import-at-the-same-time---tp27756255p27765825.html
Sent from the Solr - User mailing list archive at Nabble.com.



error in sum function

2010-03-03 Thread JCodina

the sum function or the map one is not parsed correctly.
This sort works like a charm:
sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc
but

sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc

gives the following exception

SEVERE: org.apache.solr.common.SolrException: Must declare sort field or
function
at
org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376)
at
org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281)
at org.apache.solr.search.QParser.getSort(QParser.java:217)
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:86)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

you can test it in here using these two url's

http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&fl=id,Num,score&sort=score%20asc,sum%28map%28Num,0,5000,42000%29,Num%29+asc&q=+entities_org:%28%22Amena%22%29

http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&wt=php&fl=id,Num,score&rows=50&sort=score+desc,sum%28Num,map%28Num,0,2000,42000%29%29+asc&q=+entities_org:Amena




-- 
View this message in context: 
http://old.nabble.com/error-in-sum-function-tp27765881p27765881.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing hierarchical facet

2010-03-03 Thread Geert-Jan Brits
you could always define 1 dynamicField and encode the hierarchy level in the
fieldname:

<dynamicField name="_loc_hier_*" type="string" stored="false" indexed="true"
omitNorms="true"/>

using:
facet=on&facet.field={!key=Location}_loc_hier_city&fq=_loc_hier_country:somecountryid
...
adding cityarea later for instance would be as simple as:
facet=on&facet.field={!key=Location}_loc_hier_cityarea&fq=_loc_hier_city:somecityid

Cheers,
Geert-Jan
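
To make the trick concrete, a sketch of what a document would carry at index
time (ids and values are placeholders), one dynamic field per level:

<add>
  <doc>
    <field name="id">hotel42</field>
    <field name="_loc_hier_country">US</field>
    <field name="_loc_hier_state">NY</field>
    <field name="_loc_hier_city">newyork</field>
  </doc>
</add>

Each level facets independently, and an fq on the parent level narrows the
child facet, so adding a new level never requires a schema change.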


2010/3/3 Andy angelf...@yahoo.com

 Thanks. I didn't know about the {!key=Location} trick.

 Thanks everyone for your help. From what I could gather, there're 3
 approaches:

 1) SOLR-64
 Pros:
 - can have arbitrary levels of hierarchy without modifying schema
 Cons:
 - each combination of all the levels in the hierarchy will result in a
 separate filter cache entry. This number could be huge, which would lead to
 poor performance

 2) SOLR-792
 Pros:
 - each level of the hierarchy results in its own filter cache entry. A much
 smaller number of cache entries. Better performance.
 Cons:
 - Only 2 levels are supported

 3) Separate fields for each hierarchy levels
 Pros:
 - same as SOLR-792. Good performance
 Cons:
 - can only handle a fixed number of levels in the hierarchy. Adding any
 levels beyond that requires schema modification

 Does that sound right?

 Option 3 is probably the best match for my use case. Is there any trick to
 make it able to deal with an arbitrary number of levels?

 Thanks.

 --- On Tue, 3/2/10, Geert-Jan Brits gbr...@gmail.com wrote:

 From: Geert-Jan Brits gbr...@gmail.com
 Subject: Re: Implementing hierarchical facet
 To: solr-user@lucene.apache.org
 Date: Tuesday, March 2, 2010, 8:02 PM

 Using Solr 1.4: even less changes to the frontend:

 facet=on&facet.field={!key=Location}countryid
 ...
 facet=on&facet.field={!key=Location}cityid&fq=countryid:somecountryid
 etc.

 will consistently render the resulting facet under the name Location.


 2010/3/3 Geert-Jan Brits gbr...@gmail.com

  If it's a requirement to let Solr handle the facet-hierarchy please
  disregard this post, but
  an alternative would be to have your App control when to ask for which
  'facet-level' (e.g: country, state, city) in the hierarchy.
 
  as follows,
 
  each doc has 3 separate fields (indexed=true, stored=false):
  - countryid
  - stateid
  - cityid
 
  facet on country:
  facet=on&facet.field=countryid

  facet on state ( country selected. functionally you probably don't want
 to
  show states without the user having selected a country anyway)
  facet=on&facet.field=stateid&fq=countryid:somecountryid

  facet on city (state selected, same functional analogy as above)
  facet=on&facet.field=cityid&fq=stateid:somestateid

  or

  facet on city (country selected, same functional analogy as above)
  facet=on&facet.field=cityid&fq=countryid:somecountryid

  grab the resulting facet and drop it under Location
 
  pros:
  - reusing fq's (good performance; I've never used hierarchical facets,
 but
  would be surprised if they gave a (major) speed increase over this method)
  - flexible (you get multiple hierarchies: country --> state --> city and
  country --> city)
 
  cons:
  - a little more application logic
 
  Hope that helps,
  Geert-Jan
 
 
 
 
 
  2010/3/2 Andy angelf...@yahoo.com
 
  I read that a simple way to implement hierarchical facet is to
 concatenate
  strings with a separator. Something like level1>level2>level3 with >
 as
  the separator.
 
  A problem with this approach is that the number of facet values will
  greatly increase.
 
  For example I have a facet Location with the hierarchy
  country>state>city. Using the above approach every single city will lead to
  a separate facet value. With tens of thousands of cities in the world the
  response from Solr will be huge. And then on the client side I'd have to
  loop through all the facet values and combine those with the same country
  into a single value.
 
  Ideally Solr would be aware of the hierarchy structure and send back
  responses accordingly. So at level 1 Solr will send back facet values
 based
  on country (100 or so values). Level 2 the facet values will be based on
 the
  states within the selected country (a few dozen values). Next level will
 be
  cities within that state. and so on.
 
  Is it possible to implement hierarchical facet this way using Solr?
 
 
 
 
 
 
 







Re: error in sum function

2010-03-03 Thread Koji Sekiguchi

Can you try the latest trunk? I fixed it a couple of days ago

Koji Sekiguchi from mobile


On 2010/03/03, at 18:18, JCodina joan.cod...@barcelonamedia.org wrote:



the sum function or the map one is not parsed correctly.
This sort works like a charm:
sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc
but

sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc

gives the following exception

SEVERE: org.apache.solr.common.SolrException: Must declare sort  
field or

function
   at
org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376)
   at
org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281)
   at org.apache.solr.search.QParser.getSort(QParser.java:217)
   at
org.apache.solr.handler.component.QueryComponent.prepare 
(QueryComponent.java:86)

   at
org.apache.solr.handler.component.SearchHandler.handleRequestBody 
(SearchHandler.java:174)

   at
org.apache.solr.handler.RequestHandlerBase.handleRequest 
(RequestHandlerBase.java:131)


you can test it in here using these two url's

http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&fl=id,Num,score&sort=score%20asc,sum%28map%28Num,0,5000,42000%29,Num%29+asc&q=+entities_org:%28%22Amena%22%29

http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&wt=php&fl=id,Num,score&rows=50&sort=score+desc,sum%28Num,map%28Num,0,2000,42000%29%29+asc&q=+entities_org:Amena




--
View this message in context: 
http://old.nabble.com/error-in-sum-function-tp27765881p27765881.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr with Tika - Text ordering garbled.

2010-03-03 Thread Wick2804

We are loading PDF documents with an OCR content layer into Solr through Tika.
The load process appears to work fine and all of the words from the OCR
layer are stored as Text in Solr, and are therefore searchable.

Our problem is that in the results returned from a search the words in the
'Text' field are not returned in the same order as those in the original OCR
content in the PDF. This means that the snippet does not accurately reflect
the original document content.

It appears that sections of text from the OCR are ordered randomly, so a
section from the bottom of the document appears alongside text from the top
of the document.

Additionally, Tika strips out carriage return characters but does not
replace them with anything, so terms in separate paragraphs get joined
together.

Any help welcomed. 


-- 
View this message in context: 
http://old.nabble.com/Solr-with-Tika---Text-ordering-garbled.-tp27766815p27766815.html
Sent from the Solr - User mailing list archive at Nabble.com.



Error on startup

2010-03-03 Thread Lee Smith
Hi All.

I have shut down solr, removed the index so I can start over, then re-launched.

I am getting an error of

SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.solrc...@14db38a4 (core1) 
has a reference count of 1

Any idea on what this is a result of ?

Hope you can advise.

Lee

Problems with variable geo_distance

2010-03-03 Thread Emad Mushtaq
Hi,

I am having a very strange problem related to local solr. In my documents
there is a record for a location called Gujranwala, which is a city in
Pakistan. I try to get search results with respect to the coordinates of
Lahore (another city of Pakistan). When I do a search within 100 miles,
there are no results. When I do a search of 200 miles, it gives me
Gujranwala in the end results. However, the problem here is that the
geo_distance it gives is 48.112120348665925. This result should have been in
the search within 100 miles, since the geo_distance is 48.112. Here is the
query that I was making:

http://localhost:8983/solr/select/?q=+title:*&qt=geo&lat=31.4845&long=74.3216&radius=100
http://localhost:8983/solr/select/?q=+title:*&qt=geo&lat=31.4845&long=74.3216&radius=200

The coordinates of Gujranwala are:
<double name="latitude">32.168652</double>
<double name="longitude">74.173981</double>

I would appreciate any help on this.

Thanks



-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


[ANN] Carrot2 3.2.0 released

2010-03-03 Thread Stanislaw Osinski
Dear All,

I'm happy to announce three releases from the Carrot Search team: Carrot2
v3.2.0, Lingo3G v1.3.1 and Carrot Search Labs.


Carrot2 is an open source search results clustering engine. Version v3.2.0
introduces:

* experimental support for clustering Korean and Arabic content,
* a command-line batch processing application,
* significant updates to the Flash-based cluster visualization.

As of version 3.2.0, Carrot2 is free of LGPL-licensed dependencies.

Release notes:
http://project.carrot2.org/release-3.2.0-notes.html

Download:
http://project.carrot2.org/download.html



Lingo3G is a real-time document clustering engine from Carrot Search.
Version 1.3.1 introduces support for clustering Arabic, Danish, Finnish,
Hungarian, Korean, Romanian, Swedish and Turkish content, a command-line
application and a number of minor improvements. Please contact us at
i...@carrotsearch.com for details.



Carrot Search Labs shares some small pieces of software we created when
working on Carrot2 and Lingo3G. Please see http://labs.carrotsearch.com for
details and downloads.



Thanks!

Dawid Weiss, Stanislaw Osinski
Carrot Search, i...@carrot-search.com


Re-index after Solr config file changed without restarting services

2010-03-03 Thread Marc Wilson
Hi,

I am attempting to achieve what I believe many others have attempted in the 
past: allow an end user to modify a Solr config file through a custom UI and 
then roll out any changes made without restarting any services. Specifically, I 
want to be able to let the user edit the synonyms.txt file and after committing 
the changes, force Solr to re-index based on those changes without restarting 
Tomcat.

I have configured a Solr Master and Slave, each of which has a single core:


* http://master:8080/solr/core

* http://slave:8080/solr/core

The cores are defined in respective solr.xml files as:

<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores">
  <core name="core" instanceDir="core">
    <property name="configDir" value="../../conf/" />
  </core>
 </cores>
</solr>

Replication has been configured in the Master solrconfig.xml as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="snapshot">startup</str>
    <str name="snapshot">commit</str>
    <str name="confFiles">schema.xml,${configDir}stopwords.txt,${configDir}elevate.xml,${configDir}synonyms.txt</str>
  </lst>
</requestHandler>

and the Slave solrconfig.xml as:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8080/solr/core/replication</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>

At service startup, replication works fine. However, when a change is made to 
the synonyms.txt file and 
http://master:8080/solr/admin/cores?action=RELOAD&core=core is called, neither 
the Master nor the Slave is updated to reflect the modification. I am assuming 
that this is because in the Master schema.xml file the SynonymFilterFactory is 
being used at index time and the CoreAdmin RELOAD does not force a Solr 
re-index. If this is so, please can someone advise what the best methodology 
is to achieve what I am attempting? If not, please could someone let me know 
what I'm doing wrong?!

Thanks,

Marc


Re: Logging in Embedded SolrServer - What a nightmare.

2010-03-03 Thread Lucas F. A. Teixeira
Hello Kevin,

No, that didn't work.
I tried a lot of combinations of the log4j, slf4j and slf4j-log4j12 jars
with no success.

As I said, for the solr.war what you describe seems to work, the same way I got
it working by configuring jre/lib/logging.properties, but not with the embedded
server...

Anyone can please help me?

[]s,


Lucas Frare Teixeira .·.
- lucas...@gmail.com
- lucastex.com.br
- blog.lucastex.com
- twitter.com/lucastex


On Tue, Mar 2, 2010 at 6:36 PM, Kevin Osborn osbo...@yahoo.com wrote:

 Not sure if it will solve your specific problem. We use Solr as a WAR as
 well as Solrj.  The main solr distribution comes with
 slf4j-jdk14-1.5.5.jar. I just deleted that and replaced it with
 slf4j-log4j12-1.5.5.jar, and then it used my existing log4j.properties file.




 
 From: Lucas F. A. Teixeira lucas...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, March 2, 2010 11:14:26 AM
 Subject: Logging in Embedded SolrServer - What a nightmare.

 Hello all,

 I'm having a hard time trying to change the Solr query logging level.
 I've tried a lot of things I've found in the internet, this mailing list
 and
 solr docs.

 What I've found so far:

 - Solr Embedded Server uses the slf4j lib for intermediating logging. Here I'm
 using Log4j as my logging framework.
 - Changing the .../jre/lib/logging.properties worked, but only when
 querying
 using solr over http, and not on solr embedded.
 - A log4j.xml that I've added it is not being respected. (It is logging
 with
 a totally different layout and appenders)
 - I've searched for other log4j config files in the classpath, and found
 nothing...
 - Even tried to call Logger.getLogger(org.apache.solr) and then set its
 level manually inside the app, nothing changed...

 So, Embedded Solr Server keeps logging queries and other stuff in my
 stdout.

 Most docs and guides I've found on the internet talk about solr over http;
 this is ok for me, with http I got everything working, but not with solr
 embedded.
 Have anyone achieved this with embedded?

 Thanks a lot ppl,

 []s,


 Lucas Frare Teixeira .·.
 - lucas...@gmail.com
 - lucastex.com.br
 - blog.lucastex.com
 - twitter.com/lucastex







Re: Clustering from analyzed text instead of raw input

2010-03-03 Thread Stanislaw Osinski
Hi Joan,

I'm trying to use carrot2 (for now I started with the workbench) and I can
 cluster any field, but the text used for clustering is the original raw
 text, the one that was sent for indexing, without any of the processing
 performed by the tokenizer or filters.
 So I get stop words.


The easiest way to fix this is to update the stop words list used by
Carrot2, see http://wiki.apache.org/solr/ClusteringComponent, the "Tuning
Carrot2 clustering" section at the bottom. If you want to get readable
cluster labels, it's best to feed the raw text for clustering (cluster
labels are phrases taken from the input text, if you remove stopwords and
stem everything, the phrases will become unreadable).

Cheers,

Staszek


need help with Solr Cores

2010-03-03 Thread muneeb

Hi Everyone,

I am new to Solr, and still trying to get my hands on it.
I have indexed over 6 million documents and currently have a single large
index. I update my index using the SolrJ client due to the format I store my
documents in (i.e. JSON blobs) in the database.

I need to find a way to have multiple indexes for one solr instance: one for
ongoing query search and one for updating the index with new documents/schema
etc. The idea is to switch between indexes while one is being updated, so
that users can still search my index.

I know Solr supports multiple cores and I have read wiki pages plus mailing
lists on this, which help a lot. However I am still confused about the need
for having two separate indexes. I have solr.xml in the solr.home dir with a
dir for each core, and each core has a conf folder copied from the standard
solr.home folder.

Do I need a data folder in each core's directory? And should I copy/paste the
index folder into each core's directory?

Thanks for your help in advance!!


-- 
View this message in context: 
http://old.nabble.com/need-help-with-Solr-Cores-tp27767694p27767694.html
Sent from the Solr - User mailing list archive at Nabble.com.



Best performance for facet dates in trunk using solr.TrieDateField

2010-03-03 Thread Marc Sturlese

Hey there,
I am testing date facets in trunk with a huge index. Apparently, as the
default schema.xml shows, the fastest way to run date facet queries is to
index the field with this data type:

<!-- A Trie based date field for faster date range queries and date
faceting. -->
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
precisionStep="6" positionIncrementGap="0"/>

I am wondering... would setting precisionStep=8 on the TrieDateField
improve the speed of the queries even more?
When using the TrieDateField, does it still make sense to use the date
rounding? For example:

  <str name="facet.date">date</str>
  <str name="facet.date.start">2006-06-01T00:00:00Z/MONTH</str>
  <str name="facet.date.end">2010-01-30T23:59:59Z/MONTH</str>
  <str name="facet.date.gap">+1MONTH</str>
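
For reference, the equivalent request URL (a sketch; note that the gap's +
has to be URL-escaped as %2B):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.date=date&facet.date.start=2006-06-01T00:00:00Z/MONTH&facet.date.end=2010-01-30T23:59:59Z/MONTH&facet.date.gap=%2B1MONTH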

Thanks in advance
-- 
View this message in context: 
http://old.nabble.com/Best-performance-for-facet-dates-in-trunk-using-solr.TrieDateField-tp27767793p27767793.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: 2 Cores, 1 Table, 2 DataImporter --> Import at the same time ?

2010-03-03 Thread Erik Hatcher

what's the error you're getting?

is DIH keeping some static that prevents it from running across two  
cores separately?  if so, that'd be a bug.


Erik

On Mar 3, 2010, at 4:12 AM, stocki wrote:



pleeease help me somebody =( :P




stocki wrote:


Hello again ;)

i install tomcat5.5 on my debian server ...

i use 2 cores and two different DIH with separate indexes, one for the
normal search-feature and the other core for the suggest-feature.

but i cannot start both DIH with an import command at the same time. how
is this possible ?


thx



--
View this message in context: 
http://old.nabble.com/2-Cores%2C-1-Table%2C-2-DataImporter---%3E-Import-at-the-same-time---tp27756255p27765825.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: error in sum function

2010-03-03 Thread JCodina


Ok, solved!!!

Joan

Koji Sekiguchi-2 wrote:
 
 Can you try the latest trunk? I fixed it a couple of days ago
 
 Koji Sekiguchi from mobile
 
 
 On 2010/03/03, at 18:18, JCodina joan.cod...@barcelonamedia.org wrote:
 

 the sum function or the map one is not parsed correctly.
 This sort works like a charm:
 sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc
 but

 sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc

 gives the following exception

 SEVERE: org.apache.solr.common.SolrException: Must declare sort  
 field or
 function
at
 org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376)
at
 org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281)
at org.apache.solr.search.QParser.getSort(QParser.java:217)
at
 org.apache.solr.handler.component.QueryComponent.prepare 
 (QueryComponent.java:86)
at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody 
 (SearchHandler.java:174)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest 
 (RequestHandlerBase.java:131)

 you can test it in here using these two url's

 http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&fl=id,Num,score&sort=score%20asc,sum%28map%28Num,0,5000,42000%29,Num%29+asc&q=+entities_org:%28%22Amena%22%29

 http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&wt=php&fl=id,Num,score&rows=50&sort=score+desc,sum%28Num,map%28Num,0,2000,42000%29%29+asc&q=+entities_org:Amena




 -- 
 View this message in context:
 http://old.nabble.com/error-in-sum-function-tp27765881p27765881.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 

-- 
View this message in context: 
http://old.nabble.com/error-in-sum-function-tp27765881p27768877.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Issue on stopword list

2010-03-03 Thread Suram



Joe Calderon-2 wrote:
 
 or you can try the commongrams filter that combines tokens next to a
 stopword
 
 On Tue, Mar 2, 2010 at 6:56 AM, Walter Underwood wun...@wunderwood.org
 wrote:
 Don't remove stopwords if you want to search on them. --wunder

 On Mar 2, 2010, at 5:43 AM, Erick Erickson wrote:

 This is a classic problem with Stopword removal. Have you tried
 just removing stopwords from the indexing definition and the
 query definition and reindexing?

 You can't search on them no matter what you do if they've
 been removed, they just aren't there

 HTH
 Erick

 On Tue, Mar 2, 2010 at 5:47 AM, Suram reactive...@yahoo.com wrote:


 Hi,

  How can i search using stopwords? my query is like this:

  This             - 0 results because it is a stopword
  is               - 0 results because it is a stopword
  that             - 0 results because it is a stopword

  if i search like This is that - it must give the result

  what do i need to change in my schema file to get results for This is
  that?
 --
 View this message in context:
 http://old.nabble.com/Issue-on-stopword-list-tp27754434p27754434.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 
 


I tried commongrams also but it didn't work. Here I search this is it. I
would like to get the exact phrase, not results for this is, is or it.

my document looks like

<field name="id">101</field>
<field name="name">This Is It</field>
<field name="manu">Apache Software Foundation</field>
<field name="cat">software</field>
<field name="cat">search</field>

Here is my schema

http://old.nabble.com/file/p27768959/schema.xml schema.xml 


and i set the specific fields as searchable, like name, manu, cat. when i
index, the search finds nothing.
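
For what it's worth, a minimal sketch of the CommonGrams setup Joe mentioned
(the type name is a placeholder; needs Solr 1.4+). Index with
CommonGramsFilter and query with CommonGramsQueryFilter, so a phrase like
this is it matches on token pairs such as this_is and is_it instead of being
emptied by stopword removal:

<fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>

The searchable fields have to use this type and the data must be re-indexed
before the token pairs exist in the index.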
-- 
View this message in context: 
http://old.nabble.com/Issue-on-stopword-list-tp27754434p27768959.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: 2 Cores, 1 Table, 2 DataImporter --> Import at the same time ?

2010-03-03 Thread stocki


okay, i changed the lockType to single but with no good effect.

so i think now that my two DIH are using the same data folder. why is it
so ? i thought that each DIH uses its own index ... ?!

i think it is not possible to import from one table in parallel with more than
one DIH ?!

myexception: 

java.io.FileNotFoundException:
/var/lib/tomcat5.5/temp/solr/data/index/_5d.fnm (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
at
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
at
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:94)
at
org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70)
at
org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:691)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:68)
at
org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
at
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
at
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:662)
at
org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:954)
at
org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:5190)
at
org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4354)
at
org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4192)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4183)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2647)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2601)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)





Erik Hatcher-4 wrote:
 
 what's the error you're getting?
 
 is DIH keeping some static that prevents it from running across two  
 cores separately?  if so, that'd be a bug.
 
   Erik
 
 On Mar 3, 2010, at 4:12 AM, stocki wrote:
 

 pleeease help me somebody =( :P




 stocki wrote:

 Hello again ;)

 i install tomcat5.5 on my debian server ...

 i use 2 cores and two different DIH with separate indexes, one for the
 normal search-feature and the other core for the suggest-feature.

 but i cannot start both DIH with an import command at the same time. how
 is this possible ?


 thx


 -- 
 View this message in context:
 http://old.nabble.com/2-Cores%2C-1-Table%2C-2-DataImporter---%3E-Import-at-the-same-time---tp27756255p27765825.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 
 

-- 
View this message in context: 
http://old.nabble.com/SEVERE%3A-SolrIndexWriter-was-not-closed-prior-to-finalize%28%29%2C-indicates-a-bugPOSSIBLE-RESOURCE-LEAK%21%21%21-tp27756255p27768997.html
Sent from the Solr - User mailing list archive at Nabble.com.
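
If the two cores really are sharing /var/lib/tomcat5.5/temp/solr/data, giving
each core its own data directory should fix it. A sketch (paths assumed from
the stack trace above), one line in each core's solrconfig.xml:

<!-- core0's solrconfig.xml -->
<dataDir>/var/lib/tomcat5.5/temp/solr/data/core0</dataDir>

<!-- core1's solrconfig.xml -->
<dataDir>/var/lib/tomcat5.5/temp/solr/data/core1</dataDir>

With separate data dirs each DIH writes its own index, and a parallel
full-import no longer races over the same segment files.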



Re: Clustering from analyzed text instead of raw input

2010-03-03 Thread JCodina

Thanks Staszek
I'll give the stopwords treatment a try, but the problem is that we perform
POS tagging and then use payloads to keep only nouns and adjectives, and we
thought it could be interesting to perform clustering only with these
elements, to avoid senseless words.

Of course it is a problem of clustering, but maybe it is also a feature that
could be interesting to have in solr: not to index the raw input text but the
analyzed one, so stored could be false | raw | analyzed


Stanislaw Osinski-2 wrote:
 
 Hi Joan,
 
 I'm trying to use carrot2 (for now I started with the workbench) and I can
 cluster any field, but the text used for clustering is the original raw
 text, the one that was sent for indexing, without any of the processing
 performed by
 the tokenizer or filters.
 So I get stop words.

 
 The easiest way to fix this is to update the stop words list used by
 Carrot2, see http://wiki.apache.org/solr/ClusteringComponent, the "Tuning
 Carrot2 clustering" section at the bottom.
 
  If you want to get readable
 cluster labels, it's best to feed the raw text for clustering (cluster
 labels are phrases taken from the input text, if you remove stopwords and
 stem everything, the phrases will become unreadable).
 
 Cheers,
 
 Staszek
 
 

-- 
View this message in context: 
http://old.nabble.com/Clustering-from-anlayzed-text-instead-of-raw-input-tp27765780p27769034.html
Sent from the Solr - User mailing list archive at Nabble.com.



Can I use .XML files instead of .OSM files

2010-03-03 Thread mamathahl

I'm very new to Solr.  I downloaded apache-solr-1.5-dev and was trying out
the example in order to first figure out how Solr works.  I found out
that the data directory consisted of .OSM files.  But I have an XML file
consisting of latitude, longitude and relevant news for that location.  Can
I just use the XML file to index the data, or is it necessary for me to
convert this file to an .OSM file using some tool and then proceed further?
Also, the attribute values from the .OSM file are being considered in that
example.  Since there are no attributes for the tags in my XML file, how can
I extract only the contents of my tags? Any help in this direction will be
appreciated.  Thanks in advance.
-- 
View this message in context: 
http://old.nabble.com/Can-I-used-.XML-files-instead-of-.OSM-files-tp27769082p27769082.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: need help with Solr Cores

2010-03-03 Thread muneeb

Figured it out !!

I actually created two folders in solr.home/data folder, each holding the
index for a given core. So for core0 and core1 i had indexes as:

solr.home/data/core0/index
solr.home/data/core1/index

Feeling a little stupid now, having figured out a simple issue :s



muneeb wrote:
 
 Hi Everyone,
 
 I am new to Solr, and still trying to get my hands on it.
 I have indexed over 6 million documents and currently have a single large
 index. I update my index using SolrJ client due to the format I store my
 documents (i.e. JSON blobs) in database.
 
 I need to find a way to have multiple indexes for one solr instance. One
 for ongoing query search and one for updating index with new
 documents/schema etc. the idea is to switch between indexes while one is
 being updated. So that users could still search my index.
 
 I know Solr supports multiple cores and I have read wiki pages plus
 mailing lists on this, which help a lot. However I am still confused about
 the need for having two separate indexes. I have solr.xml in the solr.home
 dir with a dir for each core, and each core has a conf folder copied from
 the standard solr.home folder. 
 
 Do I need a data folder in each core's directory? And should I copy/paste
 the index folder into each core's directory?
 
 Thanks for your help in advance!!
 
 
 

-- 
View this message in context: 
http://old.nabble.com/need-help-with-Solr-Cores-tp27767694p27769171.html
Sent from the Solr - User mailing list archive at Nabble.com.



How to see the query generated by MoreLikeThisHandler?

2010-03-03 Thread Christopher Bottaro
Hello,

Is there a way to see exactly what query is generated by the
MoreLikeThisHandler?  If I send debugQuery=true then I see in the
response a key called parsedquery but it doesn't seem quite right.

What I mean by that is when I make the MoreLikeThis query, I set
mlt.fl to title,content but the query shown in parsedquery does
not query on title at all... only on content.  Furthermore, the
query looks something like this content:word1 content:word2
content:word3 but if I copy and paste that into a standard query,
nothing comes back because the default term operator is AND.

If I change that query to content:word1 OR content:word2 OR
content:word3, I get results but they are not the same as what the
MLT query returns.

Is there a way to see the generated query without actually running it?
 As of now, I'm making a MLT query with rows=0, but I think it's still
running the query because it takes a non trivial amount of time and it
also shows numFound in the response.

Thanks for the help,
-- Christopher
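
One thing worth trying (a sketch; it assumes the MoreLikeThisHandler is
registered at /mlt): mlt.interestingTerms=details returns the terms and
boosts the MLT query is built from, which is usually what you want to
inspect:

http://localhost:8983/solr/mlt?q=id:YOURDOC&mlt.fl=title,content&mlt.interestingTerms=details&rows=0

Note that rows=0 still executes the underlying search; this shows what the
query contains rather than skipping the work.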


DisMaxRequestHandler questions about bf and bq

2010-03-03 Thread Christopher Bottaro
Hello,

I have a couple of questions regarding the bf and bq params to the
DisMaxRequestHandler.

1)  Can I specify them more than once?  Ex:
bf=log(popularity)&bf=log(comment_count)

2)  When using bq, how can I specify what score to use for documents
not returned by the query?  In other words, how do I mimic this
behavior using bq:
bf=query($qq, 0.1)&qq=site:news.yahoo.com


Thanks for the help!


Formatting Results

2010-03-03 Thread Lee Smith
Hey All

I am indexing around 10,000 documents with Solr Cell, which has gone superbly.

I can of course search the content like the example given:  
http://localhost:8983/solr/select?q=attr_content:tutorial

But what I would like is for Solr to return the document with x many words and 
the matched content highlighted. I suppose a lot like Google does.

How can I achieve such a result?

I know I can use the highlighting but can't seem to get it to work.

Hope someone can put me on the right track.

Thank you

RE: DIH onError question

2010-03-03 Thread Shah, Nirmal
Thanks for your prompt reply.  I resolved the ERROR, and used continue to 
bypass any EXCEPTIONS.

Nirmal Shah
Remedy Consultant|Column Technologies|Cell: (630) 244-1648


-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com] 
Sent: Tuesday, March 02, 2010 11:13 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH onError question

onError only handles Exception (not Error or Throwable). In your case
it is a NoClassDefFoundError. If it is an Error or Throwable it is a
symptom of a larger problem. If you fix the NoClassDefFoundError it
should be ok

On Wed, Mar 3, 2010 at 10:06 AM, Shah, Nirmal ns...@columnit.com wrote:
 Hi all,

 I am using Solr 1.5 from trunk.  I am getting the below error on a full
 load, and it is causing the import to fail and rollback.  I am not
 concerned about the error but rather that I cannot seem to tell the
 indexing to continue.  I have two entities, and I have tried all (4)
 combinations of skip and continue for their onError attributes.

 SEVERE: Exception while processing: f document : null
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 java.lang.NoClassDefFoundError:
 org/bouncycastle/jce/provider/BouncyCastleProvider
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
 ava:652)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
 ava:606)
        at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java
 :261)
        at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18
 5)
        at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte
 r.java:333)
        at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java
 :391)
        at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:
 372)
 Caused by: java.lang.NoClassDefFoundError:
 org/bouncycastle/jce/provider/BouncyCastleProvider
        at
 org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108
 )
        at
 org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
        at
 org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:23
 5)
        at
 org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
        at
 org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
        at
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        at
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at
 org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit
 yProcessor.java:124)
        at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity
 ProcessorWrapper.java:233)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
 ava:580)
        ... 6 more
 Mar 2, 2010 10:21:05 PM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 SEVERE: Full Import failed
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 java.lang.NoClassDefFoundError:
 org/bouncycastle/jce/provider/BouncyCastleProvider
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
 ava:652)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
 ava:606)
        at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java
 :261)
        at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18
 5)
        at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte
 r.java:333)
        at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java
 :391)
        at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:
 372)
 Caused by: java.lang.NoClassDefFoundError:
 org/bouncycastle/jce/provider/BouncyCastleProvider
        at
 org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108
 )
        at
 org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
        at
 org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:23
 5)
        at
 org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
        at
 org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
        at
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        at
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at
 org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit
 yProcessor.java:124)
        at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity
 ProcessorWrapper.java:233)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
 ava:580)
        ... 6 more
 Mar 2, 2010 10:21:05 PM 

Re: Formatting Results

2010-03-03 Thread Marc Sturlese

I'll give you an example of how to configure your default SearchHandler to
do highlighting, but I strongly recommend you check the wiki properly.
Everything is really well explained there:
http://wiki.apache.org/solr/HighlightingParameters

   <str name="hl">true</str>
   <str name="hl.fl">attr_content</str>
   <str name="f.attr_content.hl.fragsize">200</str>
   <str name="f.attr_content.hl.snippets">1</str>
   <str name="f.attr_content.hl.alternateField">f.attr_content</str>
   <str name="f.attr_content.hl.maxAlternateFieldLength">300</str>
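
While experimenting, the same parameters can also be passed per request
instead of in the handler defaults; a sketch:

http://localhost:8983/solr/select?q=attr_content:tutorial&hl=true&hl.fl=attr_content&f.attr_content.hl.fragsize=200&f.attr_content.hl.snippets=1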




Lee Smith-6 wrote:
 
 Hey All
 
 I am indexing around 10,000 documents with Solr Cell, which has gone
 superbly.
 
 I can of course search the content like the example given: 
 http://localhost:8983/solr/select?q=attr_content:tutorial
 
 But what I would like is for Solr to return the document with x many words
 and the matched content highlighted. I suppose a lot like Google does.
 
 How can I achieve such a result?
 
 I know I can use the highlighting but can't seem to get it to work.
 
 Hope someone can put me on the right track.
 
 Thank you
 

-- 
View this message in context: 
http://old.nabble.com/Formatting-Results-tp27771256p27772151.html
Sent from the Solr - User mailing list archive at Nabble.com.



SOLR Index or database

2010-03-03 Thread caman

Hello All, 

Just struggling with the question of whether SOLR or a database would be the
better option for me. Here are my requirements.
We index about 600+ news/blogs into our system. The only information we store
locally is the title, link and article snippet. We are able to index all these
sources into the SOLR index and it works perfectly.
This is where it gets tricky: 
We need to store certain meta information as well, e.g.
1. Rating/popularity of an article
2. Sharing of articles between users
3. How many times an article is viewed.
4. Comments on each article.

So far, we are planning to store the meta information in the database and link
this data with a document in the index. When a user opens the page,
results are combined from the index and the database to render the view. 

Any reservations about the above architecture? 
Is SOLR the right fit in this case? We do need full-text search, so SOLR is a
no-brainer imho, but would love to hear the community's view.

Any feedback appreciated

thanks




-- 
View this message in context: 
http://old.nabble.com/SOLR-Index-or-database-tp27772362p27772362.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Error on startup

2010-03-03 Thread Marc Sturlese

If you shut down the server properly it's weird that you get an error when
starting up again.
How did you delete the index? I experienced something similar a long time
ago because I was removing the content from the index folder but not the
folder itself. The correct way to do it was to remove the index folder and
start up the server again (Solr creates the index folder if not present). I
don't know if this has changed since.



Lee Smith-6 wrote:
 
 Hi All.
 
 I have shut down solr, removed the index so I can start over, then
 re-launched.
 
 I am getting an error of
 
 SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.solrc...@14db38a4
 (core1) has a reference count of 1
 
 Any idea on what this is a result of ?
 
 Hope you can advise.
 
 Lee
 

-- 
View this message in context: 
http://old.nabble.com/Error-on-startup-tp27767018p27772394.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Formatting Results

2010-03-03 Thread Lee Smith
Thanks Marc

I'll have a good look at that part now. And I managed to get it started again 
:-).

Thank you again

Lee

On 3 Mar 2010, at 18:52, Marc Sturlese wrote:

 
 I'll give you an example of how to configure your default SearchHandler to
 do highlighting, but I strongly recommend you check the wiki properly.
 Everything is really well explained there:
 http://wiki.apache.org/solr/HighlightingParameters
 
   <str name="hl">true</str>
   <str name="hl.fl">attr_content</str>
   <str name="f.attr_content.hl.fragsize">200</str>
   <str name="f.attr_content.hl.snippets">1</str>
   <str name="f.attr_content.hl.alternateField">f.attr_content</str>
   <str name="f.attr_content.hl.maxAlternateFieldLength">300</str>
 
 
 
 
 Lee Smith-6 wrote:
 
 Hey All
 
 I am indexing around 10,000 documents with Solr Cell, which has gone
 superbly.
 
 I can of course search the content like the example given: 
 http://localhost:8983/solr/select?q=attr_content:tutorial
 
 But what I would like is for Solr to return the document with x many words
 and the matched content highlighted. I suppose a lot like Google does.
 
 How can I achieve such a result?
 
 I know I can use the highlighting but can't seem to get it to work.
 
 Hope someone can put me on the right track.
 
 Thank you
 
 
 -- 
 View this message in context: 
 http://old.nabble.com/Formatting-Results-tp27771256p27772151.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: SOLR Index or database

2010-03-03 Thread Walter Underwood
You need two, maybe three things that Solr doesn't do (or doesn't do well):

* field updating
* storing content
* real time search and/or simple transactions

I would seriously look at Mark Logic for that. It does all of those, plus 
full-text search, gracefully, plus it scales. There is also a version for 
Amazon EC2.  www.marklogic.com

Note: I work at Mark Logic, but I chose Solr for Netflix when I worked there.

wunder

On Mar 3, 2010, at 11:08 AM, caman wrote:

 
 Hello All, 
 
 Just struggling with the question of whether SOLR or a database would be the
 better option for me. Here are my requirements.
 We index about 600+ news/blogs into our system. The only information we store
 locally is the title, link and article snippet. We are able to index all
 these sources into the SOLR index and it works perfectly.
 This is where it gets tricky: 
 We need to store certain meta information as well, e.g.
 1. Rating/popularity of an article
 2. Sharing of articles between users
 3. How many times an article is viewed.
 4. Comments on each article.
 
 So far, we are planning to store the meta information in the database and
 link this data with a document in the index. When a user opens the page,
 results are combined from the index and the database to render the view. 
 
 Any reservations about the above architecture? 
 Is SOLR the right fit in this case? We do need full-text search, so SOLR is a
 no-brainer imho, but would love to hear the community's view.
 
 Any feedback appreciated
 
 thanks



Re: Can I use .XML files instead of .OSM files

2010-03-03 Thread Marc Sturlese


Are you sure you don't have a folder called exampledocs with xml files
inside? These are the files to index as a first example:
apache-solr-1.5-dev/example/exampledocs
Check the
/home/marc/Desktop/data/apache-solr-1.5-dev/example/solr/conf/schema.xml and
solrconfig.xml and you will see how to configure them to be able to have
your data indexed


mamathahl wrote:
 
 I'm very new to Solr.  I downloaded apache-solr-1.5-dev and was trying out
 the example in order to first figure out how Solr works.  I found out
 that the data directory consisted of .OSM files.  But I have an XML file
 consisting of latitude, longitude and relevant news for that location. 
 Can I just use the XML file to index the data, or is it necessary for me to
 convert this file to an .OSM file using some tool and then proceed further?
 Also, the attribute values from the .OSM file are being considered in that
 example.  Since there are no attributes for the tags in my XML file, how
 can I extract only the contents of my tags? Any help in this direction will
 be appreciated.  Thanks in advance.
 

-- 
View this message in context: 
http://old.nabble.com/Can-I-used-.XML-files-instead-of-.OSM-files-tp27769082p27772507.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need suggestion regarding custom transformer

2010-03-03 Thread Marc Sturlese

I think you can handle that writing a custom transformer. There's a good
explanation in the wiki:
http://wiki.apache.org/solr/DIHCustomTransformer
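
As a rough sketch of what such a transformer could look like (class name,
column names and the commented-out geohash call are assumptions, not tested
code):

import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Adds lat_rad and lng_rad (and optionally a geohash) to each DIH row
// before it is turned into a Solr document.
public class GeoTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object lat = row.get("lat");
    Object lng = row.get("lng");
    if (lat != null && lng != null) {
      double latD = Double.parseDouble(lat.toString());
      double lngD = Double.parseDouble(lng.toString());
      row.put("lat_rad", Math.toRadians(latD));
      row.put("lng_rad", Math.toRadians(lngD));
      // row.put("geohash", GeoHashUtils.encode(latD, lngD)); // Lucene spatial encoder
    }
    return row;
  }
}

Wire it up with transformer="com.example.GeoTransformer" on the <entity> in
data-config.xml; the new columns then map to fields like any others.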



KshamaPai wrote:
 
 Hi,
 Am new to solr.
 I am trying location aware search with spatial lucene in solr1.5 nightly
 build.
 My table in mysql has just lat,lng and some text .I want to add geohash,
 lat_rad(lat in radian) and lng_rad field into the document before
 indexing. I have used dataimport to get my table to solr.
 I have to use GeohashUtils.Encode() to get geohash from corresponding
 lat,lng of each row;
 and *ToRads function to get lat in radians.
 
 Can i use custom transformers so that after retreiving each row , add
 these fields and then index while using dataimport?
 Or do i have to do data migration to xml and then do changes required
 before indexing?
 
 Thanks in advance.
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Need-suggestion-regarding-custom-transformer-tp27763576p27772561.html
Sent from the Solr - User mailing list archive at Nabble.com.



Randomize MoreLikeThis

2010-03-03 Thread André Maldonado
Hello.

I'm implementing More Like This functionality in my search request.
Everything works fine, but I need to randomize the return of this more like
this query. Something like this:

*First request:*
Query - docId:528369
Results - fields ... More like This: <result name="528369" numFound="57162"
start="0"><doc><str name="docid">1</str></doc><doc><str
name="docid">2</str></doc>

*Second request:* (same query, another resultset for more like this)
Query - docId:528369
Results - fields ... More like This: <result name="528369" numFound="57162"
start="0"><doc><str name="docid">3</str></doc><doc><str
name="docid">4</str></doc>

Is there a way to do it?

Thank's

Then those who were in the boat came and worshiped him, saying: Truly you
are the Son of God. (Matthew 14:33)


Re: DisMaxRequestHandler questions about bf and bq

2010-03-03 Thread Erik Hatcher


On Mar 3, 2010, at 12:26 PM, Christopher Bottaro wrote:

I have a couple of questions regarding the bf and bq params to the
DisMaxRequestHandler.

1)  Can I specify them more than once?  Ex:
bf=log(popularity)&bf=log(comment_count)


Yes, you can use multiple bf parameters, each adding an optional  
clause to the actual query executed.



2)  When using bq, how can I specify what score to use for documents
not returned by the query?  In other words, how do I mimic this
behavior using bq:
bf=query($qq, 0.1)&qq=site:news.yahoo.com


Why bother with bq in this situation?

But I believe you could use bq={!func}query($qq,
0.1)&qq=site:news.yahoo.com
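
Pulling the two parts together, a sketch of a full dismax request (URL
encoding of spaces omitted for readability; the qf fields are placeholders,
the rest are from the question):

http://localhost:8983/solr/select?defType=dismax&q=tutorial&qf=title content&bf=log(popularity)&bf=log(comment_count)&bq={!func}query($qq,0.1)&qq=site:news.yahoo.com

Each bf adds its own optional boost clause, and the {!func} bq contributes
query($qq,0.1), i.e. the qq query's score, with 0.1 used as the value for
documents that don't match it.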


Weird issue with solr and jconsole/jmx

2010-03-03 Thread Andrew Greenburg
Hi,

I connected to one of my solr instances with Jconsole today and
noticed that most of the mbeans under the solr hierarchy are missing.
The only thing there was a Searcher, which I had no trouble seeing
attributes for, but the rest of the statistics beans were missing.
They all show up just fine on the stats.jsp page.

In the past this always worked fine. I did have the core reload due to
config file changes this morning. Could that have caused this?


Escaping options for tika/solr cell extract-only output

2010-03-03 Thread Dan Hertz (Insight 49, LLC)

Looking at http://wiki.apache.org/solr/ExtractingRequestHandler:

Extract Only
the output includes XML generated by Tika (and is hence further escaped 
by Solr's XML)


...is there an option to NOT have the resulting TIKA output escaped?

so &lt;head&gt; would come back as <head/>

If no, what would need to be done to enable this option? Looked into 
SOLR-1274.patch, but didn't see a parameter for such a thing.


Thanks,

Dan


Lucene: Finite-State Queries, Flexible Indexing, Scoring, and more

2010-03-03 Thread Otis Gospodnetic
Hello folks,

Those of you in or near New York and using Lucene or Solr should come to 
"Lucene: Finite-State Queries, Flexible Indexing, Scoring, and more" on March 
24th:

http://www.meetup.com/NYC-Search-and-Discovery/calendar/12720960/


The presenter will be the hyper active Lucene committer Robert Muir.

Please spread the word.

Otis
--
Lucene ecosystem search :: http://search-lucene.com/



Re: Randomize MoreLikeThis

2010-03-03 Thread Otis Gospodnetic
The first thing that came to mind is to index a random number with each doc and 
sort by that.
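
Solr also ships with a RandomSortField type for exactly this, so no extra
stored value is needed; a sketch (these two lines appear in the stock example
schema.xml):

<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<dynamicField name="random_*" type="random" />

Then sort=random_1234 desc gives one stable shuffle, and a different dynamic
field name (random_42, random_4711, ...) gives a different ordering per
request. Whether the MLT handler honors a sort parameter is worth verifying,
though.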

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: André Maldonado andre.maldon...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wed, March 3, 2010 2:50:01 PM
 Subject: Randomize MoreLikeThis
 
 Hello.
 
 I'm implementing More Like This functionality in my search request.
 Everything works fine, but I need to randomize the return of this more like
 this query. Something like this:
 
 *First request:*
 Query - docId:528369
 Results - fields ... More like This: <result name="528369" numFound="57162"
 start="0"><doc><str name="docid">1</str></doc><doc><str name="docid">2</str></doc>
 
 *Second request:* (same query, another resultset for more like this)
 Query - docId:528369
 Results - fields ... More like This: <result name="528369" numFound="57162"
 start="0"><doc><str name="docid">3</str></doc><doc><str name="docid">4</str></doc>
 
 Is there a way to do it?
 
 Thank's
 
 Then those who were in the boat came and worshiped him, saying: Truly you
 are the Son of God. (Matthew 14:33)



Re: Re-index after Solr config file changed without restarting services

2010-03-03 Thread Otis Gospodnetic
Marc,

At least for the "force Solr to re-index" part, I think you'll need to index 
yourself.  That is, you need to run whatever app you run when you (re)index the 
data normally.  Solr won't automagically reindex the data.
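
One option if a full re-index is too costly: apply synonyms at query time
only; then a core RELOAD picks up a changed synonyms.txt without touching the
index. A schema.xml sketch (the analyzer chain is a placeholder):

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
</analyzer>

The usual caveat: query-time expansion of multi-word synonyms has known
quirks, which is why index-time synonyms are generally recommended when a
re-index is affordable.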

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: Marc Wilson wo...@fancydressoutfitters.co.uk
 To: Solr solr-user@lucene.apache.org
 Sent: Wed, March 3, 2010 6:51:17 AM
 Subject: Re-index after Solr config file changed without restarting services
 
 Hi,
 
 I am attempting to achieve what I believe many others have attempted in the 
 past: allow an end user to modify a Solr config file through a custom UI and 
 then roll out any changes made without restarting any services. Specifically, 
 I 
 want to be able to let the user edit the synonyms.txt file and after 
 committing 
 the changes, force Solr to re-index based on those changes without restarting 
 Tomcat.
 
 I have configured a Solr Master and Slave, each of which has a single core:
 
 
 *http://master:8080/solr/core 
 
 *http://slave:8080/solr/core 
 
 The cores are defined in respective solr.xml files as:
 
 <solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
   <core name="core" instanceDir="core">
     <property name="configDir" value="../../conf/" />
   </core>
  </cores>
 </solr>
 
 Replication has been configured in the Master solrconfig.xml as follows:
 
 <requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
   <str name="replicateAfter">startup</str>
   <str name="replicateAfter">commit</str>
   <str name="snapshot">startup</str>
   <str name="snapshot">commit</str>
   <str name="confFiles">schema.xml,${configDir}stopwords.txt,${configDir}elevate.xml,${configDir}synonyms.txt</str>
  </lst>
 </requestHandler>
 
 and the Slave solrconfig.xml as:
 
 <requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
   <str name="masterUrl">http://master:8080/solr/core/replication</str>
   <str name="compression">internal</str>
   <str name="httpConnTimeout">5000</str>
   <str name="httpReadTimeout">1</str>
   <str name="httpBasicAuthUser">username</str>
   <str name="httpBasicAuthPassword">password</str>
   <str name="pollInterval">00:00:20</str>
  </lst>
 </requestHandler>
 At service startup, replication works fine. However, when a change is made to 
 the synonyms.txt file and 
 http://master:8080/solr/admin/cores?action=RELOAD&core=core is called, neither 
 the Master nor the Slave is updated to reflect the modification. I am assuming 
 that this is because in the Master schema.xml file the SynonymFilterFactory is 
 being used at index time and the CoreAdmin RELOAD does not force a Solr 
 re-index. If this is so, please can someone advise what the best methodology 
 is to achieve what I am attempting? If not, please could someone let me know 
 what I'm doing wrong?!
 
 Thanks,
 
 Marc



Multi core Search is not working when used with SHARDS

2010-03-03 Thread JavaGuy84

Hi all,

I am trying to search on multiple cores (distributed search) but am not able
to succeed using Shards.

I am able to get the results when I am hitting each core separately:

http://localhost:8981/solr/core1/select/?q=test
http://localhost:8981/solr/core0/select/?q=test

but when I try to use distributed search using Shards as below

http://localhost:8981/solr/core0/select?shards=localhost:8981/solr/core0,localhost:8981/solr/core1&indent=true&q=test

I am getting the below error,

HTTP ERROR: 500
null

java.lang.NullPointerException
at
org.apache.solr.handler.component.QueryComponent.createMainQuery(QueryComponent.java:372)
at
org.apache.solr.handler.component.QueryComponent.distributedProcess(QueryComponent.java:292)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:234)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

RequestURI=/solr/core0/select

Powered by Jetty://

Do I need to make any changes to make the Shards work?

Thanks,
Barani

-- 
View this message in context: 
http://old.nabble.com/Multi-core-Search-is-not-working-when-used-with-SHARDS-tp27772726p27772726.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr query parsing

2010-03-03 Thread Jason Rutherglen
Why would fq=sdate:+20100110 parse via a Solr server but not via
QueryParsing.parseQuery?  It's choking on the + symbol in the sdate
value.

I'd use QParserPlugin; however, it requires passing a SolrQueryRequest,
which is not kosher for testing. Perhaps I'll need to bite the bullet
and reproduce using QPP with an SQR.


Re: Multi core Search is not working when used with SHARDS

2010-03-03 Thread Yonik Seeley
Hmmm, do you have a uniqueKey defined in your schemas?
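Distributed search needs it to merge results across shards. In schema.xml it
looks like this (the field name here is just an example):

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>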

-Yonik
http://www.lucidimagination.com



On Wed, Mar 3, 2010 at 4:23 PM, JavaGuy84 bbar...@gmail.com wrote:

 Hi all,

 I am trying to search on multiple cores (distributed search) but not able to
 succeed using Shards.

 I am able to get the results when I am hitting each core seperately,

 http://localhost:8981/solr/core1/select/?q=test
 http://localhost:8981/solr/core0/select/?q=test

 but when I try to use distributed search using Shards as below

 http://localhost:8981/solr/core0/select?shards=localhost:8981/solr/core0,localhost:8981/solr/core1&indent=true&q=test

 I am getting the below error,

 HTTP ERROR: 500
 null

 java.lang.NullPointerException
        at
 org.apache.solr.handler.component.QueryComponent.createMainQuery(QueryComponent.java:372)
        at
 org.apache.solr.handler.component.QueryComponent.distributedProcess(QueryComponent.java:292)
        at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:234)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
        at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at
 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

 RequestURI=/solr/core0/select

 Powered by Jetty://

 Do I need to make any changes to make the Shards work?

 Thanks,
 Barani

 --
 View this message in context: 
 http://old.nabble.com/Multi-core-Search-is-not-working-when-used-with-SHARDS-tp27772726p27772726.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Can Solr Create New Indexes?

2010-03-03 Thread Thomas Nguyen
Is there a setting in the config I can set to have Solr create a new
Lucene index if the dataDir is empty on startup?  I'd like to open our
Solr system to allow other developers here to add new cores without
having to use the Lucene API directly to create the indexes.



Re: Can Solr Create New Indexes?

2010-03-03 Thread Mark Miller

On 03/03/2010 07:56 PM, Thomas Nguyen wrote:

Is there a setting in the config I can set to have Solr create a new
Lucene index if the dataDir is empty on startup?  I'd like to open our
Solr system to allow other developers here to add new cores without
having to use the Lucene API directly to create the indexes.


   

You don't have to use the Lucene API though?

Solr creates the index if it's not there ...

--
- Mark

http://www.lucidimagination.com





weighted search and index

2010-03-03 Thread Jianbin Dai
Hi,

I am trying to use solr for a content match application. 

A content is described by a set of keywords with weights associated, e.g.,

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

Those contents would be indexed in solr.
In the search, I also have a set of keywords with weights:

Query: Sports 0.8, golf 0.5

I am trying to find the closest matching contents for this query.

My question is how to index the contents with weighted scores, and how to
write the search query. I was trying to use boosting, but it seems not to be
working right.

Thanks.

Jianbin




RE: Can Solr Create New Indexes?

2010-03-03 Thread Thomas Nguyen
Hmm I've tried starting Solr with no Lucene index in the dataDir.
Here's the Exception I receive when starting Solr and when attempting to
add a document to the core:


2010-03-03 16:44:06,479 [main] ERROR
org.apache.solr.core.CoreContainer  -
java.lang.RuntimeException: java.io.FileNotFoundException: no segments*
file found in
org.apache.lucene.store.SimpleFSDirectory@C:\ign\test-solr\objectIndex\index: files:
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.core.SolrCore.init(SolrCore.java:579)
at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
at
org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.
java:117)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:
83)
at
org.mortbay.jetty.servlet.FilterHolder.start(FilterHolder.java:71)
at
org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebAp
plicationHandler.java:310)
at
org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationCo
ntext.java:509)
at
org.mortbay.jetty.plus.PlusWebAppContext.doStart(PlusWebAppContext.java:
149)
at org.mortbay.util.Container.start(Container.java:72)
at org.mortbay.http.HttpServer.doStart(HttpServer.java:708)
at org.mortbay.jetty.plus.Server.doStart(Server.java:153)
at org.mortbay.util.Container.start(Container.java:72)
at org.mortbay.jetty.plus.Server.main(Server.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:151)
at org.mortbay.start.Main.start(Main.java:476)
at org.mortbay.start.Main.main(Main.java:94)
Caused by: java.io.FileNotFoundException: no segments* file found in
org.apache.lucene.store.SimpleFSDirectory@C:\ign\test-solr\objectIndex\index: files:
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.j
ava:655)
at
org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
at
org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
at
org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
at
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexR
eaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057)
... 21 more 

Before this point I've been using existing Lucene indexes (created by
the Lucene API) with Solr without a problem.


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, March 03, 2010 5:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Can Solr Create New Indexes?

On 03/03/2010 07:56 PM, Thomas Nguyen wrote:
 Is there a setting in the config I can set to have Solr create a new
 Lucene index if the dataDir is empty on startup?  I'd like to open our
 Solr system to allow other developers here to add new cores without
 having to use the Lucene API directly to create the indexes.



You don't have to use the Lucene API though?

Solr creates the index if it's not there ...

-- 
- Mark

http://www.lucidimagination.com






Re: Can Solr Create New Indexes?

2010-03-03 Thread Mark Miller

I'm guessing the index folder itself already exists?

The data dir can be there, but the index dir itself must not be - that's 
how it knows to create a new one.
Otherwise it thinks the empty dir is the index and can't find the files 
it expects.
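
In other words, using the paths from your stack trace:

  C:\ign\test-solr\objectIndex\        <-- the dataDir; fine if it exists
  C:\ign\test-solr\objectIndex\index\  <-- remove this empty dir and Solr
                                           will create a fresh index there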


On 03/03/2010 08:15 PM, Thomas Nguyen wrote:

Hmm I've tried starting Solr with no Lucene index in the dataDir.
Here's the Exception I receive when starting Solr and when attempting to
add a document to the core:


2010-03-03 16:44:06,479 [main] ERROR
org.apache.solr.core.CoreContainer  -
java.lang.RuntimeException: java.io.FileNotFoundException: no segments*
file found in
org.apache.lucene.store.SimpleFSDirectory@C:\ign\test-solr\objectIndex\index: files:
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.core.SolrCore.init(SolrCore.java:579)
at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
at
org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.
java:117)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:
83)
at
org.mortbay.jetty.servlet.FilterHolder.start(FilterHolder.java:71)
at
org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebAp
plicationHandler.java:310)
at
org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationCo
ntext.java:509)
at
org.mortbay.jetty.plus.PlusWebAppContext.doStart(PlusWebAppContext.java:
149)
at org.mortbay.util.Container.start(Container.java:72)
at org.mortbay.http.HttpServer.doStart(HttpServer.java:708)
at org.mortbay.jetty.plus.Server.doStart(Server.java:153)
at org.mortbay.util.Container.start(Container.java:72)
at org.mortbay.jetty.plus.Server.main(Server.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:151)
at org.mortbay.start.Main.start(Main.java:476)
at org.mortbay.start.Main.main(Main.java:94)
Caused by: java.io.FileNotFoundException: no segments* file found in
org.apache.lucene.store.SimpleFSDirectory@C:\ign\test-solr\objectIndex\index: files:
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.j
ava:655)
at
org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
at
org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
at
org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
at
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexR
eaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057)
... 21 more

Before this point I've been using existing Lucene indexes (created by
the Lucene API) with Solr without a problem.


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, March 03, 2010 5:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Can Solr Create New Indexes?

On 03/03/2010 07:56 PM, Thomas Nguyen wrote:
   

Is there a setting in the config I can set to have Solr create a new
Lucene index if the dataDir is empty on startup?  I'd like to open our
Solr system to allow other developers here to add new cores without
having to use the Lucene API directly to create the indexes.



 

You don't have to use the Lucene API though?

Solr creates the index if it's not there ...

   



--
- Mark

http://www.lucidimagination.com





RE: Can Solr Create New Indexes?

2010-03-03 Thread Thomas Nguyen
Ah that's the problem.  Not sure why it didn't come to mind to follow
the call stack.  Thanks for your help!

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, March 03, 2010 5:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Can Solr Create New Indexes?

I'm guessing the index folder itself already exists?

The data dir can be there, but the index dir itself must not be - that's

how it knows to create a new one.
Otherwise it thinks the empty dir is the index and can't find the files 
it expects.

On 03/03/2010 08:15 PM, Thomas Nguyen wrote:
 Hmm I've tried starting Solr with no Lucene index in the dataDir.
 Here's the Exception I receive when starting Solr and when attempting
to
 add a document to the core:


 2010-03-03 16:44:06,479 [main] ERROR
 org.apache.solr.core.CoreContainer  -
 java.lang.RuntimeException: java.io.FileNotFoundException: no
segments*
 file found in

org.apache.lucene.store.SimpleFSDirectory@C:\ign\test-solr\objectIndex\index: files:
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:579)
   at
 org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
   at
 org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
   at

org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.
 java:117)
   at

org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:
 83)
   at
 org.mortbay.jetty.servlet.FilterHolder.start(FilterHolder.java:71)
   at

org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebAp
 plicationHandler.java:310)
   at

org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationCo
 ntext.java:509)
   at

org.mortbay.jetty.plus.PlusWebAppContext.doStart(PlusWebAppContext.java:
 149)
   at org.mortbay.util.Container.start(Container.java:72)
   at org.mortbay.http.HttpServer.doStart(HttpServer.java:708)
   at org.mortbay.jetty.plus.Server.doStart(Server.java:153)
   at org.mortbay.util.Container.start(Container.java:72)
   at org.mortbay.jetty.plus.Server.main(Server.java:202)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
 a:39)
   at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
 Impl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.mortbay.start.Main.invokeMain(Main.java:151)
   at org.mortbay.start.Main.start(Main.java:476)
   at org.mortbay.start.Main.main(Main.java:94)
 Caused by: java.io.FileNotFoundException: no segments* file found in

org.apache.lucene.store.SimpleFSDirectory@C:\ign\test-solr\objectIndex\index: files:
   at

org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.j
 ava:655)
   at
 org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
   at
 org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
   at
 org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
   at

org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexR
 eaderFactory.java:38)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057)
   ... 21 more

 Before this point I've been using existing Lucene indexes (created by
 the Lucene API) with Solr without a problem.


 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Wednesday, March 03, 2010 5:00 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Can Solr Create New Indexes?

 On 03/03/2010 07:56 PM, Thomas Nguyen wrote:

 Is there a setting in the config I can set to have Solr create a new
 Lucene index if the dataDir is empty on startup?  I'd like to open
our
 Solr system to allow other developers here to add new cores without
 having to use the Lucene API directly to create the indexes.



  
 You don't have to use the Lucene API though?

 Solr creates the index if it's not there ...




-- 
- Mark

http://www.lucidimagination.com






Re: weighted search and index

2010-03-03 Thread Erick Erickson
You have to provide some more details to get meaningful help.

You say "I was trying to use boosting". How? At index time?
Search time? Both? Can you provide some code snippets?
What does your schema look like for the relevant field(s)?

You say "but seems not working right". What does that mean? No hits?
Hits not ordered as you expect? Have you tried putting debugQuery=on on
your URL and examined the return values?

Have you looked at your index with the admin page and/or Luke to see if
the data in the index is as you expect?

As far as I know, boosts are multiplicative. So boosting by a value less than
1 will actually decrease the ranking. For the details of Lucene scoring, see:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
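
(A quick worked example of the multiplicative effect: a clause that would
contribute a raw score of 2.0 contributes 1.0 with a boost of 0.5, but 10.0
with a boost of 5.)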

And remember, that boosting will *tend* to move a hit up or down in the
ranking, not position it absolutely.

HTH
Erick

On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote:

 Hi,

 I am trying to use solr for a content match application.

 A content is described by a set of keywords with weights associated, eg.,

 C1: fruit 0.8, apple 0.4, banana 0.2
 C2: music 0.9, pop song 0.6, Britney Spears 0.4

 Those contents would be indexed in solr.
 In the search, I also have a set of keywords with weights:

 Query: Sports 0.8, golf 0.5

 I am trying to find the closest matching contents for this query.

 My question is how to index the contents with weighted scores, and how to
 write search query. I was trying to use boosting, but seems not working
 right.

 Thanks.

 Jianbin





RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Thank you very much Erick!

1. I used boost in search, but I don't know exactly what the best way to
boost is. For Sports 0.8, golf 0.5 in my example, would it be
sports^0.8 AND golf^0.5?


2. I cannot use boost in indexing, because the weight varies with the value,
not the field. Look at this example again:

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

There is no good way to boost it during indexing.

Thanks.

JB


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, March 03, 2010 5:45 PM
To: solr-user@lucene.apache.org
Subject: Re: weighted search and index

You have to provide some more details to get meaningful help.

You say I was trying to use boosting. How? At index time?
Search time? Both? Can you provide some code snippets?
What does your schema look like for the relevant field(s)?

You say but seems not working right. What does that mean? No hits?
Hits not ordered as you expect? Have you tried putting debugQuery=on on
your URL and examined the return values?

Have you looked at your index with the admin page and/or Luke to see if
the data in the index is as you expect?

As far as I know, boosts are multiplicative. So boosting by a value less
than
1 will actually decrease the ranking. But see the Lucene scoring, See:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

And remember, that boosting will *tend* to move a hit up or down in the
ranking, not position it absolutely.

HTH
Erick

On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote:

 Hi,

 I am trying to use solr for a content match application.

 A content is described by a set of keywords with weights associated, eg.,

 C1: fruit 0.8, apple 0.4, banana 0.2
 C2: music 0.9, pop song 0.6, Britney Spears 0.4

 Those contents would be indexed in solr.
 In the search, I also have a set of keywords with weights:

 Query: Sports 0.8, golf 0.5

 I am trying to find the closest matching contents for this query.

 My question is how to index the contents with weighted scores, and how to
 write search query. I was trying to use boosting, but seems not working
 right.

 Thanks.

 Jianbin






Re: weighted search and index

2010-03-03 Thread Erick Erickson
Then I'm totally lost as to what you're trying to accomplish. Perhaps
a higher-level statement of the problem would help.

Because no matter how often I look at your point 2, I don't see
what relevance the numbers have if you're not using them to
boost at index time. Why are they even there?

Erick

On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai j...@huawei.com wrote:

 Thank you very much Erick!

 1. I used boost in search, but I don't know exactly what's the best way to
 boost, for such as Sports 0.8, golf 0.5 in my example, would it be
 sports^0.8 AND golf^0.5 ?


 2. I cannot use boost in indexing. Because the weight of the value changes,
 not the field, look at this example again,

 C1: fruit 0.8, apple 0.4, banana 0.2
 C2: music 0.9, pop song 0.6, Britney Spears 0.4

 There is no good way to boost it during indexing.

 Thanks.

 JB


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, March 03, 2010 5:45 PM
 To: solr-user@lucene.apache.org
 Subject: Re: weighted search and index

 You have to provide some more details to get meaningful help.

 You say I was trying to use boosting. How? At index time?
 Search time? Both? Can you provide some code snippets?
 What does your schema look like for the relevant field(s)?

 You say but seems not working right. What does that mean? No hits?
 Hits not ordered as you expect? Have you tried putting debugQuery=on on
 your URL and examined the return values?

 Have you looked at your index with the admin page and/or Luke to see if
 the data in the index is as you expect?

 As far as I know, boosts are multiplicative. So boosting by a value less
 than
 1 will actually decrease the ranking. But see the Lucene scoring, See:

 http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

 And remember, that boosting will *tend* to move a hit up or down in the
 ranking, not position it absolutely.

 HTH
 Erick

 On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote:

  Hi,
 
  I am trying to use solr for a content match application.
 
  A content is described by a set of keywords with weights associated, eg.,
 
  C1: fruit 0.8, apple 0.4, banana 0.2
  C2: music 0.9, pop song 0.6, Britney Spears 0.4
 
  Those contents would be indexed in solr.
  In the search, I also have a set of keywords with weights:
 
  Query: Sports 0.8, golf 0.5
 
  I am trying to find the closest matching contents for this query.
 
  My question is how to index the contents with weighted scores, and how to
  write search query. I was trying to use boosting, but seems not working
  right.
 
  Thanks.
 
  Jianbin
 
 
 




RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Hi Erick,

Each doc contains some keywords that are indexed. However, each keyword is
associated with a weight to represent its importance. In my example,
D1: fruit 0.8, apple 0.4, banana 0.2

The keyword fruit is the most important keyword, which means I really really
want it to be matched in a search result, but banana is less important (it
would still be good to match it, though).

Hope that explains.

Thanks.

JB



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, March 03, 2010 6:23 PM
To: solr-user@lucene.apache.org
Subject: Re: weighted search and index

Then I'm totally lost as to what you're trying to accomplish. Perhaps
a higher-level statement of the problem would help.

Because no matter how often I look at your point 2, I don't see
what relevance the numbers have if you're not using them to
boost at index time. Why are they even there?

Erick

On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai j...@huawei.com wrote:

 Thank you very much Erick!

 1. I used boost in search, but I don't know exactly what's the best way to
 boost, for such as Sports 0.8, golf 0.5 in my example, would it be
 sports^0.8 AND golf^0.5 ?


 2. I cannot use boost in indexing. Because the weight of the value
changes,
 not the field, look at this example again,

 C1: fruit 0.8, apple 0.4, banana 0.2
 C2: music 0.9, pop song 0.6, Britney Spears 0.4

 There is no good way to boost it during indexing.

 Thanks.

 JB


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, March 03, 2010 5:45 PM
 To: solr-user@lucene.apache.org
 Subject: Re: weighted search and index

 You have to provide some more details to get meaningful help.

 You say I was trying to use boosting. How? At index time?
 Search time? Both? Can you provide some code snippets?
 What does your schema look like for the relevant field(s)?

 You say but seems not working right. What does that mean? No hits?
 Hits not ordered as you expect? Have you tried putting debugQuery=on on
 your URL and examined the return values?

 Have you looked at your index with the admin page and/or Luke to see if
 the data in the index is as you expect?

 As far as I know, boosts are multiplicative. So boosting by a value less
 than
 1 will actually decrease the ranking. But see the Lucene scoring, See:


http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

 And remember, that boosting will *tend* to move a hit up or down in the
 ranking, not position it absolutely.

 HTH
 Erick

 On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote:

  Hi,
 
  I am trying to use solr for a content match application.
 
  A content is described by a set of keywords with weights associated,
eg.,
 
  C1: fruit 0.8, apple 0.4, banana 0.2
  C2: music 0.9, pop song 0.6, Britney Spears 0.4
 
  Those contents would be indexed in solr.
  In the search, I also have a set of keywords with weights:
 
  Query: Sports 0.8, golf 0.5
 
  I am trying to find the closest matching contents for this query.
 
  My question is how to index the contents with weighted scores, and how
to
  write search query. I was trying to use boosting, but seems not working
  right.
 
  Thanks.
 
  Jianbin
 
 
 





Confused with Shards multicore search results

2010-03-03 Thread JavaGuy84

Hi,

I finally got shards work with multicore but now I am facing a different
issue. 

I have 2 separate schema / data config files for each core. I also have a
different unique id in each schema.xml file.

I indexed both the cores and I was able to successfully search independently
on each core, but when I used Shards, I didn't get what I expected. For ex:

http://localhost:8990/solr/core0/select?q=1565 returned 1 row
http://localhost:8990/solr/core1/select?q=1565 returned 1 row

When I tried this
http://localhost:8990/solr/core0/select/?q=1565&shards=localhost:8990/solr/core0,localhost:8990/solr/core1

It again returned just one row, but I would think that it should return 2
rows since I have a different unique id for each document.

Is there any configuration I need to do in order to make it searchable
across multiple indexes? Any primary / slave configuration? Any help would
be greatly appreciated.

Thanks a lot in advance.

Thanks,
Barani
-- 
View this message in context: 
http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p27776478.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing hierarchical facet

2010-03-03 Thread Andy
This dynamicField feature is great. I didn't know about it.

Thanks!

--- On Wed, 3/3/10, Geert-Jan Brits gbr...@gmail.com wrote:

From: Geert-Jan Brits gbr...@gmail.com
Subject: Re: Implementing hierarchical facet
To: solr-user@lucene.apache.org
Date: Wednesday, March 3, 2010, 5:04 AM

you could always define 1 dynamicfield and encode the hierarchy level in the
fieldname:

<dynamicField name="_loc_hier_*" type="string" stored="false" indexed="true"
omitNorms="true"/>
using:
facet=on&facet.field={!key=Location}_loc_hier_city&fq=_loc_hier_country:somecountryid
...
adding cityarea later for instance would be as simple as:
facet=on&facet.field={!key=Location}_loc_hier_cityarea&fq=_loc_hier_city:somecityid

Cheers,
Geert-Jan


2010/3/3 Andy angelf...@yahoo.com

 Thanks. I didn't know about the {!key=Location} trick.

 Thanks everyone for your help. From what I could gather, there're 3
 approaches:

 1) SOLR-64
 Pros:
 - can have arbitrary levels of hierarchy without modifying schema
 Cons:
 - each combination of all the levels in the hierarchy will result in a
 separate filter cache entry. This number could be huge, which would lead to
 poor performance

 2) SOLR-792
 Pros:
 - each level of the hierarchy separately results in its own filter cache
 entry. Much smaller number of entries. Better performance.
 Cons:
 - Only 2 levels are supported

 3) Separate fields for each hierarchy levels
 Pros:
 - same as SOLR-792. Good performance
 Cons:
 - can only handle a fixed number of levels in the hierarchy. Adding any
 levels beyond that requires schema modification

 Does that sound right?

 Option 3 is probably the best match for my use case. Is there any trick to
 make it able to deal with an arbitrary number of levels?

 Thanks.

 --- On Tue, 3/2/10, Geert-Jan Brits gbr...@gmail.com wrote:

 From: Geert-Jan Brits gbr...@gmail.com
 Subject: Re: Implementing hierarchical facet
 To: solr-user@lucene.apache.org
 Date: Tuesday, March 2, 2010, 8:02 PM

 Using Solr 1.4: even fewer changes to the frontend:
 
 facet=on&facet.field={!key=Location}countryid
 ...
 facet=on&facet.field={!key=Location}cityid&fq=countryid:somecountryid
 etc.
 
 will consistently render the resulting facet under the name Location.


 2010/3/3 Geert-Jan Brits gbr...@gmail.com

  If it's a requirement to let Solr handle the facet-hierarchy please
  disregard this post, but
  an alternative would be to have your App control when to ask for which
  'facet-level' (e.g: country, state, city) in the hierarchy.
 
  as follows,
 
  each doc has 3 seperate fields (indexed=true, stored=false):
  - countryid
  - stateid
  - cityid
 
  facet on country:
  facet=on&facet.field=countryid
 
  facet on state (country selected; functionally you probably don't want to
  show states without the user having selected a country anyway)
  facet=on&facet.field=stateid&fq=countryid:somecountryid
 
  facet on city (state selected, same functional analogy as above)
  facet=on&facet.field=cityid&fq=stateid:somestateid
 
  or
 
  facet on city (country selected, same functional analogy as above)
  facet=on&facet.field=cityid&fq=countryid:somecountryid
 
  grab the resulting facet and drop it under Location
 
  pros:
  - reusing fq's (good performance; I've never used hierarchical facets, but
  I would be surprised if they gave a (major) speed increase over this method)
  - flexible (you get multiple hierarchies: country -- state -- city and
  country -- city)
 
  cons:
  - a little more application logic
 
  Hope that helps,
  Geert-Jan
 
 
 
 
 
  2010/3/2 Andy angelf...@yahoo.com
 
  I read that a simple way to implement hierarchical facet is to
 concatenate
  strings with a separator. Something like level1>level2>level3 with > as
  the separator.
 
  A problem with this approach is that the number of facet values will
  greatly increase.
 
  For example I have a facet Location with the hierarchy
  countrystatecity. Using the above approach every single city will lead
 to
  a separate facet value. With tens of thousands of cities in the world
 the
  response from Solr will be huge. And then on the client side I'd have to
  loop through all the facet values and combine those with the same
 country
  into a single value.
 
  Ideally Solr would be aware of the hierarchy structure and send back
  responses accordingly. So at level 1 Solr will send back facet values
 based
  on country (100 or so values). Level 2 the facet values will be based on
 the
  states within the selected country (a few dozen values). Next level will
 be
  cities within that state. and so on.
 
  Is it possible to implement hierarchical facet this way using Solr?
 
 
 
 
 
 
 








  

Re: 2 Cores, 1 Table, 2 DataImporter -- Import at the same time ?

2010-03-03 Thread Lance Norskog
No, a core is a Lucene index. Two DataImportHandler sessions to the
same core will run on the same index.

You should use lockType of simple or native. 'single' should only be
used on a read-only index.

From the stack trace it looks like you're only using one index in
solr/core. You have to configure two separate cores with separate core
directories. Check out the example/multicore directory for how that
works.
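
A minimal sketch of that layout (core names here are just examples):

  solr.xml:
    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="search"  instanceDir="search"/>
        <core name="suggest" instanceDir="suggest"/>
      </cores>
    </solr>

with each instanceDir holding its own conf/ directory, so each core gets its
own data/index and the two imports no longer collide.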

On Wed, Mar 3, 2010 at 6:39 AM, stocki st...@shopgate.com wrote:


 Okay, I changed the lockType to single, but with no good effect.
 
 So I now think that my two DIHs are using the same data folder. Why is that
 so? I thought that each DIH used its own index ... ?!
 
 I think it is not possible to import from one table in parallel with more
 than one DIH ?!


 My exception:

 java.io.FileNotFoundException:
 /var/lib/tomcat5.5/temp/solr/data/index/_5d.fnm (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.init(RandomAccessFile.java:212)
        at
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.init(SimpleFSDirectory.java:78)
        at
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.init(SimpleFSDirectory.java:108)
        at
 org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.init(NIOFSDirectory.java:94)
        at
 org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70)
        at
 org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:691)
        at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:68)
        at
 org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:116)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
        at
 org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
        at
 org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:662)
        at
 org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:954)
        at
 org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:5190)
        at
 org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4354)
        at
 org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4192)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4183)
        at
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2647)
        at
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2601)
        at
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
        at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at
 org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
        at
 org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
        at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
        at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
        at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
        at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
        at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)





 Erik Hatcher-4 wrote:

 what's the error you're getting?

 is DIH keeping some static that prevents it from running across two
 cores separately?  if so, that'd be a bug.

       Erik

 On Mar 3, 2010, at 4:12 AM, stocki wrote:


 pleeease help me somebody =( :P




 stocki wrote:

 Hello again ;)

 i install tomcat5.5 on my debian server ...

 i use 2 cores and two different DIH with seperatet Index, one for the
 normal search-feature and the other core for the suggest-feature.

 but i cannot start both DIH with an import command at the same
 time. how
 it this possible ?


 thx


 --
 View this message in context:
 http://old.nabble.com/2-Cores%2C-1-Table%2C-2-DataImporter---%3E-Import-at-the-same-time---tp27756255p27765825.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 View this message in context: 
 http://old.nabble.com/SEVERE%3A-SolrIndexWriter-was-not-closed-prior-to-finalize%28%29%2C-indicates-a-bugPOSSIBLE-RESOURCE-LEAK%21%21%21-tp27756255p27768997.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
Lance Norskog
goks...@gmail.com


facet performance when number of values is large

2010-03-03 Thread Andy
I have a facet field whose values are created by users, so potentially there
could be a very large number of values. Is that going to be a problem
performance-wise?

A few more questions to help me understand how faceting works:
- after the filter cache is warmed up, will any performance problems
caused by a large number of facet values go away?
I thought that would be the case, but according to the benchmark here:
http://wiki.apache.org/solr/HierarchicalFaceting
SOLR-64 still had very poor performance even after the filter caches were
warmed.

- In the wiki it was stated that facet.method=fc is "excellent for situations
where the number of indexed values for the field is high". Would that be the
solution?
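(i.e. something like ...&facet=true&facet.field=tags&facet.method=fc, where
the field name is just a placeholder)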




  

Re: Escaping options for tika/solr cell extract-only output

2010-03-03 Thread Lance Norskog
You can return it with any of the other writers, like JSON or PHP.

The alternative design decision for the XML output writer would be to
emit using CDATA instead of escaping.
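
For example, something like (extractOnly and wt are standard request
parameters; host and path are the defaults):

  http://localhost:8983/solr/update/extract?extractOnly=true&wt=json

returns the Tika XML as a plain JSON string value rather than XML-escaped
content inside an XML response.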

On Wed, Mar 3, 2010 at 12:54 PM, Dan Hertz (Insight 49, LLC)
insigh...@gmail.com wrote:
 Looking at http://wiki.apache.org/solr/ExtractingRequestHandler:

 Extract Only
 the output includes XML generated by Tika (and is hence further escaped by
 Solr's XML)

 ...is there an option to NOT have the resulting TIKA output escaped?

 so &lt;head&gt; would come back as <head/>

 If no, what would need to be done to enable this option? Looked into
 SOLR-1274.patch, but didn't see a parameter for such a thing.

 Thanks,

 Dan




-- 
Lance Norskog
goks...@gmail.com


Re: weighted search and index

2010-03-03 Thread Lance Norskog
Boosting by convention is flat at 1.0. Usually people boost with
numbers like 3 or 5 or 20.
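
So for the weights in your example you might rescale, e.g. multiply each by
10 (just one way to do it):

  sports^8 golf^5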

On Wed, Mar 3, 2010 at 6:34 PM, Jianbin Dai j...@huawei.com wrote:
 Hi Erick,

 Each doc contains some keywords that are indexed. However each keyword is
 associated with a weight to represent its importance. In my example,
 D1: fruit 0.8, apple 0.4, banana 0.2

 The keyword fruit is the most important keyword, which means I really really
 want it to be matched in a search result, but banana is less important (It
 would be good to be matched though).

 Hope that explains.

 Thanks.

 JB



 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, March 03, 2010 6:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: weighted search and index

 Then I'm totally lost as to what you're trying to accomplish. Perhaps
 a higher-level statement of the problem would help.

 Because no matter how often I look at your point 2, I don't see
 what relevance the numbers have if you're not using them to
 boost at index time. Why are they even there?

 Erick

 On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai j...@huawei.com wrote:

 Thank you very much Erick!

 1. I used boost in search, but I don't know exactly what's the best way to
 boost, for such as Sports 0.8, golf 0.5 in my example, would it be
 sports^0.8 AND golf^0.5 ?


 2. I cannot use boost in indexing. Because the weight of the value
 changes,
 not the field, look at this example again,

 C1: fruit 0.8, apple 0.4, banana 0.2
 C2: music 0.9, pop song 0.6, Britney Spears 0.4

 There is no good way to boost it during indexing.

 Thanks.

 JB


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, March 03, 2010 5:45 PM
 To: solr-user@lucene.apache.org
 Subject: Re: weighted search and index

 You have to provide some more details to get meaningful help.

 You say I was trying to use boosting. How? At index time?
 Search time? Both? Can you provide some code snippets?
 What does your schema look like for the relevant field(s)?

 You say but seems not working right. What does that mean? No hits?
 Hits not ordered as you expect? Have you tried putting debugQuery=on on
 your URL and examined the return values?

 Have you looked at your index with the admin page and/or Luke to see if
 the data in the index is as you expect?

 As far as I know, boosts are multiplicative. So boosting by a value less
 than
 1 will actually decrease the ranking. But see the Lucene scoring, See:


 http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

 And remember, that boosting will *tend* to move a hit up or down in the
 ranking, not position it absolutely.

 HTH
 Erick

 On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai j...@huawei.com wrote:

  Hi,
 
  I am trying to use solr for a content match application.
 
  A content is described by a set of keywords with weights associated,
 eg.,
 
  C1: fruit 0.8, apple 0.4, banana 0.2
  C2: music 0.9, pop song 0.6, Britney Spears 0.4
 
  Those contents would be indexed in solr.
  In the search, I also have a set of keywords with weights:
 
  Query: Sports 0.8, golf 0.5
 
  I am trying to find the closest matching contents for this query.
 
  My question is how to index the contents with weighted scores, and how
 to
  write search query. I was trying to use boosting, but seems not working
  right.
 
  Thanks.
 
  Jianbin
 
 
 







-- 
Lance Norskog
goks...@gmail.com


Re: Confused with Shards multicore search results

2010-03-03 Thread Lance Norskog
"different unique id for each schema.xml file."

All cores should have the same schema file with the same unique id
field and type.

Did you mean that the documents in both cores have a different value
for the unique id field?

On Wed, Mar 3, 2010 at 6:45 PM, JavaGuy84 bbar...@gmail.com wrote:

 Hi,

 I finally got shards work with multicore but now I am facing a different
 issue.

 I have 2 seperate schema / data config files for each core. I also have
 different unique id for each schema.xml file.

 I indexed both the cores and I was able to successfully search independently
 on each core but when I used Shards, I didnt get what I expected. For ex:

 http://localhost:8990/solr/core0/select?q=1565 returned 1 row
 http://localhost:8990/solr/core1/select?q=1565 returned 1 row

 When I tried this
 http://localhost:8990/solr/core0/select/?q=1565&shards=localhost:8990/solr/core0,localhost:8990/solr/core1

 It again returned just one row.. but I would think that it should return 2
 rows if I have different unique id for each document.

 Is there any configuration I need to do in order to make it searchable
 across multiple indexex? any primary / slave configuration? any help would
 be of great help to me.

 Thanks a lot in advance.

 Thanks,
 Barani
 --
 View this message in context: 
 http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p27776478.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
Lance Norskog
goks...@gmail.com


Re: Confused with Shards multicore search results

2010-03-03 Thread JavaGuy84

Thanks a lot for your reply, I will surely try this.

I have a requirement to index 2 different schemas but need to do a search on
both using a single URL.

Is there a way I can have 2 different schemas / data config files and do a
search on both the indexes using a single URL (like using Shards)?

Thanks,
Barani

Lance Norskog-2 wrote:
 
 different unique id for each schema.xml file.
 
 All cores should have the same schema file with the same unique id
 field and type.
 
 Did you mean that the documents in both cores have a different value
 for the unique id field?
 
 On Wed, Mar 3, 2010 at 6:45 PM, JavaGuy84 bbar...@gmail.com wrote:

 Hi,

 I finally got shards work with multicore but now I am facing a different
 issue.

 I have 2 seperate schema / data config files for each core. I also have
 different unique id for each schema.xml file.

 I indexed both the cores and I was able to successfully search
 independently
 on each core but when I used Shards, I didnt get what I expected. For ex:

 http://localhost:8990/solr/core0/select?q=1565 returned 1 row
 http://localhost:8990/solr/core1/select?q=1565 returned 1 row

 When I tried this
 http://localhost:8990/solr/core0/select/?q=1565&shards=localhost:8990/solr/core0,localhost:8990/solr/core1

 It again returned just one row.. but I would think that it should return
 2
 rows if I have different unique id for each document.

 Is there any configuration I need to do in order to make it searchable
 across multiple indexex? any primary / slave configuration? any help
 would
 be of great help to me.

 Thanks a lot in advance.

 Thanks,
 Barani
 --
 View this message in context:
 http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p27776478.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 Lance Norskog
 goks...@gmail.com
 
 

-- 
View this message in context: 
http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p2152.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Confused with Shards multicore search results

2010-03-03 Thread Otis Gospodnetic
Hi,

I think this will work as long as the fields involved in the search are 
identical.  That's probably not the case with your shards, though.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: JavaGuy84 bbar...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, March 4, 2010 12:49:31 AM
 Subject: Re: Confused with Shards multicore search results
 
 
 Thanks a lot for your reply, I will surely try this..
 
 I have a requirement to index 2 diff schema's but need to do a search on
 both using a single url.
 
 Is there a way I can have 2 diff schema's / data config file and do a search
 on both the indexes using a single URL (like using Shards?)
 
 Thanks,
 Barani
 
 Lance Norskog-2 wrote:
  
  different unique id for each schema.xml file.
  
  All cores should have the same schema file with the same unique id
  field and type.
  
  Did you mean that the documents in both cores have a different value
  for the unique id field?
  
  On Wed, Mar 3, 2010 at 6:45 PM, JavaGuy84 wrote:
 
  Hi,
 
  I finally got shards work with multicore but now I am facing a different
  issue.
 
  I have 2 seperate schema / data config files for each core. I also have
  different unique id for each schema.xml file.
 
  I indexed both the cores and I was able to successfully search
  independently
  on each core but when I used Shards, I didnt get what I expected. For ex:
 
  http://localhost:8990/solr/core0/select?q=1565 returned 1 row
  http://localhost:8990/solr/core1/select?q=1565 returned 1 row
 
  When I tried this
  
  http://localhost:8990/solr/core0/select/?q=1565&shards=localhost:8990/solr/core0,localhost:8990/solr/core1
 
  It again returned just one row.. but I would think that it should return
  2
  rows if I have different unique id for each document.
 
  Is there any configuration I need to do in order to make it searchable
  across multiple indexex? any primary / slave configuration? any help
  would
  be of great help to me.
 
  Thanks a lot in advance.
 
  Thanks,
  Barani
  --
  View this message in context:
  
 http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p27776478.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
  
  
  
  -- 
  Lance Norskog
  goks...@gmail.com
  
  
 
 -- 
 View this message in context: 
 http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p2152.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Update Index : Updating Specific Fields

2010-03-03 Thread Kranti™ K K Parisa
Hi,

Is there any way to update the index for only the specific fields?

Eg:
Index has ONE document consisting of 4 fields: F1, F2, F3, F4.
Now I want to update the value of field F2. If I send the update xml to
Solr, can it keep the old field values for F1, F3, F4 and update the new
value specified for F2?

Best Regards,
Kranti K K Parisa


Re: Update Index : Updating Specific Fields

2010-03-03 Thread Walter Underwood
No. --wunder

On Mar 3, 2010, at 10:40 PM, Kranti™ K K Parisa wrote:

 Hi,
 
 Is there any way to update the index for only the specific fields?
 
 Eg:
 Index has ONE document consists of 4 fields,  F1, F2, F3, F4
 Now I want to update the value of field F2, so if I send the update xml to
 SOLR, can it keep the old field values for F1,F3,F4 and update the new value
 specified for F2?
 
 Best Regards,
 Kranti K K Parisa



Too many .cfs files

2010-03-03 Thread mklprasad

Hi All,
I set up my 'mergeFactor' as 10.
I have loaded 1 million docs into Solr; after that I am able to see 14 .cfs
files in my data/index folder.
Shouldn't mergeFactor trigger a merge once the 11th segment appears?

Please clarify.

Thanks,
Prasad

-- 
View this message in context: 
http://old.nabble.com/Too-many-.cfs-files-tp2508p2508.html
Sent from the Solr - User mailing list archive at Nabble.com.