Solr scraping: Nutch and other alternatives.

2011-10-18 Thread Luis Cappa Banda
Hello everyone.

I've been thinking about a way to retrieve information from a domain (for
example, http://www.ign.com) to process and index. My idea is to use Solr as
a searcher. I'm familiar with Apache Nutch, and I know that the latest
version has a gateway to Solr to retrieve and index information with it. I
tried it and it worked fine, but it's a little bit complex to develop
plugins to process info and index it into a new desired field. Perhaps one of
you has tried another (and better) alternative to data-mine web
information. What is your recommendation? Can you give me any scraping
suggestions?

Thank you very much.

Luis Cappa.


Re: feeding while solr is running ?

2011-10-18 Thread lorenlai
Hello Alireza,

thank you for your reply. I will read the solr tutorial ;-)

Cheers

Loren

--
View this message in context: 
http://lucene.472066.n3.nabble.com/feeding-while-solr-is-running-tp3428500p3430478.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: feeding while solr is running ?

2011-10-18 Thread lorenlai
Hello Robert,

also many thanks to you for the LINKS and the short explanation. ;-)

*hug*  cheers

Loren




--
View this message in context: 
http://lucene.472066.n3.nabble.com/feeding-while-solr-is-running-tp3428500p3430483.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Controlling the order of partial matches based on the position

2011-10-18 Thread lee carroll
This link was on the mailing list recently:


http://www.lucidimagination.com/search/document/dfa18d52e7e8197c/getting_answers_starting_with_a_requested_string_first#b18e9f922c1e4149

On 18 October 2011 00:59, aronitin aro_ni...@yahoo.com wrote:
 Guys,

 It's been almost a week but there are no replies to the question that I
 posted.

 If its a small problem and already answered somewhere, please point me to
 that post. Otherwise please suggest any pointer to handle the requirement
 mentioned in the question,

 Nitin

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Controlling-the-order-of-partial-matches-based-on-the-position-tp3413867p3429823.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr scraping: Nutch and other alternatives.

2011-10-18 Thread Marco Martinez
Hi Luis,

Have you tried the copyField function with custom analyzers and tokenizers?

bye,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/10/18 Luis Cappa Banda luisca...@gmail.com

 Hello everyone.

 I've been thinking about a way to retrieve information from a domain (for
 example, http://www.ign.com) to process and index. My idea is to use Solr
 as
 a searcher. I'm familiarized with Apache Nutch and I know that the latest
 version has a gateway to Solr to retrieve and index information with it. I
 tried it and it worked fine, but it's a little bit complex to develop
 plugins to process info and index it in a new field desired. Perhaps one of
 you have tried another (and better) alternative to data mine web
 information. Which is your recommendation? Can you give me any scraping
 suggestion?

 Thank you very much.

 Luis Cappa.



Re: Controlling the order of partial matches based on the position

2011-10-18 Thread Marco Martinez
Hi,

I would use a custom function query that uses termPositions to calculate the
order of the values in the field, to accomplish your requirements.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/10/18 aronitin aro_ni...@yahoo.com

 Guys,

 It's been almost a week but there are no replies to the question that I
 posted.

 If its a small problem and already answered somewhere, please point me to
 that post. Otherwise please suggest any pointer to handle the requirement
 mentioned in the question,

 Nitin

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Controlling-the-order-of-partial-matches-based-on-the-position-tp3413867p3429823.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr scraping: Nutch and other alternatives.

2011-10-18 Thread Markus Jelsma
I'm a bit biased, but I would certainly use Nutch, as it seems to be the right
tool for the job. Developing custom plugins is actually easier than you might
think.

Solr, with its extracting request handler, can only help in a very limited
way.
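
For what it's worth, a Nutch 1.3-era crawl that feeds its results into Solr looks
roughly like this (the seed directory, depth, and topN values are illustrative, not
taken from the thread):

```
# 'urls/' is a directory containing a seed file of start URLs.
# -solr points Nutch's indexing step at a running Solr instance.
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 50
```

Custom parsing/indexing logic would then go into a ParseFilter/IndexingFilter plugin.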

 Hello everyone.
 
 I've been thinking about a way to retrieve information from a domain (for
 example, http://www.ign.com) to process and index. My idea is to use Solr
 as a searcher. I'm familiarized with Apache Nutch and I know that the
 latest version has a gateway to Solr to retrieve and index information
 with it. I tried it and it worked fine, but it's a little bit complex to
 develop plugins to process info and index it in a new field desired.
 Perhaps one of you have tried another (and better) alternative to data
 mine web
 information. Which is your recommendation? Can you give me any scraping
 suggestion?
 
 Thank you very much.
 
 Luis Cappa.


Re: Question about near query order

2011-10-18 Thread Ahmet Arslan

 analyze term~2
 term analyze~2 
 
 In my case, two queries return different result set.
 Isn't that in your case?

Hmm, you are right; I tested with a trunk instance using the lucene query parser. 
Result sets were different. If I am not wrong, they were the same in some version.

I can suggest two different solutions:

One is to use the SurroundQueryParser, where you have explicit control over ordered 
versus unordered. But it has its own limitations. See below:

http://wiki.apache.org/solr/SurroundQueryParser

The second one: there is an example in the Lucene in Action book about how to override 
QueryParser and replace PhraseQuery with SpanNearQuery. SpanNearQuery also has 
a boolean parameter, inOrder; you can use that example code. If you pass false 
to its constructor you will obtain an unordered phrase query. But this second 
option assumes that you are using the lucene query parser (defType=lucene).
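
With the surround parser, ordered versus unordered proximity is explicit in the query
syntax itself; a sketch, assuming the parser is wired into your setup as the wiki page
above describes (terms illustrative):

```
3W(analyze, term)    W = ordered near: analyze before term, within 3 positions
3N(analyze, term)    N = unordered near: either order, within 3 positions
```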


Re: Solr scraping: Nutch and other alternatives.

2011-10-18 Thread Óscar Marín Miró
Hi Luis, just an opinion (worked with Nutch intensively, 2005-2008).
Web crawling is a bitch, and Nutch won't make it any easier.

Some problems you'll find along the way:

   1. Spidering tunnels/traps
   2. Duplicate and near-duplicate content removal
   3. GET parameter explosion in dynamic pages
   4. Compromises between breadth and depth of crawl (you only have that
   much time, and every site has its unique link geometry)

Nutch has its own set of tools (urlfilters, depth control...) to cope with
each problem, but sometimes you solve, say, 3, and 4 comes back again.

My advice would be to use some popular search engines as a way to mine the
web (you always can ask for all the pages indexed in a domain). They have
done this job, and nicely done. In fact, due to their ranking algorithms
(based on link geometry), a 'popular' page will always be indexed, and to
me, that's a good circumstance (i.e: you can always claim that with your own
web crawler you've covered more url's for a specific site, but what's the
value if the extra url's are *not that important* ?)

If I'm absolutely forced to crawl a site, I use plain old 'curl' or 'wget'.
Open source, tunable via a vast array of parameters and 'black boxes'. I do
not see any justification in deploying 'the nutch monster' just to crawl
some web portion already crawled by popular search engines.

On the 'scraping' / xhtml mining front, the 'mechanize' library (python, perl,
ruby, whatever flavour) and 'Beautiful Soup' for python have always fed my
hunger for web scraping.

Good luck :D
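
As a minimal sketch of the kind of link extraction those libraries automate, here
is a version using only Python's standard library (the inline HTML stands in for a
fetched page; in practice urllib.request would supply it):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute link targets from anchor tags."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's base URL.
                    self.links.append(urljoin(self.base_url, value))

# A tiny inline document stands in for a fetched page.
page = '<html><body><a href="/reviews">Reviews</a><a href="http://example.com/x">X</a></body></html>'
parser = LinkExtractor("http://www.ign.com")
parser.feed(page)
print(parser.links)
```

Beautiful Soup adds tolerant parsing of broken markup on top of this; mechanize adds
stateful browsing (cookies, forms) that HTMLParser alone does not give you.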


On Tue, Oct 18, 2011 at 9:16 AM, Marco Martinez 
mmarti...@paradigmatecnologico.com wrote:

 Hi Luis,

 Have you tried the copyField function with custom analyzers and tokenizers?

 bye,

 Marco Martínez Bautista
 http://www.paradigmatecnologico.com
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón
 Tel.: 91 352 59 42


 2011/10/18 Luis Cappa Banda luisca...@gmail.com

  Hello everyone.
 
  I've been thinking about a way to retrieve information from a domain (for
  example, http://www.ign.com) to process and index. My idea is to use
 Solr
  as
  a searcher. I'm familiarized with Apache Nutch and I know that the latest
  version has a gateway to Solr to retrieve and index information with it.
 I
  tried it and it worked fine, but it's a little bit complex to develop
  plugins to process info and index it in a new field desired. Perhaps one
 of
  you have tried another (and better) alternative to data mine web
  information. Which is your recommendation? Can you give me any scraping
  suggestion?
 
  Thank you very much.
 
  Luis Cappa.
 




-- 
Whether it's science, technology, personal experience, true love, astrology,
or gut feelings, each of us has confidence in something that we will never
fully comprehend.
 --Roy H. William


Re: upgrading 1.4 to 3.x

2011-10-18 Thread deniz
Well, I did a little digging on the web... the problem is also described here:

https://issues.apache.org/bugzilla/show_bug.cgi?id=40719

Basically there were no details in the Tomcat logs (maybe in some other logs,
but I don't know).

I came up against the same problem while implementing something else... anyway,
I hope this will be helpful for anyone who gets the same error.

Thank you to all who helped me with the issue.

-
Zeki ama calismiyor... Calissa yapar... (Turkish: "Smart, but he doesn't work... If he worked, he'd get it done...")
--
View this message in context: 
http://lucene.472066.n3.nabble.com/upgrading-1-4-to-3-x-tp3415044p3430748.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to change default operator in velocity?

2011-10-18 Thread hadi
In the Solr schema the defaultOperator value is OR, but when I use
/browse (http://localhost:8983/solr/browse) for searching, AND is the
default operator, and that config in Solr does not affect Velocity. How can I
change the Velocity template engine's default operator?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3430871.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to retreive multiple documents using one unique field?

2011-10-18 Thread kiran.bodigam
I have four different documents in a single xml file (to be indexed). I don't
want to inject the unique field for each and every document. When I search
with the unique field, all four documents should come back in the result, i.e.
can a common unique field be applied to all the documents?
My xml format :
<add>
  <doc><field></field></doc>
  <doc><field></field></doc>
  <doc><field></field></doc>
  <doc><field></field></doc>
  <commonunique><id>123</id></commonunique>
</add>
If I search for 123, all four documents should come back. Is it possible?
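
Since Solr's uniqueKey must be unique per document (re-using the same key would
overwrite earlier documents), one hedged alternative sketch is to keep `id` unique
and carry the shared value in an ordinary, non-unique field (field names here are
hypothetical):

```xml
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="groupid">123</field>
  </doc>
  <doc>
    <field name="id">doc-2</field>
    <field name="groupid">123</field>
  </doc>
</add>
```

A query like q=groupid:123 would then return every document sharing that value.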

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-retreive-multiple-documents-using-one-unique-field-tp3430931p3430931.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: help with phrase query

2011-10-18 Thread elisabeth benoit
I think you can use pf2 and pf3 in your requestHandler.

Best regards,
Elisabeth

2011/10/16 Vijay Ramachandran vijay...@gmail.com

 Hello. I have an application where I try to match longer queries
 (sentences)
 to short documents (search phrases). Typically, the documents are 3-5 terms
 in length. I am facing a problem where phrase match in the indicated phrase
 fields via pf doesn't seem to match in most cases, and I am stumped.
 Please help!

 For instance, when my query is should I buy a house now while the rates
 are
 low. We filed BR 2 yrs ago. Rent now, w/ some sch loan debt

 I expect the document buy a house to match much higher than house
 loan rates.
 However, the latter is the document which always matches higher.


 I tried to do this the following way (solr 3.1):
 1. Score phrase matches high
 2. Score single word matches lower
 3. Use dismax with a mm of 1, and very high boost for exact phrase match.

 I used the s text definition in the schema for the single words, and the
 following for the phrase:

    <fieldType name="shingle" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="false"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="false"/>
      </analyzer>
    </fieldType>

 and my schema fields look like this:

    <field name="kw_stopped" type="text_en" indexed="true" omitNorms="true"/>

    <!-- keywords almost as is - to provide truer match for full phrases -->
    <field name="kw_phrases" type="shingle" indexed="true" omitNorms="true"/>

 This is my search handler config:

  <requestHandler name="edismax" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.1</float>
      <str name="fl">kpid,advid,campaign,keywords</str>
      <str name="mm">1</str>
      <str name="qf">kw_stopped^1.0</str>
      <str name="pf">kw_phrases^50.0</str>
      <int name="ps">3</int>
      <int name="qs">3</int>
      <str name="q.alt">*:*</str>
      <!-- example highlighter config, enable per-query with hl=true -->
      <str name="hl.fl">keywords</str>
      <!-- for this field, we want no fragmenting, just highlighting -->
      <str name="f.name.hl.fragsize">0</str>
      <!-- instructs Solr to return the field itself if no query terms are found -->
      <str name="f.name.hl.alternateField">title</str>
      <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
    </lst>
  </requestHandler>

 These are the match score debugQuery explanations:

 8.480054E-4 = (MATCH) sum of:
  8.480054E-4 = (MATCH) product of:
0.0031093531 = (MATCH) sum of:
  0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
  5.514656 = idf(docFreq=25, maxDocs=2375)
  5.1152787E-5 = queryNorm
5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
  1.0 = tf(termFreq(kw_stopped:hous)=1)
  5.514656 = idf(docFreq=25, maxDocs=2375)
  1.0 = fieldNorm(field=kw_stopped, doc=1812)
  8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
  4.002068 = idf(docFreq=117, maxDocs=2375)
  5.1152787E-5 = queryNorm
4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
  1.0 = tf(termFreq(kw_stopped:rate)=1)
  4.002068 = idf(docFreq=117, maxDocs=2375)
  1.0 = fieldNorm(field=kw_stopped, doc=1812)
  7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
  3.7891462 = idf(docFreq=145, maxDocs=2375)
  5.1152787E-5 = queryNorm
3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product
 of:
  1.0 = tf(termFreq(kw_stopped:loan)=1)
  3.7891462 = idf(docFreq=145, maxDocs=2375)
  1.0 = fieldNorm(field=kw_stopped, doc=1812)
0.27272728 = coord(3/11)

 for house loan rates vs

 8.480054E-4 = (MATCH) sum of:
  8.480054E-4 = 

IndexBasedSpellChecker on multiple fields

2011-10-18 Thread Simone Tripodi
Hi all guys,
I need to configure the IndexBasedSpellChecker to use more than
just one field as a spelling dictionary. Is it possible to achieve this?
In the meanwhile I configured two spellcheckers and let users switch
from one checker to another via params on the GET request, but it looks like
people are not particularly happy about it...
The main problem is that the fields I need to spellcheck contain different
information; I mean the intersection between the two sets could be
empty.
Many thanks in advance, all the best!
Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/


Re: Instructions for Multiple Server Webapps Configuring with JNDI

2011-10-18 Thread Tod

On 10/14/2011 2:44 PM, Chris Hostetter wrote:


: modified the solr/home accordingly.  I have an empty directory under
: tomcat/webapps named after the solr home directory in the context fragment.

if that empty directory has the same base name as your context fragment
(ie: tomcat/webapps/solr0 and solr0.xml) that may give you problems
... the entire point of using context fragment files is to define webapps
independently of a simple directory based hierarchy in tomcat/webapps ...
if you have a directory there with the same name you create a conflict --
which webapp should it use, the empty one, or the one specified by your
context file?



Looks like that was the problem, once I removed the ./webapps/solr0 
directory and started tomcat back up it was recreated correctly.





: I expected to fire up tomcat and have it unpack the war file contents into the
: solr home directory specified in the context fragment, but its empty, as is
: the webapps directory.

that's not what the solr/home env variable is for at all.  tomcat will
put the unpacked war wherever it needs/wants to (in theory it could just
load it in memory) ... the point of the solr/home env variable is for you
to tell the solr.war where to find the configuration files for this
context.


Sorry, my mistake.  I wasn't referring to solr/home; I was referring 
literally to the new solr home under tomcat - in this instance 
./webapps/solr0.


One more question, is there a particular advantage of multiple solr 
instances vs. multiple solr cores?



Thanks.


Re: How to change default operator in velocity?

2011-10-18 Thread Jan Høydahl
Hi,

The reason why AND is default with /browse is that it uses the dismax query 
parser, which does not currently respect defaultOperator.
If you want an OR-like behaviour, try adding &mm=0 at the end of the URL 
(which means minimum number of terms that should match = 0), e.g.
http://localhost:8983/solr/browse?q=samsung+maxtor&mm=0

For more about mm, see 
http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
NB: In trunk (4.0), even dismax will respect the defaultOperator from schema.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 18. okt. 2011, at 12:36, hadi wrote:

 in solr schema the defaultOperator value is OR but when i use
 browse(http://localhost:8983/solr/browse)for searching AND is a
 defaultOperator,and that config in solr is not affect on velocity how can i
 change the velocity template engine default operators?
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3430871.html
 Sent from the Solr - User mailing list archive at Nabble.com.



solr fullpledged

2011-10-18 Thread nagarjuna
Hi everybody 
  I just downloaded the Solr application, modified the config files
as per my requirements, and successfully got results. I also developed a
sample client application using JavaScript, using my Solr URL there to
retrieve the results; everything is fine.
  Now I would like to develop the same application as a full-fledged one; I
mean, I don't want to put risk on the user. I need to develop an application
with a good UI, take the config file inputs from the user's UI, and store
them into the Solr XML files using Java (JSP, Spring). Is there any way?
Please give me suggestions.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-fullpledged-tp3431187p3431187.html
Sent from the Solr - User mailing list archive at Nabble.com.


Find Documents with field = maxValue

2011-10-18 Thread Alireza Salimi
Hi,

It might be a naive question.
Assume we have a list of Documents; each Document contains the information of
a person, and there is a numeric field named 'age'. How can we find those
Documents whose *age* field is *max(age)*, in one query?

So far I've found that function queries don't support aggregate functions,
but how about nested queries?

Thanks
-- 
Alireza Salimi
Java EE Developer


Re: How to change default operator in velocity?

2011-10-18 Thread hadi
Thanks for your reply. I deleted the dismax config from solrconfig.xml and
it works. Are there any side effects?

On 10/18/11, Jan Høydahl / Cominvent [via Lucene]
ml-node+s472066n3431189...@n3.nabble.com wrote:


 Hi,

 The reason why AND is default with /browse is that it uses the dismax
 query parser, which does not currently respect defaultOperator.
 If you want an OR-like behaviour, try adding &mm=0 at the end of the URL
 (which means minimum number of terms that should match = 0), e.g.
 http://localhost:8983/solr/browse?q=samsung+maxtor&mm=0

 For more about mm, see
 http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
 NB: In trunk (4.0), even dismax will respect the defaultOperator from
 schema.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 18. okt. 2011, at 12:36, hadi wrote:

 in solr schema the defaultOperator value is OR but when i use
 browse(http://localhost:8983/solr/browse)for searching AND is a
 defaultOperator,and that config in solr is not affect on velocity how can
 i
 change the velocity template engine default operators?


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3430871.html
 Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3431294.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: millions of records problem

2011-10-18 Thread Tom Gullo
Getting a solid-state drive might help

--
View this message in context: 
http://lucene.472066.n3.nabble.com/millions-of-records-problem-tp3427796p3431309.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Find Documents with field = maxValue

2011-10-18 Thread Ahmet Arslan


--- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote:

 From: Alireza Salimi alireza.sal...@gmail.com
 Subject: Find Documents with field = maxValue
 To: solr-user@lucene.apache.org
 Date: Tuesday, October 18, 2011, 4:10 PM
 Hi,
 
 It might be a naive question.
 Assume we have a list of Document, each Document contains
 the information of
 a person,
 there is a numeric field named 'age', how can we find those
 Documents whose
 *age* field
 is *max(age) *in one query.

May be http://wiki.apache.org/solr/StatsComponent?

Or sort by age?  q=*:*&start=0&rows=1&sort=age desc
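
If every document holding the maximum is needed, a two-step sketch using the
StatsComponent mentioned above would be (the second query's value is whatever
maximum step 1 reports; it is a placeholder here, not a real number):

```
# step 1: read max(age) from the stats response
q=*:*&rows=0&stats=true&stats.field=age

# step 2: filter on the reported maximum
q=age:MAX_FROM_STEP_1
```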


performace jetty (jetty.xml)

2011-10-18 Thread Gastone Penzo
Hi,
I just changed my Solr installation from 1.4 to 3.4.
I notice that the Jetty configuration file (jetty.xml) has also changed:
the default thread number is higher, the thread pool is higher,
and other default values are higher. Is this normal?

What values do you think would be correct for me?
I have a dedicated machine with 2 Solr instances inside;
my machine has 8 GB of RAM and 8 CPUs.

I do about 200,000 - 250,000 calls to Solr a day...

Can someone help me?

- thread numbers (min, max and low)
- core pool size and maximum pool size




Re: Find Documents with field = maxValue

2011-10-18 Thread Alireza Salimi
Hi Ahmet,

Thanks for your reply, but I want ALL documents with age = max_age.


On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com wrote:



 --- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote:

  From: Alireza Salimi alireza.sal...@gmail.com
  Subject: Find Documents with field = maxValue
  To: solr-user@lucene.apache.org
  Date: Tuesday, October 18, 2011, 4:10 PM
  Hi,
 
  It might be a naive question.
  Assume we have a list of Document, each Document contains
  the information of
  a person,
  there is a numeric field named 'age', how can we find those
  Documents whose
  *age* field
  is *max(age) *in one query.

 May be http://wiki.apache.org/solr/StatsComponent?

 Or sort by age?  q=*:*&start=0&rows=1&sort=age desc




-- 
Alireza Salimi
Java EE Developer


Re: performace jetty (jetty.xml)

2011-10-18 Thread Alireza Salimi
Can't you use some profilers to find out about your new performance?
I'm new to Solr, but I think 200,000 req/day is not that many.

On Tue, Oct 18, 2011 at 10:03 AM, Gastone Penzo gastone.pe...@gmail.comwrote:

 Hi,
 i just change my solr installation from 1.4 to 3.4..
 i can notice that also jetty configuration file (jetty.xml) is changed.
 default threads number is higher, theadpool is higher
 and other default value are higher. is it normal??

 what number of these value do you seems are correct for me?
 i have a dedicated machine with 2 solr istances inside
 my machine has 8gb of ram and 8 cpu..

 i do like 200.000 - 250.000 calls to solr a day...

 someone can help me??

 - Theads number (min,max and low)
 - corepool size and maximum poolsize






-- 
Alireza Salimi
Java EE Developer


RE: Find Documents with field = maxValue

2011-10-18 Thread Brandon Ramirez
I don't know anything about your environment, so maybe this doesn't make sense, 
but maybe you can check your source system (database or whatnot) to get the 
max_age, then search for the max_age in your Solr index.

It's not as elegant, but may be a lot easier.

To reduce the risk of interacting with potentially stale data, you may want to 
change your = to >= (or whatever is appropriate).


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com


-Original Message-
From: Alireza Salimi [mailto:alireza.sal...@gmail.com] 
Sent: Tuesday, October 18, 2011 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Find Documents with field = maxValue

Hi Ahmet,

Thanks for your reply, but I want ALL documents with age = max_age.


On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com wrote:



 --- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote:

  From: Alireza Salimi alireza.sal...@gmail.com
  Subject: Find Documents with field = maxValue
  To: solr-user@lucene.apache.org
  Date: Tuesday, October 18, 2011, 4:10 PM Hi,
 
  It might be a naive question.
  Assume we have a list of Document, each Document contains the 
  information of a person, there is a numeric field named 'age', how 
  can we find those Documents whose
  *age* field
  is *max(age) *in one query.

 May be http://wiki.apache.org/solr/StatsComponent?

 Or sort by age?  q=*:*&start=0&rows=1&sort=age desc




--
Alireza Salimi
Java EE Developer


solr/lucene and its database (a silly question)

2011-10-18 Thread lorenlai
Hello experts,

I have just a silly question regarding Solr/Lucene, please.

Where is the imported data stored? In Lucene or Solr?
Here is a picture of the architecture:
http://3.bp.blogspot.com/-rTZPN3sm9e0/TjAdqciXHgI/Cs0/N_W_iSAI8cY/s1600/solr_arch.jpg

I mean, when importing data to Lucene: as I understand it, the data goes
through some processes (document processing) and is finally stored as an
index (XML structure???) in the Lucene engine.
Is this correct? If yes, what kind of database does Lucene use? Or how is
the data stored in Lucene?

Thank you for your answer. :-)

Cheers

Loren

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-lucene-and-its-database-a-silly-question-tp3431436p3431436.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr/lucene and its database (a silly question)

2011-10-18 Thread Alireza Salimi
In here:
http://wiki.apache.org/solr/SolrConfigXml#dataDir_parameter
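
That is, the index files end up under the dataDir configured in solrconfig.xml; a
minimal sketch (the path shown is illustrative):

```xml
<!-- solrconfig.xml: where Solr/Lucene writes the index files -->
<dataDir>${solr.data.dir:/var/solr/data}</dataDir>
```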

On Tue, Oct 18, 2011 at 10:38 AM, lorenlai loren...@yahoo.com wrote:

 Hello expert,

 I have just a silly question regarding to Solr/Lucene, pls.

 Where are the importing data stored ? In Lucene or Solr ?
 Here is a picture of the architecture.

 http://3.bp.blogspot.com/-rTZPN3sm9e0/TjAdqciXHgI/Cs0/N_W_iSAI8cY/s1600/solr_arch.jpg

 I mean when importing the data to Lucene. As for my understanding, the data
 will be gone through some processes (document processing), then finally it
 will store as Index (XML structure???) in Lucene engine ?
 Is this correct ? If yes, what kind of Database does Lucene use ? Or how
 are
 the data stored in lucene ?

 Thank you for your answer. :-)

 Cheers

 Loren

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-lucene-and-its-database-a-silly-question-tp3431436p3431436.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Alireza Salimi
Java EE Developer


Re: Question about near query order

2011-10-18 Thread Jason, Kim
Thank you for your kind reply.

Is it possible only with defType=lucene, in your second suggestion?
I'm using ComplexPhraseQueryParser,
so my defType is complexphrase.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-near-query-order-tp3427312p3431465.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr/lucene and its database (a silly question)

2011-10-18 Thread Robert Stewart
Solr stores all data in the directory you specify via the dataDir setting in 
solrconfig.xml.

Solr uses Lucene to store all the data in one or more proprietary binary files 
called segment files.  As a Solr user you typically should not be too concerned 
with the binary index structure.  You can see details here (some details may be out 
of date):  http://lucene.apache.org/java/2_3_2/fileformats.html
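
Concretely, for a pre-4.0 index like the one that file-formats page describes, the
data directory typically holds one commit point plus per-segment files; an
illustrative listing for a single segment `_0` (the extensions are Lucene's
standard ones):

```
data/index/
  segments_2  segments.gen    commit point / generation marker
  _0.fnm                      field names
  _0.fdt  _0.fdx              stored fields (data + index)
  _0.tis  _0.tii              term dictionary (+ index)
  _0.frq  _0.prx              term frequencies and positions
  _0.nrm                      norms
```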

Bob


On Oct 18, 2011, at 10:38 AM, lorenlai wrote:

 Hello expert,
 
 I have just a silly question regarding to Solr/Lucene, pls. 
 
 Where are the importing data stored ? In Lucene or Solr ?
 Here is a picture of the architecture.
 http://3.bp.blogspot.com/-rTZPN3sm9e0/TjAdqciXHgI/Cs0/N_W_iSAI8cY/s1600/solr_arch.jpg
 
 I mean when importing the data to Lucene. As for my understanding, the data
 will be gone through some processes (document processing), then finally it
 will store as Index (XML structure???) in Lucene engine ?
 Is this correct ? If yes, what kind of Database does Lucene use ? Or how are
 the data stored in lucene ?
 
 Thank you for your answer. :-)
 
 Cheers
 
 Loren
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-lucene-and-its-database-a-silly-question-tp3431436p3431436.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: IndexBasedSpellChecker on multiple fields

2011-10-18 Thread Dyer, James
Simone,

You can set up a master dictionary but with a few caveats.  What you'll need 
to do is copyfield all of the fields you want to include in your master 
dictionary into one field and base your IndexBasedSpellChecker dictionary on 
that.  In addition, I would recommend you use the collate feature and set 
spellcheck.maxCollationTries to something greater than zero (5-10 is usually 
good).  Otherwise, you probably will get a lot of ridiculous suggestions from 
it trying to correct words from one field with values from another.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate for more 
information.

There is still a big problem with this approach, however.  Unless you set 
onlyMorePopular=true, Solr will never suggest a correction for a word that 
exists in the dictionary.  By creating a huge master dictionary, you will be 
increasing the chances that Solr will assume your users' misspelled words are 
in fact correct.  One way to work around this is instead of blindly using 
copyField, to hand-pick a subset of your terms for the master field on which 
you base your dictionary.  Another workaround is to use onlyMorePopular, 
although this has its own problems.  See the discussion for SOLR-2585 
(https://issues.apache.org/jira/browse/SOLR-2585), which aims to solve these 
problems.
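
As a rough sketch of the copyField-plus-dictionary setup described above (all field and dictionary names here are made up for illustration):

```xml
<!-- schema.xml: funnel the hand-picked source fields into one master field -->
<field name="spell_master" type="textSpell" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="title"  dest="spell_master"/>
<copyField source="author" dest="spell_master"/>

<!-- solrconfig.xml: base the IndexBasedSpellChecker dictionary on that field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">master</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell_master</str>
    <str name="spellcheckIndexDir">./spellchecker_master</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

A request would then pass spellcheck=true&spellcheck.dictionary=master&spellcheck.collate=true&spellcheck.maxCollationTries=5, per the collate advice above.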

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: simone.trip...@gmail.com [mailto:simone.trip...@gmail.com] On Behalf Of 
Simone Tripodi
Sent: Tuesday, October 18, 2011 7:06 AM
To: solr-user@lucene.apache.org
Subject: IndexBasedSpellChecker on multiple fields

Hi all guys,
I need to configure the IndexBasedSpellChecker to use more than
just one field as a spelling dictionary; is that possible to achieve?
In the meanwhile I configured two spellcheckers and let users switch
from one checker to another via params on the GET request, but it looks like
people are not particularly happy about it...
The main problem is that the fields I need to spell-check contain different
information; I mean the intersection between the two sets could be
empty.
Many thanks in advance, all the best!
Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/


Re: Question about near query order

2011-10-18 Thread Ahmet Arslan
 Is it possible only defType=lucene in your second
 suggestion?
 I'm using ComplexPhraseQueryParser.
 So my defType is complexphrase.

Oh, then life is easy. Just setting the inOrder parameter to false in 
solrconfig.xml should do the trick.

<queryParser name="complexphrase"
    class="org.apache.solr.search.ComplexPhraseQParserPlugin">
  <bool name="inOrder">false</bool>
</queryParser>




Re: How to change default operator in velocity?

2011-10-18 Thread Jan Høydahl
Rather than deleting the dismax config, I would recommend adding a new entry 
inside your /browse request handler config's <lst name="defaults"> tag:

<str name="mm">0</str>

This will go OR mode, and you will still benefit from all the advantages that 
DisMax gives you for weighted search across different fields. See 
http://wiki.apache.org/solr/DisMaxQParserPlugin to learn more about DisMax.
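
In the stock example config this would look roughly like the following (a sketch; your existing qf/pf entries stay untouched):

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- minimum 'should' match of 0 gives OR-like behaviour -->
    <str name="mm">0</str>
    <!-- ...existing qf, pf, and facet defaults... -->
  </lst>
</requestHandler>
```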

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 18. okt. 2011, at 15:56, hadi wrote:

 thanks for your reply, i deleted the dismax config from solrconfig.xml and
 it works. are there any side effects?
 
 On 10/18/11, Jan Høydahl / Cominvent [via Lucene]
 ml-node+s472066n3431189...@n3.nabble.com wrote:
 
 
 Hi,
 
 The reason why AND is default with /browse is that it uses the dismax
 query parser, which does not currently respect defaultOperator.
 If you want an OR-like behaviour, try to add at the end of the url: &mm=0
 (which means minimum number of terms that should match = 0), e.g.
 http://localhost:8983/solr/browse?q=samsung+maxtor&mm=0
 
 For more about mm, see
 http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
 NB: In trunk (4.0), even dismax will respect the defaultOperator from
 schema.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
 
 On 18. okt. 2011, at 12:36, hadi wrote:
 
 in the solr schema the defaultOperator value is OR, but when i use
 /browse (http://localhost:8983/solr/browse) for searching, AND is the
 default operator, and that config in solr does not affect velocity. how can
 i change the velocity template engine's default operator?
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3430871.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3431294.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Query with star returns double type values equal 0

2011-10-18 Thread romain
Hi iorixxx,

I am using lucene

On Monday, October 17, 2011 5:58:31 PM, iorixxx [via Lucene] wrote:
  I am experiencing an unexpected behavior using solr 3.4.0.
 
  if my query includes a star, all the properties of type
  'long' or 'LatLon'
  have 0 as value
  (ex: select/?start=0&q=way*&rows=10&version=2)
 
  Though the same request without stars returns correct
  values
  (ex: select/?start=0&q=way&rows=10&version=2)
 
  Does anyone have an idea?

 Please keep in mind that wildcard queries are not analyzed.

 What query parser are you using? lucene, dismax, edismax?




 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-with-star-returns-double-type-values-equal-0-tp3428721p3432312.html
Sent from the Solr - User mailing list archive at Nabble.com.

Access Document Score in Custom Function Query (ValueSource)

2011-10-18 Thread sangrish

Hi,

 I use the following 2 components in ranking documents:

Normal Query :  myField^2

Custom Function Query(ValueSource): myFunc()

In this value source I compute another score for every document
using some features. I want to  access the score of the query myField^2 
(for a given document) in this same value source.

Ideas?

Thanks
Sid



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Access-Document-Score-in-Custom-Function-Query-ValueSource-tp3432459p3432459.html
Sent from the Solr - User mailing list archive at Nabble.com.


Term Frequency - tf() ?

2011-10-18 Thread Hung Huynh
I've revised the tf() function to always return 1, regardless of the number
of occurrences it finds.

However, I run into a problem when stemmed words and root words appear
together. These documents get a higher boost than documents with just the
root.
For example: woman walking fast gets tf(woman) = 1
 woman walking fast women walking fast gets tf(woman) = 1
and tf(women) = 1, resulting in higher score than just woman

Is there a way to always return 1 for tf(), regardless of stemmed words or
synonyms?

Thanks,
Hung



Dismax boost + payload boost

2011-10-18 Thread Milan Dobrota
Is it possible to combine dismax boost (query time) and payload boost (index
time)?

I've done something very similar to this post
http://sujitpal.blogspot.com/2011/01/payloads-with-solr.html but it seems
that query time boosts get ignored.


Re: Find Documents with field = maxValue

2011-10-18 Thread Otis Gospodnetic
Hi,

Are you just looking for:

age:<target age>

This will return all documents/records where the age field is equal to <target age>.

But maybe you want

age:[0 TO <target age here>]

This will include people aged from 0 to <target age>.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: Alireza Salimi alireza.sal...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 18, 2011 10:15 AM
Subject: Re: Find Documents with field = maxValue

Hi Ahmet,

Thanks for your reply, but I want ALL documents with age = max_age.


On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com wrote:



 --- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote:

  From: Alireza Salimi alireza.sal...@gmail.com
  Subject: Find Documents with field = maxValue
  To: solr-user@lucene.apache.org
  Date: Tuesday, October 18, 2011, 4:10 PM
  Hi,
 
  It might be a naive question.
  Assume we have a list of Document, each Document contains
  the information of
  a person,
  there is a numeric field named 'age', how can we find those
  Documents whose
  *age* field
  is *max(age) *in one query.

 May be http://wiki.apache.org/solr/StatsComponent?

 Or sort by age?  q=*:*&start=0&rows=1&sort=age desc




-- 
Alireza Salimi
Java EE Developer




Re: performace jetty (jetty.xml)

2011-10-18 Thread Otis Gospodnetic
Gastone,

Those numbers are probably OK.  Let us know if you have any actual problems 
with Solr 3.4.  Oh, and use the solr-user mailing list instead please.

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: Gastone Penzo gastone.pe...@gmail.com
To: solr-user@lucene.apache.org; d...@lucene.apache.org
Sent: Tuesday, October 18, 2011 10:03 AM
Subject: performace jetty (jetty.xml)


Hi,
i just changed my solr installation from 1.4 to 3.4..
i noticed that the jetty configuration file (jetty.xml) also changed.
the default thread count is higher, the threadpool is bigger,
and other default values are higher. is that normal??

what values do you think would be correct for me?
i have a dedicated machine with 2 solr instances inside;
my machine has 8gb of ram and 8 cpus..

i do like 200.000 - 250.000 calls to solr a day...

can someone help me??

- thread numbers (min, max and low)
- corepool size and maximum poolsize









Re: Find Documents with field = maxValue

2011-10-18 Thread Sujit Pal
Hi Alireza,

Would this work? Sort the results by age desc, then loop through the
results as long as age == age[0].

-sujit
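
The client-side scan Sujit describes can be sketched in plain Java; assume the ages arrive already sorted descending (sort=age desc), so the scan can stop at the first value that differs from the first element:

```java
import java.util.ArrayList;
import java.util.List;

public class MaxAgeFilter {
    // Given ages in the order Solr returns them with sort=age desc,
    // return the result positions whose age equals the maximum
    // (i.e. the age of the first row).
    static List<Integer> docsWithMaxAge(int[] agesDesc) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < agesDesc.length && agesDesc[i] == agesDesc[0]; i++) {
            positions.add(i); // still equal to the max; keep it
        }
        return positions; // stops at the first smaller age
    }
}
```

Two queries total: one for the sorted page, then this loop; no second round-trip to Solr is needed if the first page is large enough to cover all ties.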

On Tue, 2011-10-18 at 15:23 -0700, Otis Gospodnetic wrote:
 Hi,
 
 Are you just looking for:
 
 age:target age
 
 This will return all documents/records where age field is equal to target age.
 
 But maybe you want
 
 age:[0 TO target age here]
 
 This will include people aged from 0 to target age.
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
 From: Alireza Salimi alireza.sal...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, October 18, 2011 10:15 AM
 Subject: Re: Find Documents with field = maxValue
 
 Hi Ahmet,
 
 Thanks for your reply, but I want ALL documents with age = max_age.
 
 
 On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com wrote:
 
 
 
  --- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote:
 
   From: Alireza Salimi alireza.sal...@gmail.com
   Subject: Find Documents with field = maxValue
   To: solr-user@lucene.apache.org
   Date: Tuesday, October 18, 2011, 4:10 PM
   Hi,
  
   It might be a naive question.
   Assume we have a list of Document, each Document contains
   the information of
   a person,
   there is a numeric field named 'age', how can we find those
   Documents whose
   *age* field
   is *max(age) *in one query.
 
  May be http://wiki.apache.org/solr/StatsComponent?
 
  Or sort by age?  q=*:*start=0rows=1sort=age desc
 
 
 
 
 -- 
 Alireza Salimi
 Java EE Developer
 
 
 



Re: How to retreive multiple documents using one unique field?

2011-10-18 Thread Otis Gospodnetic
This won't work.  But you could add all 4 docs with the same 123 value in 
their id fields, just comment out the uniqueKey field.  Don't ask me what will or 
will not happen when you later try updating a document with id:123...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: kiran.bodigam kiran.bodi...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 18, 2011 7:14 AM
Subject: How to retreive multiple documents using one unique field?

I have four different documents in a single xml file (to be indexed); i don't
want to inject the unique field for each and every document. When i search
with the unique field, all four documents should come back in the result, i.e.
can a common unique field be applied to all the documents?
My xml format:
<add>
  <doc><field>...</field></doc>
  <doc><field>...</field></doc>
  <doc><field>...</field></doc>
  <doc><field>...</field></doc>
  <commonunique><id>123</id></commonunique>
</add>
If i search for 123, all four documents should come back. Is it possible?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-retreive-multiple-documents-using-one-unique-field-tp3430931p3430931.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: OS Cache - Solr

2011-10-18 Thread Otis Gospodnetic
Maybe your Solr Document cache is big and that's consuming a big part of that 
JVM heap?
If you want to be able to run with a smaller heap, consider making your caches 
smaller.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: Sujatha Arun suja.a...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 18, 2011 12:53 AM
Subject: Re: OS Cache - Solr

Hello Jan,

Thanks for your response and  clarification.

We are monitoring the JVM cache utilization and we are currently using about
18 GB of the 20 GB assigned to the JVM. Our total index size is abt 14GB.

Regards
Sujatha

On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl jan@cominvent.com wrote:

 Hi Sujatha,

 Are you sure you need 20Gb for Tomcat? Have you profiled using JConsole or
 similar? Try with 15Gb and see how it goes. The reason why this is
 beneficial is that you WANT your OS to have available memory for disk
 caching. If you have 17Gb free after starting Solr, your OS will be able to
 cache all index files in memory and you get very high search performance.
 With your current settings, there is only 12Gb free for both caching the
 index and for your MySql activities.  Chances are that when you backup
 MySql, the cached part of your Solr index gets flushed from disk caches and
 need to be re-cached later.

 How to interpret memory stats varies between OSes, and seeing 163Mb free may
 simply mean that your OS has used most RAM for various caches and paging,
 but will flush it once an application asks for more memory. Have you seen
 http://wiki.apache.org/solr/SolrPerformanceFactors ?

 You should also slim down your index maximally by setting stored=false and
 indexed=false wherever possible. I would also upgrade to a more current Solr
 version.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 17. okt. 2011, at 19:51, Sujatha Arun wrote:

  Hello
 
  I am trying to understand the  OS cache utilization of Solr .Our server
 has
  several solr instances on a server .The total combined Index size of all
  instances is abt 14 Gb and the size of the maximum single Index is abt
 2.5
  GB .
 
  Our Server has Quad processor with 32 GB RAM .Out of which 20 GB has been
  assigned to  JVM. We are running solr1.3  on tomcat 5.5 and Java 1.6
 
  Our current Statistics indicate that  solr uses 18-19 GB of 20 GB RAM
  assigned to JVM .However the  Free physical seems to remain constant as
  below.
  Free physical memory = 163 Mb
  Total physical memory = 32,232 Mb,
 
  The server also serves as a backup server for Mysql where the application
 DB
  is backed up and restored .During this activity we see that lot of
 queries
  that nearly take even 10+ minutes to execute .But other wise
  maximum query time is less than  1-2 secs
 
  The physical memory that is free seems to be constant . Why is this
 constant
  and how this will be used between the  Mysql backup and solr while
  backup activity is  happening How much free physical memory should be
  available to OS given out stats.?
 
  Any pointers would be helpful.
 
  Regards
  Sujatha






score based on unique words matching???

2011-10-18 Thread Craig Stadler

Here's my problem:

field1 (text) - subject
q=david bowie changes

Problem : If a record mentions david bowie a lot, it beats out something 
more relevant (more unique matches) ...


A. (now appearing david bowie at the cineplex 7pm david bowie goes on 
stage, then mr. bowie will sign autographs)

B. song :david bowie - changes

(A) ends up more relevant because of the frequency or number of words in 
it.. not cool...

I want it so the number of words matching will trump density/weight

Thanks, I'm a newbie.
-Craig 
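
One hedged option for this (a sketch, assuming Solr 1.4+ and a field named subject): disable length normalization and term frequencies on the field in schema.xml, so repeating a term stops inflating the score. Note that omitting term frequencies and positions also disables phrase queries on that field.

```xml
<field name="subject" type="text" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>
```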




Hit search-lucene.com a little harder

2011-10-18 Thread Otis Gospodnetic
Hello folks,

Do you ever use http://search-lucene.com (SL) or http://search-hadoop.com (SH)?

If you do, I'd like to ask you for a small favour:
We are at Lucene Eurocon in Barcelona and we are about to show the Search 
Analytics [1] and Performance Monitoring [2] tools/services we've built and 
that we use on these two sites.
We would like to show the audience various pretty graphs and would love those 
graphs to be a little less sparse. :)

So if you use SL and/or SH, please feel free to use them a little extra now, 
if you feel like helping.

[1] http://sematext.com/search-analytics/index.html
[2] http://sematext.com/spm/solr-performance-monitoring/index.html

I think we'll open up both of the above services to the public tomorrow (and 
100% free for an undetermined length of time), but if you don't have time to sign 
up and set it up for yourself, yet are interested in reports, graphs, etc., let 
me know and we'll put together a blog post or something and include interesting 
things in it.

Thanks,
Otis


changing base URLs in indexes

2011-10-18 Thread Fred Zimmerman
Hi,

I am getting ready to index a recent copy of Wikipedia's pages-articles
dump.  I have two servers, foo and bar.  On foo.com/mediawiki I have a
Mediawiki install serving up the pages. On bar.com/solr I have my solr
install. I have the pages-articles.xml file from Wikipedia and the solr
instructions  at
http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia.
 It looks pretty straightforward but I have a couple of preparatory
questions.

If I index the pages-articles.xml on bar.com/solr, they will then be
pointing to the relative links on bar.com/mediawiki, which don't exist,
right?  So is there a way to tell solr that the base url for a bunch of
index records is different than what it thinks they are? Or would it be
easier simply to put a solr installation on foo.com?





FredZ


Re: Instructions for Multiple Server Webapps Configuring with JNDI

2011-10-18 Thread Shawn Heisey

On 10/18/2011 6:59 AM, Tod wrote:
One more question, is there a particular advantage of multiple solr 
instances vs. multiple solr cores?


One way of doing multiple instances is running more than one copy of 
your container (tomcat/jetty/whatever).  I've never tried to put more 
than one .war file into a container ... I have no idea how to tell each 
one where its solr home is.  It may be possible, but I've never tried.  
Either way, you'd end up with overhead because a certain amount of 
memory is required just to get each copy of Solr started.  There is some 
additional flexibility with multiple containers - they can be easily 
stopped and started independently at the OS level.


With cores, there isn't as much overhead because there's only one 
application running, handling multiple indexes.  There is some ability 
to load/unload each index independently with CoreAdmin, but it's not 
controllable at the OS level.  In a well designed full system that 
includes software and hardware redundancy, being unable to independently 
stop/start an index isn't much of a worry.
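
For reference, a multicore setup is driven by a solr.xml next to the core directories; a minimal sketch (core names here are just examples):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```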


Thanks,
Shawn



Re: Question about near query order

2011-10-18 Thread Jason, Kim
Thanks a ton iorixxx.

Jason.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-near-query-order-tp3427312p3432922.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: changing base URLs in indexes

2011-10-18 Thread Markus Jelsma
Is this a crawler indexing the pages? If so, i would point it to whatever you 
need. If, for some reason, you cannot, you can modify the host/domain in your 
index using pattern char filters, or maybe the stored (returned) values using a 
custom update processor.
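
The char-filter route might look roughly like this in schema.xml (a sketch; the hostnames come from the question, and only the indexed tokens are rewritten, not the stored values):

```xml
<fieldType name="text_rehosted" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="http://foo\.com/mediawiki"
                replacement="http://bar.com/mediawiki"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```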

 Hi,
 
 I am getting ready to index a recent copy of Wikipedia's pages-articles
 dump.  I have two servers, foo and bar.  On foo.com/mediawiki I have a
 Mediawiki install serving up the pages. On bar.com/solr I have my solr
 install. I have the pages-articles.xml file from Wikipedia and the solr
 instructions  at
 http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia.
  It looks pretty straightforward but I have a couple of preparatory
 questions.
 
 If I index the pages-articles.xml on bar.com/solr, they will then be
 pointing to the relative links on solr.com/mediawiki, which don't exist,
 right?  So is there a way to tell solr that the base url for a bunch of
 index records is different than what it thinks they are? Or would it be
 easier simply to put a solr installation on foo.com?
 
 
 
 
 \
 
 FredZ


use lucene to create index(with synonym) and solr query index

2011-10-18 Thread cmd
1. I used lucene to create the index (with synonyms).
2. I configured solr to enable synonym functionality.
3. I used solr to query the lucene index, but the results are missing the synonym words.
Why? And how can I make them work together? Thanks!
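
For synonyms to survive a Solr query over a Lucene-built index, the analyzers must line up. A sketch of an index-time synonym field type in schema.xml (the tokenizer choice is an assumption and must match whatever the Lucene indexing code used):

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```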

--
View this message in context: 
http://lucene.472066.n3.nabble.com/use-lucene-to-create-index-with-synonym-and-solr-query-index-tp3433124p3433124.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OS Cache - Solr

2011-10-18 Thread Sujatha Arun
Thanks ,Otis,

This is our Solr cache allocation. We have the same cache allocation for all
our *200+ instances* on the single server. Is this too high?

*Query Result Cache*:LRU Cache(maxSize=16384, initialSize=4096,
autowarmCount=1024, )

*Document Cache *:LRU Cache(maxSize=16384, initialSize=16384)


*Filter Cache* LRU Cache(maxSize=16384, initialSize=4096,
autowarmCount=4096, )

Regards
Sujatha

On Wed, Oct 19, 2011 at 4:05 AM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Maybe your Solr Document cache is big and that's consuming a big part of
 that JVM heap?
 If you want to be able to run with a smaller heap, consider making your
 caches smaller.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: Sujatha Arun suja.a...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, October 18, 2011 12:53 AM
 Subject: Re: OS Cache - Solr
 
 Hello Jan,
 
 Thanks for your response and  clarification.
 
 We are monitoring the JVM cache utilization and we are currently using
 about
 18 GB of the 20 GB assigned to JVM. Out total index size being abt 14GB
 
 Regards
 Sujatha
 
 On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl jan@cominvent.com
 wrote:
 
  Hi Sujatha,
 
  Are you sure you need 20Gb for Tomcat? Have you profiled using JConsole
 or
  similar? Try with 15Gb and see how it goes. The reason why this is
  beneficial is that you WANT your OS to have available memory for disk
  caching. If you have 17Gb free after starting Solr, your OS will be able
 to
  cache all index files in memory and you get very high search
 performance.
  With your current settings, there is only 12Gb free for both caching the
  index and for your MySql activities.  Chances are that when you backup
  MySql, the cached part of your Solr index gets flushed from disk caches
 and
  need to be re-cached later.
 
  How to interpret memory stats vary between OSes, and seing 163Mb free
 may
  simply mean that your OS has used most RAM for various caches and
 paging,
  but will flush it once an application asks for more memory. Have you
 seen
  http://wiki.apache.org/solr/SolrPerformanceFactors ?
 
  You should also slim down your index maximally by setting stored=false
 and
  indexed=false wherever possible. I would also upgrade to a more current
 Solr
  version.
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
  On 17. okt. 2011, at 19:51, Sujatha Arun wrote:
 
   Hello
  
   I am trying to understand the  OS cache utilization of Solr .Our
 server
  has
   several solr instances on a server .The total combined Index size of
 all
   instances is abt 14 Gb and the size of the maximum single Index is abt
  2.5
   GB .
  
   Our Server has Quad processor with 32 GB RAM .Out of which 20 GB has
 been
   assigned to  JVM. We are running solr1.3  on tomcat 5.5 and Java 1.6
  
   Our current Statistics indicate that  solr uses 18-19 GB of 20 GB RAM
   assigned to JVM .However the  Free physical seems to remain constant
 as
   below.
   Free physical memory = 163 Mb
   Total physical memory = 32,232 Mb,
  
   The server also serves as a backup server for Mysql where the
 application
  DB
   is backed up and restored .During this activity we see that lot of
  queries
   that nearly take even 10+ minutes to execute .But other wise
   maximum query time is less than  1-2 secs
  
   The physical memory that is free seems to be constant . Why is this
  constant
   and how this will be used between the  Mysql backup and solr while
   backup activity is  happening How much free physical memory should be
   available to OS given out stats.?
  
   Any pointers would be helpful.
  
   Regards
   Sujatha
 
 
 
 
 



Re: How to change default operator in velocity?

2011-10-18 Thread hadi
thanks a lot, your answer is great

On 10/18/11, Jan Høydahl / Cominvent [via Lucene]
ml-node+s472066n3431940...@n3.nabble.com wrote:


 Rather than deleting the dismax config, I would recommend adding a new entry
 inside your /browse request handler config's <lst name="defaults"> tag:

 <str name="mm">0</str>

 This will go OR mode, and you will still benefit from all the advantages
 that DisMax gives you for weighted search across different fields. See
 http://wiki.apache.org/solr/DisMaxQParserPlugin to learn more about DisMax.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 18. okt. 2011, at 15:56, hadi wrote:

 thanks for your reply,i delete the dismax conf from solrconf.xml and
 it works,is it any side effect?

 On 10/18/11, Jan Høydahl / Cominvent [via Lucene]
 ml-node+s472066n3431189...@n3.nabble.com wrote:


 Hi,

 The reason why AND is default with /browse is that it uses the dismax
 query parser, which does not currently respect defaultOperator.
 If you want an OR-like behaviour, try to add at the end of the url: &mm=0
 (which means minimum number of terms that should match = 0), e.g.
 http://localhost:8983/solr/browse?q=samsung+maxtor&mm=0

 For more about mm, see
 http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
 NB: In trunk (4.0), even dismax will respect the defaultOperator from
 schema.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 18. okt. 2011, at 12:36, hadi wrote:

 in solr schema the defaultOperator value is OR but when i use
 browse(http://localhost:8983/solr/browse)for searching AND is a
 defaultOperator,and that config in solr is not affect on velocity how
 can
 i
 change the velocity template engine default operators?


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3430871.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3431294.html
 Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3433415.html
Sent from the Solr - User mailing list archive at Nabble.com.

How to update document with solrj?

2011-10-18 Thread hadi
I have indexed some files that do not have any tags or description, and i want
to add some fields without deleting them. How can i update or add info to my
indexed files with solrj?
My idea for this issue is to query the specific file, delete it, add some
info, and re-index it, but i think that is not a good idea.
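
Solr of this era has no partial update, so the usual pattern is exactly the fetch/modify/re-add the question describes. A SolrJ sketch (not runnable without a live Solr and the SolrJ jars; the field names are made up):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class Reindex {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        // 1. fetch the stored fields of the existing document
        SolrDocument old = server.query(new SolrQuery("id:doc1"))
                                 .getResults().get(0);

        // 2. copy everything over and add/overwrite the new fields
        SolrInputDocument doc = new SolrInputDocument();
        for (String name : old.getFieldNames()) {
            doc.addField(name, old.getFieldValue(name));
        }
        doc.setField("description", "added later");

        // 3. re-adding with the same uniqueKey replaces the old version
        server.add(doc);
        server.commit();
    }
}
```

Caveat: this only works if every field you need is stored; un-stored fields cannot be recovered from the index.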


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-update-document-with-solrj-tp3433434p3433434.html
Sent from the Solr - User mailing list archive at Nabble.com.


add thumnail image for search result

2011-10-18 Thread hadi
I want to know how i can add a thumbnail image for my files when i am indexing
files with solrj?
thanks


--
View this message in context: 
http://lucene.472066.n3.nabble.com/add-thumnail-image-for-search-result-tp3433440p3433440.html
Sent from the Solr - User mailing list archive at Nabble.com.