MISSING LICENSE

2012-03-12 Thread Per Steffensen

Hi

Just tried to run ant clean test on the latest code from trunk. I get a lot of 
MISSING LICENSE messages - e.g.

[licenses] MISSING LICENSE for the following file:
[licenses]   
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-3.3.3.jar

[licenses]   Expected locations below:
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-ASL.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-BSD.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-BSD_LIKE.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-CDDL.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-CPL.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-EPL.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-MIT.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-MPL.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-PD.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-SUN.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-COMPOUND.txt
[licenses]   = 
.../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-FAKE.txt


$ ant -version
Apache Ant(TM) version 1.8.2 compiled on October 14 2011

What might be wrong?

Regards, Per Steffensen


Performance (responsetime) on request

2012-03-12 Thread Ramo Karahasan
Hi,

 

I've got two virtual machines in the same subnet at the same hosting
provider. On one machine my web application is running, on the second
a Solr instance. In Solr I use the following:



<fieldType name="text_auto" class="solr.TextField">
  <analyzer type="index">
    <!--<tokenizer class="solr.KeywordTokenizerFactory"/>-->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>-->
  </analyzer>
  <analyzer type="query">
    <!--<tokenizer class="solr.KeywordTokenizerFactory"/>-->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
</fieldType>

 

 

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
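For intuition, the EdgeNGramFilterFactory used in the autosuggest field emits every leading prefix of each token. A rough sketch of what minGramSize=2 / maxGramSize=25 produces (illustrative only, not Solr's actual implementation):

```python
def edge_ngrams(token, min_gram=2, max_gram=25):
    """Leading-edge n-grams, as a front-side EdgeNGram filter would emit them."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

# A prefix typed into the autosuggest box matches because the index
# already contains every leading prefix of the indexed term.
grams = edge_ngrams("firefox")
print(grams)  # ['fi', 'fir', 'fire', 'firef', 'firefo', 'firefox']
```

This is also why an EdgeNGram field inflates the index: every term contributes up to maxGramSize-minGramSize+1 extra terms.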

 

 

 

If I search from my web application in my autosuggest box, I get response
times of ~500ms per request. Is it possible to tune Solr so that I get
faster results?

I have no special cache configuration, nor do I know what to configure
here.

 

Thanks,

Ramo



SOLR Query Intersection

2012-03-12 Thread balaji
Hi, 

   I am trying to compare three independent queries, the intersections among
them, and draw a Venn diagram using Google Charts. By using OR I will be
able to get the union of the 3 fields, and using AND I will be able to get
the intersection among the three. Is it possible to get the union and
intersection among the fields in the same query?

For example:

I have 3 values under the multi-valued field browsers: Google,
Firefox and IE. I just need to find the no. of documents having only
Google, Firefox etc., the no. of documents having all three, and the
intersections among them, like Google & IE, Google & Firefox.

   Is it possible to do this with query intersections, or do I need to write
separate queries for all of the above? If not, please suggest how it can be
achieved.


Thanks
Balaji 
   

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Query-Intersection-tp3818756p3818756.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR Query Intersection

2012-03-12 Thread Mikhail Khludnev
It sounds like facets http://wiki.apache.org/solr/SolrFacetingOverview .
Doesn't it?

On Mon, Mar 12, 2012 at 1:16 PM, balaji mcabal...@gmail.com wrote:

 Hi ,

   I am trying to Compare three independent queries,intersection among them
 and draw an Venn diagram using the Google CHART .  By using OR I will be
 able to get the union of the 3 fields and using AND I will be able to get
 the intersection among the three , Is it possible to get the union and
 intersection among the fields in a  same query

 For ex :

 I have 3 values which is under Multi-valued field browsers: Google,
 Firefox and IE. I just need to find the no. of documents having only
 google, Firefox etc. and no. of documents having all the three and an
 intersection among them like Google & IE, Google & Firefox

   Is it possible to do with the Query Intersections or do I need to write
 separate queries for all the above , If not please suggest how it can be
 achieved


 Thanks
 Balaji


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-Query-Intersection-tp3818756p3818756.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


List of recommendation engines with solr

2012-03-12 Thread Rohan
Hi All,

I would require a list of recommendation engines which can be integrated
with Solr, and also please suggest the best one out of these.

Any comments would be appreciated!!

Thanks,
Rohan

--
View this message in context: 
http://lucene.472066.n3.nabble.com/List-of-recommendation-engines-with-solr-tp3818917p3818917.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index doc file in solr?

2012-03-12 Thread Rohan
Hi Erick,

Thanks for the valuable comments on this.

See, I have a few sets of Word doc files and I would like to index the
metadata part, including the content of the page. Is there any way to
complete this task?

Need your comments on this.

Thanks,
Rohan

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-doc-file-in-solr-tp3806543p3818938.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to ignore indexing of duplicated documents?

2012-03-12 Thread Marc Sturlese
http://wiki.apache.org/solr/Deduplication
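The wiki page describes signature-based deduplication. The core idea can be sketched roughly as follows (a toy illustration of hashing selected fields into a signature, not Solr's SignatureUpdateProcessorFactory itself; field names are made up):

```python
import hashlib

def signature(doc, fields=("title", "content")):
    # Concatenate the chosen fields and hash them, like a simplified
    # signature field used to detect duplicate documents.
    raw = "|".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

docs = [
    {"id": 1, "title": "A", "content": "same text"},
    {"id": 2, "title": "A", "content": "same text"},  # duplicate body, skipped
    {"id": 3, "title": "B", "content": "other text"},
]
seen = set()
indexed = []
for d in docs:
    s = signature(d)
    if s not in seen:       # only index documents with an unseen signature
        seen.add(s)
        indexed.append(d)
print([d["id"] for d in indexed])  # [1, 3]
```

In Solr the signature is stored as a field and can either overwrite or skip duplicates at update time, as configured on the update chain.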

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-ignore-indexing-of-duplicated-documents-tp3814858p3818973.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCore error

2012-03-12 Thread Nikhila Pala
Hi,

I'm getting some exceptions while shutting down the hybris server; the
exception details are specified in the file attached to this mail. Please
try to resolve it as soon as possible.

Thanks & Regards,
Nikhila Pala
Systems engineer
Infosys Technologies Limited





Re: Solr 4.0

2012-03-12 Thread Jan Høydahl
Hi Robert,

See http://wiki.apache.org/solr/Solr4.0
The developer community is working towards a 4.0-Alpha release expected in a 
few months; however, no dates are fixed.
Many already use a snapshot version of TRUNK. You are free to do so, at your 
own risk.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. mars 2012, at 03:15, Robert Yu wrote:

 What's the status of Solr 4.0? Has anyone started to use it? I heard it
 supports real-time index updates; I'm interested in this feature.
 
 Thanks,
 
 
 
 
 Robert Yu
 
 Platform Service - Backend
 
 Morningstar Shenzhen Ltd.
 
 Morningstar. Illuminating investing worldwide.
 
 +86 755 3311-0223 voice
 +86 137-2377-0925 mobile
 +86 755 - fax
 robert...@morningstar.com
 
 8FL, Tower A, Donghai International Center ( or East Pacific
 International Center)
 
 7888 Shennan Road, Futian district,
 
 Shenzhen, Guangdong province, China 518040
 
 http://cn.morningstar.com http://cn.morningstar.com  
 
 



Re: SOLR Query Intersection

2012-03-12 Thread balaji
Hi Mikhail,

   Yes, I am trying to get the facet counts for all of these and populate the
chart, but the comparison between the values is what I am wondering about.

Will facets handle all the 3 possible scenarios?

Thanks
Balaji

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Query-Intersection-tp3818756p3819111.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Does the lucene support the substring search?

2012-03-12 Thread Ahmet Arslan
 Return to the post, I would like to know whether Lucene supports
 substring search or not. As you can see, one field of my document is a
 long string field without any spaces. It means the tokenizer doesn't
 work here. Suppose I want to search for a string TARCSV in my documents;
 I want to return the sample record from my document set. I tried the
 wildcard search and fuzzy search, but neither seems to work. I am not
 very sure whether I did all things right in the index and parse stage.
 Does anyone have experience with substring search?

Yes, it is possible. Two different approaches are described in a recent thread: 
http://search-lucene.com/m/Wicj8UB0gl2

One of them uses both trailing and leading wildcards, e.g. q=*TARCSV*

The other approach makes use of NGramFilterFactory at index time only.

It seems that you will be dealing with extremely long tokens. It is a good idea 
to increase maxTokenLength (default value is 255); per SOLR-2188, tokens
longer than this are silently ignored. 
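The index-time n-gram approach can be pictured like this (an illustrative sketch, not the actual NGramFilterFactory; gram sizes here are arbitrary):

```python
def ngrams(token, min_gram=3, max_gram=5):
    """All inner n-grams of a token, as an NGram filter would emit them."""
    out = []
    for n in range(min_gram, max_gram + 1):
        out.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return out

# A long unbroken string indexed with n-grams...
indexed = set(ngrams("XXTARCSVYY"))
# ...lets a substring query match without leading/trailing wildcards,
# as long as the query side is cut into grams of the same sizes.
assert set(ngrams("TARCSV", 3, 5)) <= indexed
```

The trade-off is index size: every term of length L contributes on the order of L grams per gram size, which is why the wildcard approach is sometimes preferred despite being slower at query time.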



Re: SOLR Query Intersection

2012-03-12 Thread balaji
Hi,

   I got your point; are you suggesting that I run the *facet.query*
param for the various combinations?

Thanks
Balaji

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Query-Intersection-tp3818756p3819165.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR Query Intersection

2012-03-12 Thread Erik Hatcher
I've done exactly this, rendering Venn diagrams using Google Charts from Solr.  
See my presentation here:

http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-5675936

See slides 26-29, even with full code in the slides, but the code is also 
available here: 


https://github.com/erikhatcher/solr-rapid-prototyping/tree/master/ApacheCon2010

And, yup, facet.query was leveraged for this.
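What facet.query computes for each Venn region can be pictured with plain set arithmetic; a toy sketch with made-up documents (not Solr API code):

```python
# Each document lists the browsers it was seen with (the multi-valued field).
docs = {
    1: {"Google"}, 2: {"Google", "Firefox"}, 3: {"Firefox", "IE"},
    4: {"Google", "Firefox", "IE"}, 5: {"IE"},
}

def count(*required):
    # Analogue of a facet.query such as browsers:Google AND browsers:IE --
    # the number of documents containing all of the required values.
    return sum(1 for browsers in docs.values() if set(required) <= browsers)

print(count("Google"))                   # single-set count
print(count("Google", "IE"))             # pairwise intersection
print(count("Google", "Firefox", "IE"))  # triple intersection
```

With one facet.query per region (three singles, three pairs, one triple), all seven Venn counts come back in a single Solr request.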

Erik


On Mar 12, 2012, at 05:16 , balaji wrote:

 Hi , 
 
   I am trying to Compare three independent queries,intersection among them
 and draw an Venn diagram using the Google CHART .  By using OR I will be
 able to get the union of the 3 fields and using AND I will be able to get
 the intersection among the three , Is it possible to get the union and
 intersection among the fields in a  same query 
 
 For ex :
 
 I have 3 values which is under Multi-valued field browsers: Google,
 Firefox and IE. I just need to find the no. of documents having only
 google, Firefox etc. and no. of documents having all the three and an
 intersection among them like Google & IE, Google & Firefox
 
   Is it possible to do with the Query Intersections or do I need to write
 separate queries for all the above , If not please suggest how it can be
 achieved
 
 
 Thanks
 Balaji 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SOLR-Query-Intersection-tp3818756p3818756.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: MISSING LICENSE

2012-03-12 Thread Shawn Heisey

On 3/12/2012 1:24 AM, Per Steffensen wrote:

$ ant -version
Apache Ant(TM) version 1.8.2 compiled on October 14 2011

What might be wrong?


If you check lucene/BUILD.txt in your source, it says to use ant 1.7.1 
or later, but not 1.8.x.  This is from a recent trunk checkout:


Basic steps:
  0) Install JDK 1.6 (or greater), Ant 1.7.1+ (not 1.6.x, not 1.8.x)
  1) Download Lucene from Apache and unpack it
  2) Connect to the top-level of your Lucene installation
  3) Install JavaCC (optional)
  4) Run ant

A previous message on the mailing list about the missing license 
messages (from 2012-02-23) says that some work has been done to get it 
working with ant 1.8, but it's not done yet.  Can you downgrade or 
install the older release in an alternate location?


It looks like ant 1.8 has been out for two years, so newer operating 
systems are going to be shipping with it and it may become difficult to 
get the older ant release.  I know from my own systems that CentOS/RHEL 
6 is still using ant 1.7.1.


Thanks,
Shawn



Re: Performance (responsetime) on request

2012-03-12 Thread Dmitry Kan
If you look at solr admin page / statistics of cache, you could check the
evictions of different types of cache. If some of them are larger than
zero, try minimizing them by increasing the corresponding cache params in
the solrconfig.xml.
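Why nonzero evictions suggest an undersized cache can be seen with a minimal LRU sketch (illustrative only; Solr's LRUCache keeps analogous counters in its stats):

```python
from collections import OrderedDict

class TinyLRU:
    def __init__(self, max_size):
        self.max_size, self.data = max_size, OrderedDict()
        self.hits = self.lookups = self.evictions = 0

    def get(self, key, compute):
        self.lookups += 1
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as recently used
            return self.data[key]
        value = compute()
        self.data[key] = value
        if len(self.data) > self.max_size:
            self.data.popitem(last=False)  # evict least recently used entry
            self.evictions += 1
        return value

cache = TinyLRU(max_size=2)
for q in ["a", "b", "c", "a"]:  # "a" was evicted by the time it repeats
    cache.get(q, lambda: "result")
print(cache.evictions)  # 2 -- a larger max_size would have kept "a" cached
```

An eviction means a cached entry was thrown away while still potentially useful; if the evicted entry is requested again it becomes a miss, so raising maxSize (within RAM limits) converts evictions into hits.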

On Mon, Mar 12, 2012 at 10:12 AM, Ramo Karahasan 
ramo.karaha...@googlemail.com wrote:

 Hi,



 I've got two virtual machines in the same subnet at the same hosting
 provider. On one machine my web application is running, on the second
 a Solr instance. In Solr I use the following:



 <fieldType name="text_auto" class="solr.TextField">
   <analyzer type="index">
     <!--<tokenizer class="solr.KeywordTokenizerFactory"/>-->
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!--<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>-->
   </analyzer>
   <analyzer type="query">
     <!--<tokenizer class="solr.KeywordTokenizerFactory"/>-->
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
   </analyzer>
 </fieldType>

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>







 If I search from my web application in my autosuggest box, I get response
 times of ~500ms per request. Is it possible to tune Solr so that I get
 faster results?

 I have no special cache configuration, nor do I know what to configure
 here.



 Thanks,

 Ramo




-- 
Regards,

Dmitry Kan


Re: MISSING LICENSE

2012-03-12 Thread Per Steffensen

Shawn Heisey skrev:

On 3/12/2012 1:24 AM, Per Steffensen wrote:

$ ant -version
Apache Ant(TM) version 1.8.2 compiled on October 14 2011

What might be wrong?


If you check lucene/BUILD.txt in your source, it says to use ant 1.7.1 
or later, but not 1.8.x.  This is from a recent trunk checkout:


Basic steps:
  0) Install JDK 1.6 (or greater), Ant 1.7.1+ (not 1.6.x, not 1.8.x)
  1) Download Lucene from Apache and unpack it
  2) Connect to the top-level of your Lucene installation
  3) Install JavaCC (optional)
  4) Run ant
Ok, thanks. Didn't catch that. I have another checkout of Solr trunk from 
about 2 weeks ago, where I didn't see the problem?!? With that one I am 
able to run ant test etc. without license problems.


A previous message on the mailing list about the missing license 
messages (from 2012-02-23) says that some work has been done to get it 
working with ant 1.8, but it's not done yet.  Can you downgrade or 
install the older release in an alternate location?

I'm sure I will manage, now that I know what the problem is. Thanks again.


It looks like ant 1.8 has been out for two years, so newer operating 
systems are going to be shipping with it and it may become difficult 
to get the older ant release.  I know from my own systems that 
CentOS/RHEL 6 is still using ant 1.7.1.


Thanks,
Shawn






Re: MISSING LICENSE

2012-03-12 Thread Yonik Seeley
Over-aggressive license checking code doesn't like jars in extraneous
directories (like the work directory that the war is exploded into
under exampleB).
Delete exampleB and the build should work.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10

On Mon, Mar 12, 2012 at 3:24 AM, Per Steffensen st...@designware.dk wrote:
 Hi

 Just tried to ant clean test on latest code from trunk. I get a lot of
 MISSING LICENSE messages - e.g.
 [licenses] MISSING LICENSE for the following file:
 [licenses]
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-3.3.3.jar
 [licenses]   Expected locations below:
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-ASL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-BSD.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-BSD_LIKE.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-CDDL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-CPL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-EPL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-MIT.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-MPL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-PD.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-SUN.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-COMPOUND.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-FAKE.txt

 $ ant -version
 Apache Ant(TM) version 1.8.2 compiled on October 14 2011

 What might be wrong?

 Regards, Per Steffensen


Re: List of recommendation engines with solr

2012-03-12 Thread Gora Mohanty
On 12 March 2012 16:30, Rohan rohan_kumb...@infosys.com wrote:
 Hi All,

 I would require list of recs engine which can be integrated with solr and
 also suggest best one out of this.

 any comments would be appriciated!!

What exactly do you mean by that? Why is integration with Solr
a requirement, and what do you expect to gain by such an integration?
"Best" also probably depends on the context of your requirements.

There are a variety of open-source recommendation engines.
If you are looking at something from Apache, and in Java, Mahout
might be a good choice.

Regards,
Gora


Re: Faster Solr Indexing

2012-03-12 Thread Erick Erickson
How have you determined that it's the solr add? By timing the call on the
SolrJ side or by looking at the machine where Solr is running? This is the
very first thing you have to answer. You can get a rough idea with any
simple profiler (say Activity Monitor on a Mac, Task Manager on a Windows
box). The point is just to see whether the indexer machine is being
well utilized. I'd guess it's not, actually.

One quick experiment would be to try using StreamingUpdateSolrServer
(SUSS), which has the capability of having multiple threads
fire at Solr at once. It is possible that your performance is spent
waiting for I/O.

Once you have that question answered, you can refine. But until you
know which side of the wire the problem is on, you're flying blind.

To both Yandong and Peyman:
These times are quite surprising. Running everything locally on my laptop,
I'm indexing between 5-7K documents/second. The source is
the Wikipedia dump.

I'm particularly surprised by the difference Yandong is seeing based
on the various analysis chains. The first thing I'd back off is the
MaxPermSize: 512M is huge for this parameter.
If you're getting that kind of time differential and your CPU isn't
pegged, you're probably swapping in which case you need
to give the processes more memory. I'd just take the MaxPermSize
out completely as a start.

Not sure if you've seen this page, something there might help.
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

But throw a profiler at the indexer as a first step, just to see
where the problem is, CPU or I/O.

Best
Erick
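The batching-plus-threads idea behind StreamingUpdateSolrServer can be sketched generically (a toy producer/worker model to show the shape of concurrent batched adds, not SolrJ itself; index_batch stands in for the HTTP add call):

```python
import queue
import threading

def index_batch(batch, indexed, lock):
    # Stand-in for server.add(batch); in SolrJ this is where the HTTP call goes.
    with lock:
        indexed.extend(batch)

def run(docs, batch_size=3, workers=2):
    q, indexed, lock = queue.Queue(), [], threading.Lock()

    def worker():
        while True:
            batch = q.get()
            if batch is None:           # sentinel: no more work
                break
            index_batch(batch, indexed, lock)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for i in range(0, len(docs), batch_size):
        q.put(docs[i:i + batch_size])   # feed whole batches, not single docs
    for _ in threads:
        q.put(None)                     # one sentinel per worker
    for t in threads:
        t.join()
    return indexed

print(len(run([{"id": i} for i in range(10)])))  # 10
```

The point is that while one batch is in flight over the wire, another thread is already sending the next one, so the indexer is never idle waiting on I/O.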

On Sat, Mar 10, 2012 at 4:09 PM, Peyman Faratin pey...@robustlinks.com wrote:
 Hi

 I am trying to index 12MM docs faster than is currently happening in Solr 
 (using solrj). We have identified solr's add method as the bottleneck (and 
 not commit - which is tuned ok through mergeFactor and maxRamBufferSize and 
 jvm ram).

 Adding 1000 docs is taking approximately 25 seconds. We are making sure we 
 add and commit in batches. And we've tried both CommonsHttpSolrServer and 
 EmbeddedSolrServer (assuming removing http overhead would speed things up 
 with embedding) but the difference is marginal.

 The docs being indexed are on average 20 fields long, mostly indexed but none 
 stored. The major size contributors are two fields:

        - content, and
        - shingledContent (populated using copyField of content).

 The length of the content field is (likely) gaussian distributed (few large 
 docs 50-80K tokens, but majority around 2k tokens). We use shingledContent to 
 support phrase queries and content for unigram queries (following the advice 
 of Solr Enterprise search server advice - p. 305, section The Solution: 
 Shingling).

 Clearly the size of the docs is a contributor to the slow adds (confirmed by 
 removing these 2 fields resulting in halving the indexing time). We've tried 
 compressed=true also but that is not working.

 Any guidance on how to support our application logic (without having to 
 change the schema too much) and speed the indexing speed (from current 212 
 days for 12MM docs) would be much appreciated.

 thank you

 Peyman



AW: Performance (responsetime) on request

2012-03-12 Thread Ramo Karahasan
Hi,

these are the results from the Solr admin page for the caches:


name:   queryResultCache  
class:  org.apache.solr.search.LRUCache  
version:1.0  
description:LRU Cache(maxSize=512, initialSize=512)  
stats:  lookups : 376
hits : 246
hitratio : 0.65
inserts : 130
evictions : 0
size : 130
warmupTime : 0
cumulative_lookups : 2994
cumulative_hits : 1934
cumulative_hitratio : 0.64
cumulative_inserts : 1060
cumulative_evictions : 409

name:   fieldCache  
class:  org.apache.solr.search.SolrFieldCacheMBean  
version:1.0  
description:Provides introspection of the Lucene FieldCache, this is 
**NOT** a cache that is managed by Solr.  
stats:  entries_count : 0
insanity_count : 0

name:   documentCache  
class:  org.apache.solr.search.LRUCache  
version:1.0  
description:LRU Cache(maxSize=512, initialSize=512)  
stats:  lookups : 13416
hits : 11787
hitratio : 0.87
inserts : 1629
evictions : 1089
size : 512
warmupTime : 0
cumulative_lookups : 100012
cumulative_hits : 86959
cumulative_hitratio : 0.86
cumulative_inserts : 13053
cumulative_evictions : 11914

name:   fieldValueCache  
class:  org.apache.solr.search.FastLRUCache  
version:1.0  
description:Concurrent LRU Cache(maxSize=1, initialSize=10, 
minSize=9000, acceptableSize=9500, cleanupThread=false)  
stats:  lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0

name:   filterCache  
class:  org.apache.solr.search.FastLRUCache  
version:1.0  
description:Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460, 
acceptableSize=486, cleanupThread=false)  
stats:  lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0


Is there something to be optimized?

Thanks,
Ramo

-Ursprüngliche Nachricht-
Von: Dmitry Kan [mailto:dmitry@gmail.com] 
Gesendet: Montag, 12. März 2012 15:06
An: solr-user@lucene.apache.org
Betreff: Re: Performance (responsetime) on request

If you look at solr admin page / statistics of cache, you could check the 
evictions of different types of cache. If some of them are larger than zero, 
try minimizing them by increasing the corresponding cache params in the 
solrconfig.xml.

On Mon, Mar 12, 2012 at 10:12 AM, Ramo Karahasan  
ramo.karaha...@googlemail.com wrote:

 Hi,



 I've got two virtual machines in the same subnet at the same 
 hosting provider. On one machine my web application is running, on the 
 second a Solr instance. In Solr I use the following:



 <fieldType name="text_auto" class="solr.TextField">
   <analyzer type="index">
     <!--<tokenizer class="solr.KeywordTokenizerFactory"/>-->
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!--<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>-->
   </analyzer>
   <analyzer type="query">
     <!--<tokenizer class="solr.KeywordTokenizerFactory"/>-->
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
   </analyzer>
 </fieldType>

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>







 If I search from my web application in my autosuggest box, I get 
 response times of ~500ms per request. Is it possible to tune Solr, 
 so that I get faster results?

 I have no special cache configuration, nor do I know what to 
 configure here.



 Thanks,

 Ramo




--
Regards,

Dmitry Kan



Re: Performance (responsetime) on request

2012-03-12 Thread Dmitry Kan
You can optimize the documentCache by setting maxSize to some decent
value, like 2000. Also configure some meaningful warming queries in the
solrconfig.
When increasing the cache size, monitor the RAM usage, as that can start
increasing as well.
Do you / would you need to use filter queries? Those can speed up search as
well through the usage of the filterCache.
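A quick way to read stats like the ones quoted below is to compare the cumulative hit ratio and evictions per cache, and only grow a cache that is both full and evicting. A small helper, using the documentCache numbers from the mail (the threshold logic is a rule of thumb, not anything Solr prescribes):

```python
def cache_report(name, lookups, hits, evictions, max_size, size):
    """Summarize one cache's stats and suggest whether to grow it."""
    ratio = hits / lookups if lookups else 0.0
    # Evicting while full means useful entries are being pushed out.
    advice = "consider raising maxSize" if evictions and size >= max_size else "looks ok"
    return f"{name}: hitratio={ratio:.2f}, evictions={evictions} -> {advice}"

# cumulative numbers reported for the documentCache (maxSize=512, size=512)
print(cache_report("documentCache", 100012, 86959, 11914, 512, 512))
```

Here the documentCache is full and has evicted ~12K entries despite a decent hit ratio, which matches the advice above to raise its maxSize.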

Dmitry

On Mon, Mar 12, 2012 at 5:12 PM, Ramo Karahasan 
ramo.karaha...@googlemail.com wrote:

 Hi,

 these are the results from the Solr admin page for the caches:


 name:   queryResultCache
 class:  org.apache.solr.search.LRUCache
 version:1.0
 description:LRU Cache(maxSize=512, initialSize=512)
 stats:  lookups : 376
 hits : 246
 hitratio : 0.65
 inserts : 130
 evictions : 0
 size : 130
 warmupTime : 0
 cumulative_lookups : 2994
 cumulative_hits : 1934
 cumulative_hitratio : 0.64
 cumulative_inserts : 1060
 cumulative_evictions : 409

 name:   fieldCache
 class:  org.apache.solr.search.SolrFieldCacheMBean
 version:1.0
 description:Provides introspection of the Lucene FieldCache, this is
 **NOT** a cache that is managed by Solr.
 stats:  entries_count : 0
 insanity_count : 0

 name:   documentCache
 class:  org.apache.solr.search.LRUCache
 version:1.0
 description:LRU Cache(maxSize=512, initialSize=512)
 stats:  lookups : 13416
 hits : 11787
 hitratio : 0.87
 inserts : 1629
 evictions : 1089
 size : 512
 warmupTime : 0
 cumulative_lookups : 100012
 cumulative_hits : 86959
 cumulative_hitratio : 0.86
 cumulative_inserts : 13053
 cumulative_evictions : 11914

 name:   fieldValueCache
 class:  org.apache.solr.search.FastLRUCache
 version:1.0
 description:Concurrent LRU Cache(maxSize=1, initialSize=10,
 minSize=9000, acceptableSize=9500, cleanupThread=false)
 stats:  lookups : 0
 hits : 0
 hitratio : 0.00
 inserts : 0
 evictions : 0
 size : 0
 warmupTime : 0
 cumulative_lookups : 0
 cumulative_hits : 0
 cumulative_hitratio : 0.00
 cumulative_inserts : 0
 cumulative_evictions : 0

 name:   filterCache
 class:  org.apache.solr.search.FastLRUCache
 version:1.0
 description:Concurrent LRU Cache(maxSize=512, initialSize=512,
 minSize=460, acceptableSize=486, cleanupThread=false)
 stats:  lookups : 0
 hits : 0
 hitratio : 0.00
 inserts : 0
 evictions : 0
 size : 0
 warmupTime : 0
 cumulative_lookups : 0
 cumulative_hits : 0
 cumulative_hitratio : 0.00
 cumulative_inserts : 0
 cumulative_evictions : 0


 Is there something to be optimized?

 Thanks,
 Ramo

 -Ursprüngliche Nachricht-
 Von: Dmitry Kan [mailto:dmitry@gmail.com]
 Gesendet: Montag, 12. März 2012 15:06
 An: solr-user@lucene.apache.org
 Betreff: Re: Performance (responsetime) on request

 If you look at solr admin page / statistics of cache, you could check the
 evictions of different types of cache. If some of them are larger than
 zero, try minimizing them by increasing the corresponding cache params in
 the solrconfig.xml.

 On Mon, Mar 12, 2012 at 10:12 AM, Ramo Karahasan 
 ramo.karaha...@googlemail.com wrote:

  Hi,
 
 
 
  I've got two virtual machines in the same subnet at the same
  hosting provider. On one machine my web application is running, on the
  second a Solr instance. In Solr I use the following:
 
 
 
  <fieldType name="text_auto" class="solr.TextField">
   <analyzer type="index">
    <!-- <tokenizer class="solr.KeywordTokenizerFactory"/> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
            maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="25"/> -->
   </analyzer>
   <analyzer type="query">
    <!-- <tokenizer class="solr.KeywordTokenizerFactory"/> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
            maxGramSize="25"/>
   </analyzer>
  </fieldType>
 
 
 
 
 
  <fieldType name="text" class="solr.TextField"
             positionIncrementGap="100">
   <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
  </fieldType>
 
 
 
 
 
 
 
  If I search from my webapplication in my autosuggest box, I get
  response times of ~500ms per request. Is it possible to tune solr,
  so that I get faster results?
 
  I have no special cache configuration, nor do I know what to
  configure here.
 
 
 
  Thanks,
 
  Ramo
 
 


 --
 Regards,

 Dmitry Kan




Re: Strange behavior with search on empty string and NOT

2012-03-12 Thread Erick Erickson
Because Lucene query syntax is not a strict Boolean logic system.
There's a good explanation here:
http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

Adding &debugQuery=on to your search is your friend <G>. You'll see
that your return (at least on 3.5, going at /solr/select) returns
this as the parsed query:

<str name="parsedquery">-name:foobar</str>

Solr really doesn't have semantics for empty strings (or NULL, for
that matter), so the empty clause just gets dropped.

Best
Erick
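A toy, set-based sketch (plain Python, not Lucene itself) of why this happens: the empty clause `name:` matches nothing and is dropped by the parser, leaving a pure-negative query that ends up matching everything except the prohibited term.

```python
# Toy sketch -- plain Python sets, not Lucene -- of why the parser
# drops the empty clause. "name:" matches nothing, so
# "name: AND NOT name:foobar" degenerates into the pure-negative
# query "-name:foobar", which matches everything except foobar.
docs = {1: "foobar", 2: "baz", 3: "qux"}

def match(term):
    """Docs whose name field equals the term; an empty term matches nothing."""
    return {d for d, name in docs.items() if term and name == term}

empty_clause = match("")                      # q=name:  ->  no results
pure_negative = set(docs) - match("foobar")   # q=name: AND NOT name:foobar

print(sorted(empty_clause))   # prints []
print(sorted(pure_negative))  # prints [2, 3]
```

This mirrors what the debug output below shows: the parsed query is only the negated phrase, evaluated against the whole index.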

On Sun, Mar 11, 2012 at 11:36 PM, Lan dung@gmail.com wrote:
 I am curious why solr results are inconsistent for the query below for an
 empty string search on a TextField.

 q=name: returns 0 results
 q=name: AND NOT name:FOOBAR returns all results in the solr index. Shouldn't
 it return 0 results too?

 Here is the debugQuery.

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
 <lst name="params">
 <str name="debugQuery">on</str>
 <str name="indent">on</str>
 <str name="start">0</str>
 <str name="q">name: AND NOT name:BLAH232282</str>
 <str name="rows">0</str>
 <str name="version">2.2</str>
 </lst>
 </lst>
 <result name="response" numFound="3790790" start="0"/>
 <lst name="debug">
 <str name="rawquerystring">name: AND NOT name:BLAH232282</str>
 <str name="querystring">name: AND NOT name:BLAH232282</str>
 <str name="parsedquery">-PhraseQuery(name:"blah 232282")</str>
 <str name="parsedquery_toString">-name:"blah 232282"</str>
 <lst name="explain"/>
 <str name="QParser">LuceneQParser</str>
 <lst name="timing">
 <double name="time">1.0</double>
 <lst name="prepare">
 <double name="time">1.0</double>
 <lst name="org.apache.solr.handler.component.QueryComponent">
 <double name="time">1.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.FacetComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.HighlightComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.StatsComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.DebugComponent">
 <double name="time">0.0</double>
 </lst>
 </lst>
 <lst name="process">
 <double name="time">0.0</double>
 <lst name="org.apache.solr.handler.component.QueryComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.FacetComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.HighlightComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.StatsComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.DebugComponent">
 <double name="time">0.0</double>
 </lst>
 </lst>
 </lst>
 </lst>
 </response>


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Strange-behavior-with-search-on-empty-string-and-NOT-tp3818023p3818023.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: 3 Way Solr Join . . ?

2012-03-12 Thread Erick Erickson
I know it goes against the grain here for a DB
person, but... denormalize. Really. Solr does
many things well, but whenever you start
trying to make it do database-like stuff you need
to back up and re-think things.

Simplest thing: Try indexing one record
for each customer/purchase/complaint
triplet. How many records are we talking here
anyway? 30-40M documents will probably
perform admirably on even a small piece
of hardware.

Best
Erick
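A minimal sketch of that denormalization in plain Python (the field names are made up for illustration, not an actual Solr schema): emit one flat document per customer/purchase/complaint combination, which can then be indexed as ordinary Solr documents.

```python
# Hedged sketch of denormalizing the three tables into flat documents.
# Field names like "cust_state" are hypothetical, not a real schema.
from itertools import product

customers  = [{"id": 1, "state": "VT"}]
purchases  = [{"cust": 1, "product": "widget", "qty": 80}]
complaints = [{"cust": 1, "type": "XYZ", "text": "ABC EFG"}]

def denormalize(customers, purchases, complaints):
    docs = []
    for c in customers:
        # fall back to an empty record so customers without children
        # still produce a document
        my_p = [p for p in purchases  if p["cust"] == c["id"]] or [{}]
        my_c = [x for x in complaints if x["cust"] == c["id"]] or [{}]
        # cross product: one document per purchase/complaint pair
        for p, x in product(my_p, my_c):
            docs.append({
                "cust_state":     c["state"],
                "purchase_qty":   p.get("qty"),
                "complaint_type": x.get("type"),
                "complaint_text": x.get("text"),
            })
    return docs

flat = denormalize(customers, purchases, complaints)
print(len(flat))  # prints 1
```

With one purchase and one complaint per customer this yields one document; a customer with P purchases and C complaints yields P×C documents, which is where Erick's 30-40M estimate comes from.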


On Mon, Mar 12, 2012 at 12:55 AM, Angelyna Bola angelyna.b...@gmail.com wrote:
 Bill,

 So sorry - my example is rapidly showing its short comings. The data I
 am actually working with is complex and obscure so I was trying to
 think of an example that was easy to relate to, but still has all the
 relevant characteristics.

 Let me try a better example:

 Let's suppose a Company is selling products and keeps track of
 complaints (which do not relate to any specific purchase):

 Data:

        Table #1: CUSTOMERS    (parent table)
                City
                State
                Zip

        Table #2: PURCHASES    (child table with foreign key to CUSTOMERS)
                Date
                Product Type
                Quantity

        Table #3: COMPLAINTS   (child table with foreign key to CUSTOMERS)
                Date
                Complaint Type
                Complaint Text
                Remediation

 And the company wants to be able to query how their customers buy
 products and complaints.

 The tricky part is company needs to be able to blend string queries
 with date range queires and integer range queries.

 Query:

        CUSTOMERS in Vermont
        and
        PURCHASES within the last 1 year with a Quantity  75
        and
        COMPLAINTS within the last 2 years with a Complaint Type = XYZ and
 Complaint Text contains the words ABC and EFG

 Problem:

 The problem with multi-valued fields is I loose the ability to do
 range queries over numeric attributes (such as Quantity or Date) when
 they only relate to other specific attributes (such as Product or
 Service Type).

 With the Join feature in Solr Trunk, I have no problem joining
 CUSTOMERS to PURCHASES or alternatively joining CUSTOMERS to
 COMPLAINTS. But I do not see a way of joining across all three.

 Hopefully I have done a better job with this example (appreciate your
 patience in trying to help me - I am not always the best at
 explaining).

 Angelyna


Re: SOLR Query Intersection

2012-03-12 Thread balaji
Hi,

   Thank you guys, Erik and Mikhail, you saved my day


Thanks
Balaji

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Query-Intersection-tp3818756p3819571.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index doc file in solr?

2012-03-12 Thread Erick Erickson
Consider using SolrJ, possibly combined with
Tika (which is what underlies Solr Cell).
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Although ExtractingRequestHandler
has the capability of indexing metadata as
well if you map the fields.

See: http://wiki.apache.org/solr/ExtractingRequestHandler

Best
Erick


On Mon, Mar 12, 2012 at 11:09 AM, Rohan rohan_kumb...@infosys.com wrote:
 Hi Erick,

 Thanks for the valuable comments on this.

 See, I have a few sets of Word doc files and I would like to index the
 metadata along with the content of the pages, so is there any way to
 complete this task?

 Need your comments on this.

 Thanks,
 Rohan

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-index-doc-file-in-solr-tp3806543p3818938.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: I wanna subscribe this maillist

2012-03-12 Thread Erick Erickson
Please follow the instructions here:

http://lucene.apache.org/solr/discussion.html

Best
Erick

On Mon, Mar 12, 2012 at 2:35 AM, 刘翀 lc87...@gmail.com wrote:
 I wanna subscribe this maillist


AW: Performance (responsetime) on request

2012-03-12 Thread Ramo Karahasan
Hi,

thanks for your advice. Do you have any documentation on that? I'm not sure how
and where to configure this stuff and what impact it has.

Thanks,
Ramo

-----Original Message-----
From: Dmitry Kan [mailto:dmitry@gmail.com]
Sent: Monday, March 12, 2012 16:21
To: solr-user@lucene.apache.org
Subject: Re: Performance (responsetime) on request

you can optimize the documentCache by setting maxSize to some decent value,
like 2000. Also configure some meaningful warming queries in the solrconfig.
When increasing the cache size, monitor the RAM usage, as that can start
increasing as well.
Do you / would you need to use filter queries? Those can speed up search as 
well through the usage of filterCache.
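As a rough sketch only (the cache sizes and the warming query below are placeholders, not recommendations for any particular index), the pieces mentioned above live in the <query> section of solrconfig.xml:

```xml
<!-- Illustrative sizes only; tune against your own hit ratios and heap.
     Note: the documentCache cannot be autowarmed, since internal Lucene
     document ids change between searchers. -->
<documentCache class="solr.LRUCache"
               size="2000" initialSize="512" autowarmCount="0"/>
<filterCache class="solr.FastLRUCache"
             size="1024" initialSize="512" autowarmCount="128"/>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- placeholder warming query; replace with real common queries -->
    <lst><str name="q">some common query</str></lst>
  </arr>
</listener>
```

The listener fires the listed queries against each new searcher so the caches are warm before user traffic hits it.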

Dmitry

On Mon, Mar 12, 2012 at 5:12 PM, Ramo Karahasan  
ramo.karaha...@googlemail.com wrote:

 Hi,

 these are the results from the solr admin page for the caches:


 name:   queryResultCache
 class:  org.apache.solr.search.LRUCache
 version:1.0
 description:LRU Cache(maxSize=512, initialSize=512)
 stats:  lookups : 376
 hits : 246
 hitratio : 0.65
 inserts : 130
 evictions : 0
 size : 130
 warmupTime : 0
 cumulative_lookups : 2994
 cumulative_hits : 1934
 cumulative_hitratio : 0.64
 cumulative_inserts : 1060
 cumulative_evictions : 409

 name:   fieldCache
 class:  org.apache.solr.search.SolrFieldCacheMBean
 version:1.0
 description:Provides introspection of the Lucene FieldCache, this is
 **NOT** a cache that is managed by Solr.
 stats:  entries_count : 0
 insanity_count : 0

 name:   documentCache
 class:  org.apache.solr.search.LRUCache
 version:1.0
 description:LRU Cache(maxSize=512, initialSize=512)
 stats:  lookups : 13416
 hits : 11787
 hitratio : 0.87
 inserts : 1629
 evictions : 1089
 size : 512
 warmupTime : 0
 cumulative_lookups : 100012
 cumulative_hits : 86959
 cumulative_hitratio : 0.86
 cumulative_inserts : 13053
 cumulative_evictions : 11914

 name:   fieldValueCache
 class:  org.apache.solr.search.FastLRUCache
 version:1.0
 description:Concurrent LRU Cache(maxSize=1, initialSize=10,
 minSize=9000, acceptableSize=9500, cleanupThread=false)
 stats:  lookups : 0
 hits : 0
 hitratio : 0.00
 inserts : 0
 evictions : 0
 size : 0
 warmupTime : 0
 cumulative_lookups : 0
 cumulative_hits : 0
 cumulative_hitratio : 0.00
 cumulative_inserts : 0
 cumulative_evictions : 0

 name:   filterCache
 class:  org.apache.solr.search.FastLRUCache
 version:1.0
 description:Concurrent LRU Cache(maxSize=512, initialSize=512,
 minSize=460, acceptableSize=486, cleanupThread=false)
 stats:  lookups : 0
 hits : 0
 hitratio : 0.00
 inserts : 0
 evictions : 0
 size : 0
 warmupTime : 0
 cumulative_lookups : 0
 cumulative_hits : 0
 cumulative_hitratio : 0.00
 cumulative_inserts : 0
 cumulative_evictions : 0


  Is there something to be optimized?

 Thanks,
 Ramo

 -----Original Message-----
 From: Dmitry Kan [mailto:dmitry@gmail.com]
 Sent: Monday, March 12, 2012 15:06
 To: solr-user@lucene.apache.org
 Subject: Re: Performance (responsetime) on request

 If you look at solr admin page / statistics of cache, you could check 
 the evictions of different types of cache. If some of them are larger 
 than zero, try minimizing them by increasing the corresponding cache 
 params in the solrconfig.xml.

 On Mon, Mar 12, 2012 at 10:12 AM, Ramo Karahasan  
 ramo.karaha...@googlemail.com wrote:

  Hi,
 
 
 
  i've got two virtual machines in the same subnet at the same 
  hostingprovider. On one machine my webapplication is running, on the 
  second a solr instance. In solr I use the following
 
 
 
  <fieldType name="text_auto" class="solr.TextField">
   <analyzer type="index">
    <!-- <tokenizer class="solr.KeywordTokenizerFactory"/> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
            maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="25"/> -->
   </analyzer>
   <analyzer type="query">
    <!-- <tokenizer class="solr.KeywordTokenizerFactory"/> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
            maxGramSize="25"/>
   </analyzer>
  </fieldType>

  <fieldType name="text" class="solr.TextField"
             positionIncrementGap="100">
   <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
  </fieldType>
 
 
 
 
 
 
 
  If I search from my webapplication in my autosuggest box, I get 
  response times of ~500ms per request. Is it possible to tune solr, 
  so that I get faster results?
 
   I have no special cache configuration, nor do I know what to
   configure here.
 
 
 
  Thanks,
 
  

Re: Zookeeper view not displaying on latest trunk

2012-03-12 Thread Stefan Matheis
Jamie, would you mind giving the latest another try, to see if the Cloud tab is 
working as it should? 

On Thursday, February 9, 2012 at 6:57 PM, Mark Miller wrote:

 
 On Feb 9, 2012, at 12:09 PM, Jamie Johnson wrote:
 
  To get this to work I had to modify my solr.xml to add a
  defaultCoreName, then everything worked fine on the old interface
  (/solr/admin). The new interface was still unhappy and looking at the
  response that comes back I see the following
  
   {"status": 404, "error": "Zookeeper is not configured for this Solr
   Core. Please try connecting to an alternate zookeeper address."}
  
  Does the new interface support multiple cores?
 
 It should, but someone else wrote it, so I don't know offhand - sounds like an 
 issue we need to look at.
 
 
  Should the old
  interface require that defaultCoreName be set?
 
 
 
 No - another thing we should look at.
 
  
  On Thu, Feb 9, 2012 at 10:29 AM, Jamie Johnson jej2...@gmail.com 
  (mailto:jej2...@gmail.com) wrote:
   I'm looking at the latest code on trunk and it seems as if the
   zookeeper view does not work. When trying to access the information I
   get the following in the log
   
   
   2012-02-09 10:28:49.030:WARN::/solr/zookeeper.jsp
   java.lang.NullPointerException
   at 
   org.apache.jsp.zookeeper_jsp$ZKPrinter.init(org.apache.jsp.zookeeper_jsp:55)
   at 
   org.apache.jsp.zookeeper_jsp._jspService(org.apache.jsp.zookeeper_jsp:533)
   at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
   org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:389)
   at 
   org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
   at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
   org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
   at 
   org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:280)
   at 
   org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
   org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
   org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
   org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
   org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
   org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
   org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at 
   org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
   org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
   org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
   at 
   org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
  
 
 
 
 - Mark Miller
 lucidimagination.com (http://lucidimagination.com)





RE: solr 3.5 and indexing performance

2012-03-12 Thread Agnieszka Kukałowicz
Hi guys,

I have hit the same problem with Hunspell.
Doing a few tests for 500 000 documents, I've got:

Hunspell from http://code.google.com/p/lucene-hunspell/ with 3.4 version -
125 documents per second
Hunspell built from the 4.0 trunk - 11 documents per second.

All the tests were made on 8 core CPU with 32 GB RAM and index on SSD
disks.
For Solr 3.5 I've tried to change the JVM heap size, ramBufferSize, and
mergeFactor, but the speed of indexing was about 10-20 documents per
second.

Is it possible that there is some performance bug with Solr 4.0? According
to the previous post the problem exists in the 3.5 version as well.

Best regards
Agnieszka Kukałowicz


 -Original Message-
 From: mizayah [mailto:miza...@gmail.com]
 Sent: Thursday, February 23, 2012 10:19 AM
 To: solr-user@lucene.apache.org
 Subject: Re: solr 3.5 and indexing performance

 OK, I found it.

 It's because of Hunspell, which is now in Solr. Somehow when I'm using it
 by myself in 3.4 it is a lot faster than the one from 3.5.

 Don't know about the differences, but is there any way I can use my old
 Google Hunspell jar?

 --
 View this message in context: http://lucene.472066.n3.nabble.com/solr-
 3-5-and-indexing-performance-tp3766653p3769139.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Zookeeper view not displaying on latest trunk

2012-03-12 Thread Jamie Johnson
I have not pulled the latest (I pulled a week or two ago) and it
works on that version.


On Mon, Mar 12, 2012 at 11:40 AM, Stefan Matheis
matheis.ste...@googlemail.com wrote:
 Jamie, would you mind giving the latest another try, to see if the Cloud tab is 
 working as it should?

 On Thursday, February 9, 2012 at 6:57 PM, Mark Miller wrote:


 On Feb 9, 2012, at 12:09 PM, Jamie Johnson wrote:

  To get this to work I had to modify my solr.xml to add a
  defaultCoreName, then everything worked fine on the old interface
  (/solr/admin). The new interface was still unhappy and looking at the
  response that comes back I see the following
 
   {"status": 404, "error": "Zookeeper is not configured for this Solr
   Core. Please try connecting to an alternate zookeeper address."}
 
  Does the new interface support multiple cores?

 It should, but someone else wrote it, so I don't know offhand - sounds like 
 an issue we need to look at.


  Should the old
  interface require that defaultCoreName be set?



 No - another thing we should look at.

 
  On Thu, Feb 9, 2012 at 10:29 AM, Jamie Johnson jej2...@gmail.com 
  (mailto:jej2...@gmail.com) wrote:
   I'm looking at the latest code on trunk and it seems as if the
   zookeeper view does not work. When trying to access the information I
   get the following in the log
  
  
   2012-02-09 10:28:49.030:WARN::/solr/zookeeper.jsp
   java.lang.NullPointerException
   at 
   org.apache.jsp.zookeeper_jsp$ZKPrinter.init(org.apache.jsp.zookeeper_jsp:55)
   at 
   org.apache.jsp.zookeeper_jsp._jspService(org.apache.jsp.zookeeper_jsp:533)
   at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
   org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:389)
   at 
   org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
   at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
   org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
   at 
   org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:280)
   at 
   org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
   org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
   org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
   org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
   org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
   org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
   org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at 
   org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
   org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
   org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
   org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
   at 
   org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 



 - Mark Miller
 lucidimagination.com (http://lucidimagination.com)





Re: Knowing which fields matched a search

2012-03-12 Thread Russell Black
Paul,

I would think debugQuery would make it slower too, wouldn't it?  Where is the 
thread you are referring to?  Is there a lucene jira ticket for this?

On Mar 11, 2012, at 9:38 AM, Paul Libbrecht wrote:

 Russel,
 
 there's been a thread on that in the lucene world... it's not really perfect 
 yet.
 The suggestion to use debugQuery gives, in my experience, only the explain 
 monster, which is good for developers (only).
 
 paul
 
 
 Le 11 mars 2012 à 08:40, William Bell a écrit :
 
 debugQuery tells you.
 
 On Fri, Mar 9, 2012 at 1:05 PM, Russell Black rbl...@fold3.com wrote:
 When searching across multiple fields, is there a way to identify which 
 field(s) resulted in a match without using highlighting or stored fields?
 
 
 
 -- 
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076
 



Relational data

2012-03-12 Thread André Maldonado
Hi.

I need to set up an index that has relational data. This index will be for
houses to rent, where the user will search for date, price, holidays (by
name), etc.

The problem is that the same house can have different prices for different
dates.

If I denormalize this data, I will show the same house multiple times in
the resultset, and I don't want this.

So, for example:

House  Holiday       Price per day
1      Xmas          $ 75.00
1      July 4        $ 50.00
1      Valentine's   $ 15.00
2      Xmas          $ 50.00
2      July 4        $ 10.00

If I query for all data, I'll get 3 documents for the same house (house 1),
but I just want to show it one time to the end-user.

There is some way to do this in Solr (Without processing it in my app)?

Thanks

*
--
*
*E conhecereis a verdade, e a verdade vos libertará. (João 8:32)*

 *andre.maldonado*@gmail.com andre.maldon...@gmail.com
 (11) 9112-4227

http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
http://www.facebook.com/profile.php?id=10659376883
  http://twitter.com/andremaldonado http://www.delicious.com/andre.maldonado
  https://profiles.google.com/105605760943701739931
http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3
  http://www.youtube.com/andremaldonado


Re: Relational data

2012-03-12 Thread Ahmet Arslan
 The problem is that the same house can have different prices
 for different dates.

 If I denormalize this data, I will show the same house
 multiple times in the resultset, and I don't want this.

 So, for example:

 House  Holiday       Price per day
 1      Xmas          $ 75.00
 1      July 4        $ 50.00
 1      Valentine's   $ 15.00
 2      Xmas          $ 50.00
 2      July 4        $ 10.00

 If I query for all data, I'll get 3 documents for the same
 house (house 1), but I just want to show it one time to the end-user.

 There is some way to do this in Solr (Without processing it
 in my app)?


http://wiki.apache.org/solr/FieldCollapsing could work.
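For illustration, here is a toy Python sketch (not Solr code) of what collapsing on a grouping field does: the response keeps only the top-ranked document per group. The field name `house_id` is hypothetical; with field collapsing you would pass something like group=true&group.field=house_id on the request.

```python
# Toy sketch of field collapsing: given documents in score order,
# keep only the first (top-ranked) document for each house_id.
# "house_id" is a hypothetical field name for this example.
results = [  # documents in score order
    {"house_id": 1, "holiday": "Xmas",        "price": 75.00},
    {"house_id": 1, "holiday": "July 4",      "price": 50.00},
    {"house_id": 1, "holiday": "Valentine's", "price": 15.00},
    {"house_id": 2, "holiday": "Xmas",        "price": 50.00},
]

def collapse(docs, field):
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], doc)  # keep the first doc per key
    return list(groups.values())            # dicts preserve insertion order

collapsed = collapse(results, "house_id")
print([d["house_id"] for d in collapsed])  # prints [1, 2]
```

So house 1 appears once in the collapsed result, carrying its top-ranked row, which is the behavior the original poster is after.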


Trouble indexing word documents

2012-03-12 Thread rdancy
Hello, I'm running Solr inside Tomcat and I'm trying to index a Word .doc using
curl, and I get the following error:

bash-3.2# curl
"http://localhost:8585/solr/update/extract?literal.id=1&commit=true" -F
myfile=@troubleshooting_performance.doc
<html><head><title>Apache Tomcat/6.0.14 - Error report</title></head><body>
HTTP Status 500 - lazy loading error

org.apache.solr.common.SolrException: lazy loading error
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:257)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:239)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.extraction.ExtractingRequestHandler'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423)
at
org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:459)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:248)
... 16 more
Caused by: java.lang.ClassNotFoundException:
solr.extraction.ExtractingRequestHandler
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
... 19 more
<HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b>
<u>lazy loading error

org.apache.solr.common.SolrException: lazy loading error
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:257)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:239)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.extraction.ExtractingRequestHandler'

Re: Relational data

2012-03-12 Thread Tomás Fernández Löbbe
You could use the grouping feature, depending on your needs:
http://wiki.apache.org/solr/FieldCollapsing

2012/3/12 André Maldonado andre.maldon...@gmail.com

 Hi.

 I need to set up an index that has relational data. This index will be for
 houses to rent, where the user will search for date, price, holidays (by
 name), etc.

 The problem is that the same house can have different prices for different
 dates.

 If I denormalize this data, I will show the same house multiple times in
 the resultset, and I don't want this.

 So, for example:

 House  Holiday       Price per day
 1      Xmas          $ 75.00
 1      July 4        $ 50.00
 1      Valentine's   $ 15.00
 2      Xmas          $ 50.00
 2      July 4        $ 10.00

 If I query for all data, I'll get 3 documents for the same house (house 1),
 but I just want to show it one time to the end-user.

 There is some way to do this in Solr (Without processing it in my app)?

 Thank's

 *

 --
 *
 *E conhecereis a verdade, e a verdade vos libertará. (João 8:32)*

  *andre.maldonado*@gmail.com andre.maldon...@gmail.com
  (11) 9112-4227

 http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
 http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
 http://www.facebook.com/profile.php?id=10659376883
  http://twitter.com/andremaldonado 
 http://www.delicious.com/andre.maldonado
  https://profiles.google.com/105605760943701739931
 http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3
  http://www.youtube.com/andremaldonado



Re: Trouble indexing word documents

2012-03-12 Thread Tomás Fernández Löbbe
Make sure the Solr cell jar is in the classpath. You probably have a line
like this in your solrconfig.xml:

  <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />

Make sure that points to the right file.

On Mon, Mar 12, 2012 at 2:59 PM, rdancy rda...@wiley.com wrote:

 Hello, I'm running Solr inside Tomcat and I'm trying to index a Word .doc
 using
 curl, and I get the following error:

 bash-3.2# curl
 "http://localhost:8585/solr/update/extract?literal.id=1&commit=true" -F
 myfile=@troubleshooting_performance.doc
 <html><head><title>Apache Tomcat/6.0.14 - Error report</title>
 </head><body>
 HTTP Status 500 - lazy loading error

 org.apache.solr.common.SolrException: lazy loading error
at

 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:257)
at

 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:239)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at

 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at

 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at

 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at

 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at

 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at

 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at

 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at

 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
 Caused by: org.apache.solr.common.SolrException: Error loading class
 'solr.extraction.ExtractingRequestHandler'
at

 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423)
at
 org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:459)
at

 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:248)
... 16 more
 Caused by: java.lang.ClassNotFoundException:
 solr.extraction.ExtractingRequestHandler
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at

 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
... 19 more

Solr Monitoring / Stats

2012-03-12 Thread Alex Leonhardt

Hi All,

I was wondering if anyone knows of a free tool to monitor 
multiple Solr hosts under one roof? I found some non-functioning cacti 
& munin trial implementations, but would really like more direct 
statistics of the JVM itself plus all Solr cores (i.e. requests/s, etc.).


Does anyone know of one? Or has a set of JMX URLs that could be used to 
make e.g. munin or cacti use that data?


I'm currently running psi-probe on each host to have at least some 
overview of what's going on within the JVM.



Thanks!
Alex

RE: Including an attribute value from a higher level entity when using DIH to index an XML file

2012-03-12 Thread Mike O'Leary
I found an answer to my question, but it comes with a cost. With an XML file 
like this (this is simplified to remove extraneous elements and attributes):

<data>
  <user id="[id-num]">
    <message date="[date]">[message text]</message>
    ...
  </user>
  ...
</data>

I can index the user id as a field in documents that represent each of the 
user's messages with this data-config expression:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="message"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/data/user/message | /data/user"
            url="message-data.xml">
      <field column="id" xpath="/data/user/@id" commonField="true"/>
      <field column="date" xpath="/data/user/message/@date"
             dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss"/>
      <field column="text" xpath="/data/user/message" />
    </entity>
  </document>
</dataConfig>

I didn't realize that commonField would work for cases in which the previously 
encountered field is in an element that encompasses the other elements, but it 
does. The forEach value has to be "/data/user/message | /data/user" in order 
for the user id to be located, since it is not under /data/user/message.

By specifying forEach="/data/user/message | /data/user" I am saying that each 
/data/user or /data/user/message element is a document in the index, but I 
don't really want /data/user elements to be treated this way. As luck would 
have it, those documents are filtered out, only because date and text are 
required fields and they have not yet been assigned values when a document is 
created for a /data/user element, so an exception is thrown. I could live with 
this, but it's kind of ugly.

I don't see any other way of doing what I need to do with embedded XML elements 
though. I tried creating nested entities in the data-config file, but each one 
of them is required to have a url attribute, and I think that caused the input 
file to be read twice.

The only other possibility I could see from reading the DataImportHandler 
documentation was to specify an XSL file and change the XML file's structure so 
that the user id attribute is moved down to be an attribute of the message 
element. I'm not sure it's worth it to do something like that for what seems 
like a small problem, and I wonder how much it would slow down the importing of 
a large XML file.

Are there any other ways of handling cases like this, where an attribute of an 
outer element is to be included in an index document that corresponds to an 
element nested inside it?
Thanks,
Mike

-Original Message-
From: Mike O'Leary [mailto:tmole...@uw.edu] 
Sent: Friday, March 02, 2012 3:30 PM
To: Solr-User (solr-user@lucene.apache.org)
Subject: Including an attribute value from a higher level entity when using DIH 
to index an XML file

I have an XML file that I would like to index, that has a structure similar to 
this:

<data>
  <user id="[id-num]">
    <message date="[date]">[message text]</message>
    ...
  </user>
  ...
</data>

I would like to have the documents in the index correspond to the messages in 
the xml file, and have the user's [id-num] value stored as a field in each of 
the user's documents. I think this means that I have to define an entity for 
message that looks like this:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="message"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/data/user/message"
            url="message-data.xml">
      <field column="date" xpath="/data/user/message/@date"
             dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss"/>
      <field column="text" xpath="/data/user/message" />
    </entity>
  </document>
</dataConfig>

but I don't know where to put the field definition for the user id. It would 
look like

<field column="id" xpath="/data/user/@id" />

I can't put it within the message entity, because the entity is defined with 
forEach="/data/user/message" and the id field's xpath value is outside of the 
entity's scope. Putting the id field definition there causes a null pointer 
exception. I don't think I want to create a user entity that the message 
entity is nested inside of, or is there a way to do that and still have the 
index documents correspond to messages from the file? Are there one or more 
attributes or attribute values that I haven't run across in my searching 
that provide a way to do what I need to do?
Thanks,
Mike




Re: MISSING LICENSE

2012-03-12 Thread Erick Erickson
Per:

You've been working with SolrCloud, haven't you? Yonik's right on, removing
exampleB is what I had to do with the exact same problem.

Erick

On Mon, Mar 12, 2012 at 2:33 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 Over-aggressive license checking code doesn't like jars in extraneous
 directories (like the work directory that the war is exploded into
 under exampleB).
 delete exampleB and the build should work.

 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10

 On Mon, Mar 12, 2012 at 3:24 AM, Per Steffensen st...@designware.dk wrote:
 Hi

 Just tried to ant clean test on latest code from trunk. I get a lot of
 MISSING LICENSE messages - e.g.
 [licenses] MISSING LICENSE for the following file:
 [licenses]
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-3.3.3.jar
 [licenses]   Expected locations below:
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-ASL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-BSD.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-BSD_LIKE.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-CDDL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-CPL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-EPL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-MIT.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-MPL.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-PD.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-SUN.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-COMPOUND.txt
 [licenses]   =
 .../solr/exampleB/work/Jetty_0_0_0_0_8900_solr.war__solr__dsbrc0/webapp/WEB-INF/lib/zookeeper-LICENSE-FAKE.txt

 $ ant -version
 Apache Ant(TM) version 1.8.2 compiled on October 14 2011

 What might be wrong?

 Regards, Per Steffensen


query to some field in solr for multiple values

2012-03-12 Thread preetesh dubey
How can we perform query to single string type field for multiple values?
e.g.
I have the schema field like

<field name="id" type="string" indexed="true" stored="true"
       required="true" />

I want to query on id field for multiple values like..

q=id:['1', '5', '17']...

in mysql we perform the same query like..

select * from table where id in(1,5,17) 

how can we perform the same query in solr on id field?


-- 
Thanks & Regards
Preetesh Dubey


Re: Relational data

2012-03-12 Thread André Maldonado
Thanks, Ahmet and Tomás. It worked like a charm.

*
--
*
*E conhecereis a verdade, e a verdade vos libertará. (João 8:32)*

 *andre.maldonado*@gmail.com andre.maldon...@gmail.com
 (11) 9112-4227

http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
http://www.facebook.com/profile.php?id=10659376883
  http://twitter.com/andremaldonado http://www.delicious.com/andre.maldonado
  https://profiles.google.com/105605760943701739931
http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3
  http://www.youtube.com/andremaldonado




On Mon, Mar 12, 2012 at 2:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

  The problem is that the same house can have different prices
  for different
  dates.
 
  If I denormalyze this data, I will show the same house
  multiple times in
  the resultset, and I don't want this.
 
  So, for example:
 
  House  Holiday      Price per day
  1      Xmas         $ 75.00
  1      July 4       $ 50.00
  1      Valentine's  $ 15.00
  2      Xmas         $ 50.00
  2      July 4       $ 10.00
 
  If I query for all data, I'll get 3 documents for the same
  house (house 1),
  but I just want to show it one time to the end-user.
 
  There is some way to do this in Solr (Without processing it
  in my app)?


 http://wiki.apache.org/solr/FieldCollapsing could work.
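 A sketch of the grouping parameters from that page, assuming a house
 identifier field (here hypothetically named house_id), which returns at
 most one document per house:

```
q=*:*&group=true&group.field=house_id&group.limit=1
```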



Re: query to some field in solr for multiple values

2012-03-12 Thread Ahmet Arslan
 I want to query on id field for multiple values like..
 
 q=id:['1', '5', '17']...
 
 in mysql we perform the same query like..
 
 select * from table where id in(1,5,17) 
 
 how can we perform the same query in solr on id field?

q=1 5 17&q.op=OR&df=id
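An equivalent form that keeps the field name explicit in the query itself (a sketch using standard Lucene query syntax, without relying on the df parameter):

```
q=id:(1 OR 5 OR 17)
```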


Re: Performance (responsetime) on request

2012-03-12 Thread Dmitry Kan
This page should help you:

http://wiki.apache.org/solr/SolrCaching
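As a sketch of what the relevant solrconfig.xml entries look like (the sizes and the warming query are illustrative, not recommendations):

```xml
<!-- sketch: tune sizes against your own eviction/hit-ratio stats -->
<documentCache class="solr.LRUCache"
               size="2000" initialSize="512" autowarmCount="0"/>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- hypothetical warming query; use queries typical of your traffic -->
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>
```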

-- Dmitry

On Mon, Mar 12, 2012 at 5:37 PM, Ramo Karahasan 
ramo.karaha...@googlemail.com wrote:

 Hi,

 thanks for your advice. Do you have any documentation on that? I'm not
 sure how and where to configure this stuff and what impact it has.

 Thanks,
 Ramo

 -Ursprüngliche Nachricht-
 Von: Dmitry Kan [mailto:dmitry@gmail.com]
 Gesendet: Montag, 12. März 2012 16:21
 An: solr-user@lucene.apache.org
 Betreff: Re: Performance (responsetime) on request

 you can optimize the documentCache by setting maxSize to some decent
 value, like 2000. Also configure some meaningful warming queries in the
 solrconfig.
 When increasing the cache size, monitor the RAM usage, as that can
 start increasing as well.
 Do you / would you need to use filter queries? Those can speed up search
 as well through the usage of the filterCache.

 Dmitry

 On Mon, Mar 12, 2012 at 5:12 PM, Ramo Karahasan 
 ramo.karaha...@googlemail.com wrote:

  Hi,
 
  this are the results form the solr admin page for cache:
 
 
  name:   queryResultCache
  class:  org.apache.solr.search.LRUCache
  version:1.0
  description:LRU Cache(maxSize=512, initialSize=512)
  stats:  lookups : 376
  hits : 246
  hitratio : 0.65
  inserts : 130
  evictions : 0
  size : 130
  warmupTime : 0
  cumulative_lookups : 2994
  cumulative_hits : 1934
  cumulative_hitratio : 0.64
  cumulative_inserts : 1060
  cumulative_evictions : 409
 
  name:   fieldCache
  class:  org.apache.solr.search.SolrFieldCacheMBean
  version:1.0
  description:Provides introspection of the Lucene FieldCache, this is
  **NOT** a cache that is managed by Solr.
  stats:  entries_count : 0
  insanity_count : 0
 
  name:   documentCache
  class:  org.apache.solr.search.LRUCache
  version:1.0
  description:LRU Cache(maxSize=512, initialSize=512)
  stats:  lookups : 13416
  hits : 11787
  hitratio : 0.87
  inserts : 1629
  evictions : 1089
  size : 512
  warmupTime : 0
  cumulative_lookups : 100012
  cumulative_hits : 86959
  cumulative_hitratio : 0.86
  cumulative_inserts : 13053
  cumulative_evictions : 11914
 
  name:   fieldValueCache
  class:  org.apache.solr.search.FastLRUCache
  version:1.0
  description:Concurrent LRU Cache(maxSize=1, initialSize=10,
  minSize=9000, acceptableSize=9500, cleanupThread=false)
  stats:  lookups : 0
  hits : 0
  hitratio : 0.00
  inserts : 0
  evictions : 0
  size : 0
  warmupTime : 0
  cumulative_lookups : 0
  cumulative_hits : 0
  cumulative_hitratio : 0.00
  cumulative_inserts : 0
  cumulative_evictions : 0
 
  name:   filterCache
  class:  org.apache.solr.search.FastLRUCache
  version:1.0
  description:Concurrent LRU Cache(maxSize=512, initialSize=512,
  minSize=460, acceptableSize=486, cleanupThread=false)
  stats:  lookups : 0
  hits : 0
  hitratio : 0.00
  inserts : 0
  evictions : 0
  size : 0
  warmupTime : 0
  cumulative_lookups : 0
  cumulative_hits : 0
  cumulative_hitratio : 0.00
  cumulative_inserts : 0
  cumulative_evictions : 0
 
 
  Is there something to be optimized?
 
  Thanks,
  Ramo
 
  -Ursprüngliche Nachricht-
  Von: Dmitry Kan [mailto:dmitry@gmail.com]
  Gesendet: Montag, 12. März 2012 15:06
  An: solr-user@lucene.apache.org
  Betreff: Re: Performance (responsetime) on request
 
  If you look at solr admin page / statistics of cache, you could check
  the evictions of different types of cache. If some of them are larger
  than zero, try minimizing them by increasing the corresponding cache
  params in the solrconfig.xml.
 
  On Mon, Mar 12, 2012 at 10:12 AM, Ramo Karahasan 
  ramo.karaha...@googlemail.com wrote:
 
   Hi,
  
  
  
   i've got two virtual machines in the same subnet at the same
   hostingprovider. On one machine my webapplication is running, on the
   second a solr instance. In solr I use the following
  
  
  
    <fieldType name="text_auto" class="solr.TextField">
      <analyzer type="index">
        <!-- <tokenizer class="solr.KeywordTokenizerFactory"/> -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
                maxGramSize="25" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
                maxGramSize="25" /> -->
      </analyzer>
      <analyzer type="query">
        <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
                maxGramSize="25" />
      </analyzer>
    </fieldType>
  
  
  
  
  
    <fieldType name="text" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter 

Re: Trouble indexing word documents

2012-03-12 Thread rdancy
I see the line - <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
but I don't see any solr cell jars, only Tika jars. I moved all the jars
over to my classpath directory. I'm using version lucidworks-solr-3.2.0_01.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trouble-indexing-word-documents-tp3819949p3820472.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MISSING LICENSE

2012-03-12 Thread Per Steffensen

Thank you both for your kind help.

Regards, Steff

Erick Erickson skrev:

Per:

You've been working with SolrCloud, haven't you? Yonik's right on, removing
exampleB is what I had to do with the exact same problem.

Erick

On Mon, Mar 12, 2012 at 2:33 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
  

Over-aggressive license checking code doesn't like jars in extraneous
directories (like the work directory that the war is exploded into
under exampleB).
delete exampleB and the build should work.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10




Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-03-12 Thread Koorosh Vakhshoori
Hi Koji,
  I am Shyam's coworker. After looking into this issue, I believe the
problem of the chopped word has to do with the
org.apache.lucene.search.vectorhighlight.SimpleFragListBuilder class'
'margin' field. It is set to 6 by default. My understanding is that a margin
value greater than zero results in a truncated word when the highlighted
term is too close to the beginning of a document. I was able to reset the
'margin' field by creating my custom version of
org.apache.solr.highlight.SimpleFragListBuilder and passing zero for
'margin' when calling Lucene's SimpleFragListBuilder constructor. My
testing shows the problem has been fixed. Do you concur?

  Now a couple of questions. I am not sure what the purpose of this field is;
could you give the use case for it? Also, could it be exposed as a parameter in
Solr so it could be set to some other value?

Thanks,

Koorosh


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Display-of-highlighted-search-result-should-start-with-the-beginning-of-the-sentence-that-contains-t-tp3722912p3820516.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCore error

2012-03-12 Thread Erick Erickson
Your attachment didn't come through; the mail server often
strips this stuff. Please either inline it or put it up in some
publicly accessible place.

Best
Erick

On Sun, Mar 11, 2012 at 10:51 PM, Nikhila Pala nikhila_p...@infosys.comwrote:

 Hi,


 I'm getting some exceptions while shutting down the hybris server; the
 exception details are specified in the file attached to this mail. Please
 try to resolve it as soon as possible.


 Thanks & Regards,

 Nikhila Pala

 Systems engineer

 Infosys Technologies Limited


  CAUTION - Disclaimer *
 This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
 for the use of the addressee(s). If you are not the intended recipient, please
 notify the sender by e-mail and delete the original message. Further, you are 
 not
 to copy, disclose, or distribute this e-mail or its contents to any other 
 person and
 any such actions are unlawful. This e-mail may contain viruses. Infosys has 
 taken
 every reasonable precaution to minimize this risk, but is not liable for any 
 damage
 you may sustain as a result of any virus in this e-mail. You should carry out 
 your
 own virus checks before opening the e-mail or attachment. Infosys reserves the
 right to monitor and review the content of all messages sent to or from this 
 e-mail
 address. Messages sent to or from this e-mail address may be stored on the
 Infosys e-mail system.
 ***INFOSYS End of Disclaimer INFOSYS***




Re: Trouble indexing word documents

2012-03-12 Thread Tomás Fernández Löbbe
it should be in
lucidworks-solr-3.2.0_01/dist/lucidworks-solr-cell-3.2.0_01.jar, don't
you have that one?

On Mon, Mar 12, 2012 at 5:44 PM, rdancy rda...@wiley.com wrote:

 I see the line - <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
 but I don't see any solr cell jars, only Tika jars. I moved all the jars
 over to my classpath directory. I'm using version lucidworks-solr-3.2.0_01.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Trouble-indexing-word-documents-tp3819949p3820472.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Additional Query with MLT

2012-03-12 Thread Jamie Johnson
Is there a way to provide an additional query constraint to the MLT
component?  My particular use case is I want to get similar documents,
but limit them to the documents a user can actually see based on some
authorization query.  Is this currently possible?
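(If the standalone MoreLikeThisHandler is an option, a filter query may do this. A hypothetical request, assuming the handler is mapped at /mlt and an acl field holds the authorization tokens; worth verifying that fq is honored by your Solr version:)

```
/mlt?q=id:1234&mlt.fl=title,body&fq=acl:(groupA OR groupB)
```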


Incomplete documents with parent child DB relationship

2012-03-12 Thread Tim Hurring
I'm new to SOLR and have managed to get some basic indexing and querying
working. However I haven't been able to successfully implement the indexing
of a parent child database relationship.

My db-data-config.xml is:

<dataConfig>
  <dataSource driver="com.ibm.as400.access.AS400JDBCDriver"
              url="jdbc:as400://FAB/SV95TNDTA;;naming=system;" user="SV95TNGLB"
              password="GLOBAL95TN" />
  <document>
    <entity name="client" query="SELECT #1ABCD, #1C8TX, #1AFTX, #1A7NA,
        #1A8NA FROM REP">
      <field column="#1ABCD" name="id" />
      <field column="#1C8TX" name="surname" />
      <field column="#1AFTX" name="forenames" />
      <field column="#1A7NA" name="ird_number" />
      <field column="#1A8NA" name="gst_number" />
      <entity name="idreference" query="select M6ABR from ABM6CPP where
          M6ABCD='${client.id}'">
        <field column="M6ABR" name="id_reference" />
      </entity>
    </entity>
  </document>
</dataConfig>

Most 'client' records will have one or more 'idreference' records. SOLR
seems to import the data successfully (see status below) but when I do a *:*
search there are no 'id_reference' elements in any document (see below at
bottom):

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </lst>
  <str name="command">status</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">13594</str>
    <str name="Total Rows Fetched">13593</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2012-03-13 13:15:07</str>
    <str name="">
      Indexing completed. Added/Updated: 13593 documents. Deleted 0 documents.
    </str>
    <str name="Committed">2012-03-13 13:15:36</str>
    <str name="Optimized">2012-03-13 13:15:36</str>
    <str name="Total Documents Processed">13593</str>
    <str name="Time taken ">0:0:29.804</str>
  </lst>
  <str name="WARNING">
    This response format is experimental. It is likely to change in the future.
  </str>
</response>

<result name="response" numFound="13593" start="0" maxScore="1.0">
  <doc>
    <float name="score">1.0</float>
    <str name="forenames">John David</str>
    <str name="gst_number"/>
    <str name="id">012345</str>
    <str name="ird_number"/>
    <str name="surname">Sagers</str>
  </doc>
  <doc>
    <float name="score">1.0</float>
    <str name="forenames">Mark James</str>
    <str name="gst_number"/>
    <str name="id">000426</str>
    <str name="ird_number"/>
    <str name="surname">Kirby</str>
  </doc>
...

Any assistance would be greatly appreciated.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Incomplete-documents-with-parent-child-DB-relationship-tp3820963p3820963.html
Sent from the Solr - User mailing list archive at Nabble.com.


Can solr-langid(Solr3.5.0) detect multiple languages in one text?

2012-03-12 Thread bing
Hi, all, 

I am using solr-langid(Solr3.5.0) to do language detection, and I hope
multiple languages in one text can be detected. 

The example text is: 
咖哩起源於印度。印度民間傳說咖哩是佛祖釋迦牟尼所創,由於咖哩的辛辣與香味可以幫助遮掩羊肉的腥騷,此舉即為用以幫助不吃豬肉與牛肉的印度人。在泰米爾語中,「kari」是「醬」的意思。在馬來西亞,kari也稱dal(當在mamak檔)。早期印度被蒙古人所建立的莫臥兒帝國(Mughal
Empire)所統治過,其間從波斯(現今的伊朗)帶來的飲食習慣,從而影響印度人的烹調風格直到現今。
Curry (plural, Curries) is a generic term primarily employed in Western
culture to denote a wide variety of dishes originating in Indian, Pakistani,
Bangladeshi, Sri Lankan, Thai or other Southeast Asian cuisines. Their
common feature is the incorporation of more or less complex combinations of
spices and herbs, usually (but not invariably) including fresh or dried hot
capsicum peppers, commonly called chili or cayenne peppers.

I want the text can be separated into two parts, and the part in Chinese
goes to text_zh-tw while the other one text_en. Can I do something like
that? 

Thank you. 

Best Regards, 
Bing 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-solr-langid-Solr3-5-0-detect-multiple-languages-in-one-text-tp3821210p3821210.html
Sent from the Solr - User mailing list archive at Nabble.com.


Highlighting a font without bold or italic modes

2012-03-12 Thread Lance Norskog
How do you highlight terms in languages without boldface or italic
modes? Maybe raise the text size a couple of sizes just for that word?


-- 
Lance Norskog
goks...@gmail.com


RE: List of recommendation engines with solr

2012-03-12 Thread Rohan
Hi Gora,

Thanks a lot for your valuable comments, really appreciated.
Yeah, you got me right: I am indeed looking at Mahout, as I am using
Java as my business layer with Apache Solr.

Thanks,
Rohan

From: Gora Mohanty-3 [via Lucene] 
[mailto:ml-node+s472066n3819480...@n3.nabble.com]
Sent: Monday, March 12, 2012 8:28 PM
To: Rohan Ashok Kumbhar
Subject: Re: List of recommendation engines with solr

On 12 March 2012 16:30, Rohan [hidden email] wrote:
 Hi All,

 I would require list of recs engine which can be integrated with solr and
 also suggest best one out of this.

 any comments would be appriciated!!

What exactly do you mean by that? Why is integration with Solr
a requirement, and what do you expect to gain by such an integration?
"Best" also probably depends on the context of your requirements.

There are a variety of open-source recommendation engines.
If you are looking at something from Apache, and in Java, Mahout
might be a good choice.

Regards,
Gora





--
View this message in context: 
http://lucene.472066.n3.nabble.com/List-of-recommendation-engines-with-solr-tp3818917p3821268.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How to index doc file in solr?

2012-03-12 Thread Rohan
Thanks Erick ,really appreciated.

From: Erick Erickson [via Lucene] 
[mailto:ml-node+s472066n3819585...@n3.nabble.com]
Sent: Monday, March 12, 2012 9:05 PM
To: Rohan Ashok Kumbhar
Subject: Re: How to index doc file in solr?

Consider using SolrJ, possibly combined with
Tika (which is what underlies Solr Cell).
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Although ExtractingRequestHandler
has the capability of indexing metadata as
well if you map the fields.

See: http://wiki.apache.org/solr/ExtractingRequestHandler

Best
Erick


On Mon, Mar 12, 2012 at 11:09 AM, Rohan [hidden email] wrote:

 Hi Erick,

 Thanks for the valuable comments on this.

 See i have few set of word docs file and i would like to index meta data
 part includeing the content of the page , so is there any way to complete
 this task?

 Need your comments on this.

 Thanks,
 Rohan

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-index-doc-file-in-solr-tp3806543p3818938.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: Using multiple DirectSolrSpellcheckers for a query

2012-03-12 Thread Nalini Kartha
Hi James/Robert,

Thanks for the responses.

Robert: What is it about the current APIs that makes this hard? How
much/what kind of refactoring would open this up?

James: I didn't quite understand the usage you suggested. I thought that
the spellcheck.q param shouldn't include field names, etc., and that the
purpose of specifying this param is to avoid the extra parsing out of the
field names, etc. from the q param to get the query terms for spell
checking. This is based on this bit in the SpellCheckComponent wiki -

 The spellcheck.q parameter is intended to be the original query, minus
any extra markup like field names, boosts, etc.

Did I misunderstand something?
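
To make the distinction concrete, here is roughly what I understand the wiki to mean (field names and terms are made up for illustration):

```
q=title:(run eon)^2 OR body:(run eon)
spellcheck.q=run eon
```

That is, q carries the full query markup while spellcheck.q carries only the bare terms to be checked.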

I agree that it's impossible to know whether the query "run" should be corrected
to "sun" or "running" in the example I gave, but I guess I'm asking more
from the angle of how to avoid correcting terms that will be matched
because they exist in other, more heavily processed fields that are being searched.
Since the recommendation is to build spellcheck fields from minimally
processed source fields, it seems like this would be a common problem?

And another kind of unrelated question - all the examples of spellcheck
dictionaries I've seen in sample solrconfig.xmls have minPrefix set to 1.
Is this for performance reasons? And with this setting, we wouldn't get
"run" as a correction for "eon", right?
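
For reference, the setting I mean is the one in the DirectSolrSpellChecker block of the sample solrconfig.xml; a typical fragment looks like this (the field name is an assumption):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- a candidate correction must share this many leading
         characters with the misspelled term -->
    <int name="minPrefix">1</int>
  </lst>
</searchComponent>
```

With minPrefix=1, "run" and "eon" differ in their first character, so that correction would indeed never be proposed.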

Thanks,
Nalini

On Wed, Mar 7, 2012 at 11:04 AM, Robert Muir rcm...@gmail.com wrote:

 On Wed, Jan 25, 2012 at 12:55 PM, Nalini Kartha nalinikar...@gmail.com
 wrote:
 
  Is there any reason why Solr doesn't support using multiple spellcheckers
  for a query? Is it because of performance overhead?
 

 That's not the case, really - see
 https://issues.apache.org/jira/browse/SOLR-2926

 I think the issue is that the spellchecker APIs need to be extended to
 allow this to happen more easily. There is no real hard
 performance/technical/algorithmic issue; it's just a matter of
 refactoring the spellchecker APIs to allow this!

 --
 lucidimagination.com