Re: Sort five random "Top Offers" to the top

2011-09-22 Thread Doug McKenzie
Could you not just do your normal search and add a filter query on?
fq=topoffer:true


That would then return only results with topoffer:true, and then use
whatever shuffling / randomising you like in your application.
Alternatively you could even add sorting on relevance to show the top 5
closest matches to the query: rows=5&sort=score desc
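Something like this SolrJ sketch, for example (the topoffer field is from
this thread; the server URL, pool size and class name are assumptions):

import java.net.MalformedURLException;
import java.util.Collections;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class TopOffers {
  public static List<SolrDocument> randomTopOffers(String userQuery)
      throws MalformedURLException, SolrServerException {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery(userQuery);
    query.addFilterQuery("topoffer:true"); // keep only flagged top offers
    query.setRows(50);                     // pool of candidates to sample from
    SolrDocumentList offers = server.query(query).getResults();
    Collections.shuffle(offers);           // SolrDocumentList is a List
    return offers.subList(0, Math.min(5, offers.size()));
  }
}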




On 21/09/2011 21:26, Sujit Pal wrote:

Hi MOuli,

AFAIK (and I don't know that much about Solr), this feature does not
exist out of the box in Solr. One way to achieve this could be to
construct a DocSet with topoffer:true and intersect it with your result
DocSet, then randomly shuffle the intersection, take the sublist [0:5],
and move that sublist to the top of the results like
QueryElevationComponent does. Actually you may want to take a look at the
QueryElevationComponent code for inspiration (this is where I would have
looked if I had to implement something similar).

-sujit

On Wed, 2011-09-21 at 06:54 -0700, MOuli wrote:

Hey Community.

I got a Lucene/Solr index with many offers. Some of them are marked by a
flag field "topoffer" as top offers. Now I want to randomly sort 5 of
these offers to the top.

For example:
HTC Sensation
  - topoffer = true
HTC Desire
  - topoffer = false
Samsung Galaxy S2
  - topoffer = true
IPhone 4
  - topoffer = true
...

When I search for a mobile phone ("Handy"), I want the first 3 offers to be
the HTC Sensation, Samsung Galaxy S2 and iPhone 4.


Does anyone have an idea?

P.S.: I hope my English is not too bad




Distributed Search question/feedback

2011-09-22 Thread dan sutton
Hi,

Does SolrCloud use Distributed search as described
http://wiki.apache.org/solr/DistributedSearch or is it different
entirely?

Does SolrCloud suffer from the same limitations as Distributed Search
(inefficient to use a high "start" parameter, and presumably high CPU
highlighting all those docs, etc., among other issues)?

Our search mainly comprises searches within a country, and
occasionally across a continent or worldwide, so I'm thinking it's
probably simpler to have a pan index for worldwide and continent
searches, and separate country indices (and these placed closer to
each country, for example).

Any pointers for those who've been down the distributed path appreciated!

Cheers,
Dan


Re: Slow autocomplete(terms)

2011-09-22 Thread roySolr
Hello Erick,

Thanks for your answer, but I have some problems with the NGram filter.

My config looks like this:

<fieldType name="..." class="solr.TextField">
  <analyzer>
    ...
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="8"/>
  </analyzer>
</fieldType>

I see this in the analysis:

"manchester"

ma an nc ch he es st te er
man anc nch che hes est ste ter
manc anch nche ches hest este ster
manch anche nches chest heste ester
manche anches nchest cheste hester
manches anchest ncheste chester
manchest ancheste nchester

When I use the terms component I see all these grams in the response. So
when I type "ches" I get:

ches
nches
anches
nchest
ncheste

I want one suggestion with the complete keyword "manchester". Is this possible?





Re: Sort five random "Top Offers" to the top

2011-09-22 Thread MOuli
This was my first idea, but I want a solution that is handled by Solr itself.

If I don't find one, I will have to use something like that.





Re: Sort five random "Top Offers" to the top

2011-09-22 Thread MOuli
Hmm, is it possible for me to write my own search component?

I just downloaded the Solr sources and need some information on how search
components work. Is there anything out there that can help me?
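For reference, the basic shape of a custom component in Solr 3.x is roughly
this (the class and component names are made up for illustration):

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class TopOfferComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // runs for every registered component before the query executes;
    // a good place to inspect or rewrite request parameters
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // runs after QueryComponent; rb.getResults() holds the DocList/DocSet,
    // and rb.req.getSearcher() exposes SolrIndexSearcher.getDocSet(...) to
    // build a topoffer:true DocSet and intersect it with the results
  }

  @Override
  public String getDescription() { return "random top offers"; }

  @Override
  public String getSource() { return "$URL$"; }

  @Override
  public String getSourceId() { return "$Id$"; }

  @Override
  public String getVersion() { return "1.0"; }
}

It would then be registered in solrconfig.xml with
<searchComponent name="topoffers" class="com.example.TopOfferComponent"/>
and added to a request handler's "last-components" list.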



Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Ralf Matulat

Good morning!
Recently we slipped into an OOME while optimizing our index. It looks like
it is related to the nio classes and memory handling.
I'll try to describe the environment, the error and what we did to solve
the problem. Nevertheless, none of our approaches was successful.


The environment:

- Tested with both SOLR 3.3 & 3.4
- SuSE SLES 11 (x64) virtual machine with 16GB RAM
- ulimit: virtual memory 14834560 (14GB)
- Java: java-1_6_0-ibm-1.6.0-124.5
- Apache Tomcat/6.0.29

- Index size (on filesystem): ~5GB, 1.1 million text documents.

The error:
First, building the index from scratch with a MySQL DIH into an empty
index dir works fine.
Building an index with &command=full-import while the old segment files are
still in place fails with an OutOfMemoryException. The same happens when
optimizing the index.

Doing an optimize fails after some time with:

SEVERE: java.io.IOException: background merge hit exception: 
_6p(3.4):Cv1150724 _70(3.4):Cv667 _73(3.4):Cv7 _72(3.4):Cv4 _71(3.4):Cv1 
into _74 [optimize]
at 
org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2552)
at 
org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2472)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:410)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:61)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:735)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:765)
at 
org.apache.lucene.store.MMapDirectory$MMapIndexInput.(MMapDirectory.java:264)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
at 
org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:89)
at 
org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:710)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4378)

at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3917)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)

Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:762)
... 9 more

Then we changed the mergeScheduler and mergePolicy
[the solrconfig.xml snippet was stripped by the archive],
which led to a slightly different error message:

SEVERE: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:765)
at 
org.apache.lucene.store.MMapDirectory$MMapIndexInput.(MMapDirectory.java:264)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
at 
org.apache.lucene.index.TermVectorsReader.(TermVectorsReader.java:85)
at 
org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:221)
at 
org.apache.lucene.index.SegmentReader.get(SegmentReader.java:117)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:710)
   

Saxon + classpath issues

2011-09-22 Thread penela
Hi,

I'm trying to make my XsltResponseWriter use Saxon 9.1 as its default
transformer (working with Solr 3.3 on Tomcat 7).


After reading http://wiki.apache.org/solr/XsltResponseWriter I've added 
-Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl
to Tomcat launch options and put all saxon*.jar in contrib/saxon/lib (same
level as contrib/extraction/lib or contrib/dataimporthandler/lib that are
being properly used).

I've added this folder to solrconfig.xml:

  <lib dir="/Library/Solr/contrib/saxon/lib" />

The load seems to work fine, and saxon libraries are supposedly loaded to
the system:
2011-09-22 11:00:47,977 INFO [solr.core.SolrConfig] - [Thread-2] - : Adding
specified lib dirs to ClassLoader
2011-09-22 11:00:48,017 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/extraction/lib/asm-3.1.jar' to
classloader
...
2011-09-22 11:00:48,124 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/saxon/lib/saxon9-dom.jar' to classloader
2011-09-22 11:00:48,124 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/saxon/lib/saxon9-dom4j.jar' to
classloader
2011-09-22 11:00:48,124 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/saxon/lib/saxon9-jdom.jar' to classloader
2011-09-22 11:00:48,124 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/saxon/lib/saxon9-s9api.jar' to
classloader
2011-09-22 11:00:48,125 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/saxon/lib/saxon9-sql.jar' to classloader
2011-09-22 11:00:48,125 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/saxon/lib/saxon9-xom.jar' to classloader
2011-09-22 11:00:48,125 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/saxon/lib/saxon9-xpath.jar' to
classloader
2011-09-22 11:00:48,125 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/saxon/lib/saxon9-xqj.jar' to classloader
2011-09-22 11:00:48,125 INFO [solr.core.SolrResourceLoader] - [Thread-2] - :
Adding 'file:/Library/Solr/contrib/saxon/lib/saxon9.jar' to classloader

However, when trying to use the XSLT writer I keep getting:
2011-09-22 11:20:03,300 ERROR [solr.servlet.SolrDispatchFilter] -
[http-bio-8080-exec-10] - :
javax.xml.transform.TransformerFactoryConfigurationError: Provider
net.sf.saxon.TransformerFactoryImpl not found
at
javax.xml.transform.TransformerFactory.newInstance(TransformerFactory.java:108)

Any ideas?

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Saxon-classpath-issues-tp3358200p3358200.html
Sent from the Solr - User mailing list archive at Nabble.com.


StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-22 Thread Pranav Prakash
Hi List,

I included StopFilterFactory and I can see it taking action in the Analyzer
interface. However, when I go to the Schema Analyzer, I see those stop words
in the top 10 terms. Is this normal?

<analyzer>
  ...
  <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
  ...
</analyzer>

*Pranav Prakash*

"temet nosce"

Twitter  | Blog  |
Google 


Re: fieldCache problem OOM exception

2011-09-22 Thread erolagnab
Sorry to pull this up again, but I've faced a similar issue and would like to
share the solution.

In my situation, I use SolrQueryRequest, SolrCore and SolrQueryResponse to
explicitly perform the search.
The gotcha in my code was that I didn't call SolrQueryRequest.close(), hence
the memory held by the FieldCache grew every time the index was updated.
Calling SolrQueryRequest.close() solves the problem; you should see items
disappear from the FieldCache (JMX) as soon as a new searcher is registered.

My corrected code is


SolrQueryRequest request = buildSolrQueryRequest();
try {
  SolrQueryResponse response = new SolrQueryResponse();
  SolrRequestHandler handler = getSolrRequestHandler();
  core.execute(handler, request, response);
  return response;
} finally {
  // releases the searcher reference held by the request; without this,
  // stale FieldCache entries pile up after every index update
  request.close();
}




Bad Request accessing solr on linux

2011-09-22 Thread Kissue Kissue
Hi,

I am using Solr 3.3 running on a Linux box. For some reason, when I make a
request to Solr from my Windows box I do not get a bad request error, but
when I run it on my Linux box I get a bad request. On the Linux box, I have
both my application and Solr deployed on the same Tomcat instance.

Below is the error:

Bad Request

request:
http://172.16.2.26:8080/solr/catalogue/select?q=paper&rows=10&start=0&fl=*,score&fq={!tag=catalogueId}catalogueId:
"Angle Springs Pricing"catalogueId: "Edmundsons Electrical Ltd"catalogueId:
"Edmundsons Lamps and Tubes"catalogueId: "fisher-punchout"catalogueId:
"Freds prices"catalogueId: "Getech-keele-punchout"catalogueId:
"id001"catalogueId: "ID-001"catalogueId: "ID-1001"catalogueId:
"ID-1003"catalogueId: "Insight-punchout"catalogueId:
"lyrecouk123"catalogueId: "onecall19"catalogueId: "QC Supplies -
Prices"catalogueId: "RS-punchout"catalogueId: "Sigma-punchout"catalogueId:
"SLS-punchout"catalogueId: "Spring Personnel"catalogueId:
"supplies-team-punchout"catalogueId: "The BSS Group PLC"catalogueId: "Tower
Supplies - Pricing"catalogueId:
"xma013"&hl=true&hl.snippets=1&wt=javabin&version=2

Any opinion on what is wrong with the request?

Thanks.


How to get the fields that match the request?

2011-09-22 Thread Nicolas Martin

Hi everybody,

I need your help to get more information in my Solr query's response.

I've got a simple input text which allows me to query several fields in
the same query.


So my query looks like this:
"q=email:martyn+OR+name:martynn+OR+commercial:martyn ..."

Is it possible in the response to know the fields where "martynn" has
been found?


Thanks a Lot :-)


Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-22 Thread Michael McCandless
On Wed, Sep 21, 2011 at 10:10 PM, Michael Sokolov  wrote:

> I wonder if config-file validation would be helpful here :) I posted a patch
> in SOLR-1758 once.

Big +1.

We should aim for as stringent config file checking as possible.

Mike McCandless

http://blog.mikemccandless.com


Re: SolrCloud state

2011-09-22 Thread Yury Kats
On 9/21/2011 1:45 PM, Miguel Coxo wrote:

> What i would like to know is if a shard master fails will the replica be
> "promoted" to a master. Or will it remain search only and only recover when
> a new master is setup.

Replica will not be promoted. Search would still work.

> Also how is the document indexing distributed by the shards? Can i add a new
> shard dynamically?

There's no distributed indexing yet, only distributed search. There's work
being done on DI, but it's not complete. It's up to your application to
distribute indexing. Whether you can add shards or not depends on how your
application is implemented and what your requirements are. You need to
make sure that every doc is indexed always in the same place.
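
A common way to guarantee that (my sketch, not SolrCloud code) is to route
each document by a hash of its unique key, so the same id always lands on
the same shard:

  // numShards and the SolrServer array are assumed to exist
  int shard = Math.abs(docId.hashCode()) % numShards;
  shardServers[shard].add(doc);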


Search for empty string in 1.4.1 vs 3.4

2011-09-22 Thread Shanmugavel SRD
Hi,
  I am using SOLR 1.4.1. When I search for an empty string in a string field,
q=tag_facet:"", it returns documents with values in tag_facet.
  When I use the same query, q=tag_facet:"", in SOLR 3.4, it returns
only documents with the "" string in tag_facet.
  SOLR 3.4 works as expected. I just want to know whether this is an issue in
SOLR 1.4.1. Please advise.



SpellCheck Print Multiple Collations

2011-09-22 Thread Kudzanai
Hi,

This is probably a very basic question, but how do I get the returned
collations?

My spell check request is

http://localhost:8983/solr/autocomplete/select?spellcheck.q=ipood%20tough&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.maxCollations=3&rows=3&spellcheck.count=5

Part of my response is:

ipod tough
ipad tough
wood tough

My results are accurate, but how do I get the collations? What method do
I use in the API?





Re: ANTLR SOLR query/filter parser

2011-09-22 Thread Roman Chyla
Hi, I agree that people can register arbitrary qparsers; however, the
question might have been understood differently - as being about an ANTLR
parser that can handle what the solr qparser does (and that one is looking at
_query_: and similar stuff -- or at local params, which is what can be
copy&pasted into the business logic of the new parser; i.e. the
solution might be similar to what is already done in the solr qparser).

I think I'm going to try just that :)

So here is my working ANTLR grammar for Lucene in case anybody is interested:
https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr

And I plan to build now a wrapper that calls this parser to parse the
query, get the tree, then translate the tree into lucene query object.
The local stuff {} may not even be part of the grammar -- some unclear
ideas in here, but they will be sorted out...

roman

On Wed, Aug 17, 2011 at 9:26 PM, Chris Hostetter
 wrote:
>
> : I'm looking for an ANTLR parser that consumes solr queries and filters.
> : Before I write my own, thought I'd ask if anyone has one they are
> : willing to share or can point me to one?
>
> I'm pretty sure that this will be impossible to do in the general case --
> arbitrary QParser instances (that support arbitrary syntax) can be
> registered in the solrconfig.xml and specified using either localparams or
> defType.  so even if you did write a parser that understood all of the
> rules of all of the default QParsers, and even if you made your parser
> smart enough to know how to look at other params (ie: defType, or
> variable substitution of "type") to understand which subset of parse rules
> to use, that still might give false positives or false failures if the
> user registered their own QParser using a new name (or changed the
> names used in registering existing parsers)
>
> The main question I have is: why are you looking for an ANTLR parser to do
> this?  what is your goal?
>
> https://people.apache.org/~hossman/#xyproblem
> Your question appears to be an "XY Problem" ... that is: you are dealing
> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> without giving more details about the "X" so that we can understand the
> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
>
>
> -Hoss
>


Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Michael McCandless
Are you sure you are using a 64 bit JVM?

Are you sure you really changed your vmem limit to unlimited?  That
should have resolved the OOME from mmap.

Or: can you run "cat /proc/sys/vm/max_map_count"?  This is a limit on
the total number of maps in a single process, that Linux imposes.  But
the default limit is usually high (64K), so it'd be surprising if you
are hitting that unless it's lower in your env.

The amount of [free] RAM on the machine should have no bearing on
whether mmap succeeds or fails; it's the available address space (32
bit is tiny; 64 bit is immense) and then any OS limits imposed.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Sep 22, 2011 at 5:27 AM, Ralf Matulat  wrote:

Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Ralf Matulat

Dear Mike,
thanks for your reply.
Just a couple of minutes ago we found a solution - or, to be honest, where
we went wrong.
Our failure was the use of ulimit. We missed that ulimit sets the vmem
for each shell separately. So we set 'ulimit -v unlimited' in one shell,
thinking that we had done the job correctly.
Once we recognized our mistake, we added 'ulimit -v unlimited' to the
init script of the Tomcat instance, and now it looks like everything
works as expected.

Need some further testing with the Java versions, but I'm quite optimistic.
Best regards
Ralf

On 22.09.2011 14:46, Michael McCandless wrote:


Re: Solr Indexing - Null Values in date field

2011-09-22 Thread mechravi25
Hi,

Thanks for the suggestions. This is the option I tried:

I changed the data type in my source to date and then indexed the field once
again.

For the particular field, in my query in the dataimport file, I used the
following condition: IFNULL(startdate, NULL).

The document was indexed successfully, but the field startdate was not
present in the document.

I have a few other records in my source where there is a value present in
startdate, but when I index those I get this exception:

org.apache.solr.common.SolrException: Invalid Date String:'2011-09-21
18:28:32.733'
at org.apache.solr.schema.DateField.parseMath(DateField.java:163)
at 
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:171)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:95)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:618)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)


Please help.






RE: SpellCheck Print Multiple Collations

2011-09-22 Thread Dyer, James
If using SolrJ,

use QueryResponse.getSpellCheckResponse().getCollatedResults().  This returns
a List<Collation>.  On each Collation object, getCollationQueryString() will
return the corrected queries.

Note that unless you specify "spellcheck.maxCollationTries", the collations
might not return anything if re-queried.
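
A minimal sketch of that call pattern (the parameters mirror the request
above; the server setup is an assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

SolrServer server =
    new CommonsHttpSolrServer("http://localhost:8983/solr/autocomplete");
SolrQuery query = new SolrQuery();
query.set("spellcheck.q", "ipood tough");
query.set("spellcheck", true);
query.set("spellcheck.collate", true);
query.set("spellcheck.maxCollations", 3);
QueryResponse rsp = server.query(query);
// one Collation per collated (corrected) query
for (SpellCheckResponse.Collation c
    : rsp.getSpellCheckResponse().getCollatedResults()) {
  System.out.println(c.getCollationQueryString());
}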

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Snippets and Boundaryscanner in Highlighter

2011-09-22 Thread O. Klein
I'm testing the new BoundaryScanner in the highlighter, but I can't get it to
show more than 1 snippet.

<int name="hl.snippets">2</int>

Bug or am I doing something wrong?



RE: SpellCheck Print Multiple Collations

2011-09-22 Thread Kudzanai
I am using SolrJ.

Here is what my method looks like.

  List<String> suggestedTermsList = new ArrayList<String>();
  if (aQueryResponse == null) {
    return suggestedTermsList;
  }

  try {
    SpellCheckResponse spellCheckResponse =
        aQueryResponse.getSpellCheckResponse();
    if (spellCheckResponse == null) {
      throw new Exception("No SpellCheckResponse in QueryResponse");
    }

    List<Collation> collationList =
        spellCheckResponse.getCollatedResults();

    for (Collation c : collationList) {
      suggestedTermsList.add(c.getCollationQueryString());
    }

  } catch (Exception e) {
    Trace.Log("SolrSpellCheck", Trace.HIGH, "Exception: " +
        e.getMessage());
  }
  return suggestedTermsList;
}

My response header is like so:

spellcheck={suggestions={ipood={numFound=5,startOffset=0,endOffset=5,suggestion=[ipod,
ipad, wood, food, pod]},collation=ipod tough,collation=ipad
tough,collation=wood tough,collation=food tough}}}


I get 4 collations [collation=ipod tough, collation=ipad tough,
collation=wood tough, collation=food tough], which I want to add to a
List<String> suggestedTermsList that I then return to the calling code.
Right now my ArrayList has 4 entries, but it is the last collation repeated
4 times, i.e. "food tough" four times.

spellcheck.maxCollationTries set to 1 causes my QueryResponse to be null.




RE: SpellCheck Print Multiple Collations

2011-09-22 Thread Dyer, James
Try adding "spellcheck.collateExtendedResults=true" to your query (without 
"maxCollationTries") to see if solrj correctly returns all 4 collations in that 
case.  In any case, if solrj is returning the last collation 4 times, this is 
likely a bug.

The likely reason why "spellcheck.maxCollationTries=1" results in a null is 
that the first collation it tried didn't result in any hits.  Because you're 
only allowing 1 try it won't attempt to check any alternatives and instead 
returns nothing.  Generally if using this parameter, you'd want to set it at 
least to whatever value you've got for "maxCollations", possibly a few higher.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311




RE: SpellCheck Print Multiple Collations

2011-09-22 Thread Kudzanai
spellcheck.collateExtendedResults = true seems to have sorted my problem. 

My other parameters are:

spellcheck =  true
spellcheck.count =  aNumResults
spellcheck.q =  SEARCH TEXT
spellcheck.build=  true
spellcheck.collate=  true
spellcheck.maxCollations= 4
spellcheck.collateExtendedResults = true
suggestionCount = 5;
rows = 0


It seems to work perfectly now. Thanks a lot.





Multi-word searches in multi-valued fields

2011-09-22 Thread Olson, Ron
Hi all-

I'm not clear on how to allow a user to search a multi-valued field with 
multiple words and return only those documents where all the words are together 
in one value, and not spread over multiple values.

If I do a literal search on the "company name" field for "smith trucking" (with 
the quotes), then it works because it's looking for only "smith trucking", and 
it finds it, great. However, if I put in "trucking smith", then I get no 
results. If I try using something like (+trucking +smith), then I get documents 
where one document might have "joe's trucking" and "bob smith" in the resulting 
array of names.

So I guess what I need is an exact match, regardless of word positioning (i.e. 
"smith trucking" and "trucking smith" should find only those documents that 
have those two words in one value of the resulting array).

I've been going through the wiki and it seems like this is probably a 
super-simple thing, but I'm clearly just not getting it; I just can't figure 
out the right syntax to make this work.

Thanks for any info.

Ron



Re: How to get the fields that match the request?

2011-09-22 Thread Tanner Postert
this would be useful to me as well.

even when searching with q=test, I know it defaults to the default search
field, but it would be helpful to know which field(s) match the query term.

On Thu, Sep 22, 2011 at 3:29 AM, Nicolas Martin wrote:



Re: How to get the fields that match the request?

2011-09-22 Thread Nicolas Martin
Yes, highlighting can help to do that, but if you want to paginate your
results, you can't use hl.


It'd be great to have a scoring average by field...




On 22/09/2011 17:37, Tanner Postert wrote:





Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Michael McCandless
OK, excellent.  Thanks for bringing closure,

Mike McCandless

http://blog.mikemccandless.com

On Thu, Sep 22, 2011 at 9:00 AM, Ralf Matulat  wrote:

mlt content stream help

2011-09-22 Thread dan whelan
I would like to use MLT and the content stream feature in solr like on 
this page:


http://wiki.apache.org/solr/MoreLikeThisHandler

How should the request handler / solrconfig be setup?

I enabled streaming and I set up a requestHandler by copying the default
request handler and changing the name to:

<requestHandler name="/mlt" class="solr.SearchHandler">

but when accessing the URL like the example on the wiki I get an NPE
because q is not supplied.

I'm sure I am just doing it wrong, just not sure what.

Thanks,

dan


Re: How to get the fields that match the request?

2011-09-22 Thread Rahul Warawdekar
Hi,

Before considering highlighting to address this requirement, you also need
to consider the performance implications of highlighting for large text
fields.

On Thu, Sep 22, 2011 at 11:42 AM, Nicolas Martin wrote:



-- 
Thanks and Regards
Rahul A. Warawdekar


Autosuggest best practice / feedback

2011-09-22 Thread Doug McKenzie

Hi there,

I'm relatively new to Solr and have been playing around with it for a
few weeks now. I've got a setup that I'm currently quite happy with and
that is returning some decent results (although there's always room for
improvement). Just hoping to get some feedback on the setup.


Currently running 2 separate Solr engines: one tasked with storing
products and their various info, the other storing previous site
searches and used for autosuggest functionality.


The auto suggest schema:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    ...
    <filter class="solr.StopFilterFactory" words="stopwords_en.txt" enablePositionIncrement="true"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>


Stopwords are being used to filter out rude words from previous searches
(is this the best way of doing things?)


Also looking at implementing a "Did you mean?" suggester which will
probably search against a whitespace-tokenized field of the same data
rather than this one.


Any thoughts / feedback / comments / criticism / biscuits appreciated

Cheers
Doug



Lot of ORs in a query and De Morgan Law

2011-09-22 Thread Emmanuel Espina
I have queries with a big, big amount of OR terms. The AND terms are much
more convenient to handle because they can be turned into several filter
queries and cached.

Thinking about innovative solutions, I recalled De Morgan's laws
http://en.wikipedia.org/wiki/De_Morgan's_laws from Boolean algebra.
Considering that set theory can be seen as a generalized Boolean algebra
(with intersection as . (dot), union as +, complement as negation, the full
set as 1 and the empty set as 0), this could be an innovative way of
solving it.

With regular filter queries, all the fq filters are merged into one (using
intersection of sets) and in the results of the query we only consider the
documents contained in that resulting set. If we had an "inverse filter
query" that merges all the filters into one (performing an intersection
among the filter sets in the regular way) and then considered in the
results only documents NOT contained in the resulting set, we could
implement multiple OR queries in a much more performant way by applying De
Morgan's law to the query.
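
Concretely, the rewrite I have in mind is just

  A OR B OR C  =  NOT ( (NOT A) AND (NOT B) AND (NOT C) )

so a filter like type:a OR type:b OR type:c becomes the complement of the
intersection of the cached complements (*:* -type:a, *:* -type:b,
*:* -type:c).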

What are your opinions on this?

Thanks
Emmanuel


Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-22 Thread Shawn Heisey

On 9/21/2011 4:22 PM, Michael Ryan wrote:

> I think the problem is that the <mergePolicy> config needs to be inside the
> <indexDefaults> config, rather than after it as you have.


Thank you, that took care of it.  With mergeFactor set to 8 and the TMP 
options set to 35, it merges after 35 segments.
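
For anyone searching later, the working shape is along these lines (the
element placement is the point; the exact parameter names inside
mergePolicy are my assumption, mirroring TieredMergePolicy's setters):

<indexDefaults>
  <mergeFactor>8</mergeFactor>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <double name="segmentsPerTier">35</double>
  </mergePolicy>
</indexDefaults>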


Shawn



Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Shawn Heisey

Michael,

What is the best central place on an rpm-based distro (CentOS 6 in my 
case) to raise the vmem limit for specific user(s), assuming it's not 
already correct?  I'm using /etc/security/limits.conf to raise the open 
file limit for the user that runs Solr:


ncindex         hard    nofile  65535
ncindex         soft    nofile  49151

Thanks,
Shawn


On 9/22/2011 9:56 AM, Michael McCandless wrote:






Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Michael McCandless
Unfortunately I really don't know ;)  Every time I set forth to figure
things like this out I seem to learn some new way...

Maybe someone else knows?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Sep 22, 2011 at 2:15 PM, Shawn Heisey  wrote:


Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-22 Thread Shawn Heisey

On 9/22/2011 3:54 AM, Pranav Prakash wrote:

Hi List,

I included StopFilterFactory and I can see it taking action in the Analyzer
interface. However, when I go to the Schema Analyzer, I see those stop words
in the top 10 terms. Is this normal?

<analyzer>
  ...
  <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
  ...
</analyzer>


You've got CommonGramsFilterFactory and StopFilterFactory both using 
stopwords.txt, which is a confusing configuration.  Normally you'd want 
one or the other, not both ... but if you did legitimately have both, 
you'd want them to each use a different wordlist.


The commongrams filter turns each found occurrence of a word in the file 
into two tokens - one prepended with the token before it, one appended 
with the token after it.  If it's the first or last term in a field, it 
only produces one token.  When it gets to the stopfilter, the combined 
terms no longer match what's in stopwords.txt, so no action is taken.
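
For example (a sketch of the token stream, positions omitted): with "the" in
the word file, the input "man on the moon" leaves the commongrams filter as
man / on / the / on_the / the_moon / moon. The stopfilter then drops the bare
"the", but the grams "on_the" and "the_moon" survive into the index.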


If I had to guess, what you are seeing in the top 10 terms is the 
concatenation of your most common stopword with another word.  If it 
were English, I would guess that to be "of_the" or something similar.  
If my guess is wrong, then I'm not sure what's going on, and some 
cut/paste of what you're actually seeing might be in order.  Did you 
delete and do a full reindex after you changed your schema?


Thanks,
Shawn



JdbcDataSource and threads

2011-09-22 Thread Vazquez, Maria (STM)
Hi!

So as of 3.4 JdbcDataSource doesn't work with threads, correct?

 

https://issues.apache.org/jira/browse/SOLR-2233

 

I'm using Microsoft SQL Server, my data-config.xml has a lot of very
complex SQL queries and it takes a long time to index.

I'm migrating from Lucene to Solr and the Lucene code uses threads so it
takes little time to index, now in Solr if I add threads=xx to my
rootEntity I get lots of errors about connections being closed.

 

Thanks a lot,

Maria



Re: strange copied field problem

2011-09-22 Thread Chris Hostetter

: No probs. I would still hope someone would comment on you thread with
: some expert opinions about making a copy of a copy :)

https://wiki.apache.org/solr/SchemaXml#Copy_Fields

"The copy is done at the stream source level and no copy 
feeds into another copy."

If Solr tried to chain copyFields, it might copy more than you actually 
wanted to, and/or it could easily get into infinite-loop type situations.

As implemented, you can be very explicit about what you want, but the 
trade-off is that you *have* to be very explicit about what you want.
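
For example (field names illustrative), to route one source into two
destinations you declare both copies against the source itself:

  <copyField source="title" dest="title_exact"/>
  <copyField source="title" dest="title_ngram"/>

rather than expecting a title -> title_exact -> title_ngram chain to happen
implicitly.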

-Hoss



Re: Sort five random "Top Offers" to the top

2011-09-22 Thread Sujit Pal
Not the OP, but this is /much/ simpler, although at the expense of
making 2 calls to Solr. But the upside is that no customization is
required.
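
A sketch of the two requests (host and field names as in this thread; the
random_* dynamic field, backed by solr.RandomSortField in the stock example
schema, is assumed to be present and lets Solr do the shuffling itself):

http://localhost:8983/solr/select?q=handy&fq=topoffer:true&sort=random_1234%20asc&rows=5
http://localhost:8983/solr/select?q=handy&rows=10

The application then prepends the five shuffled top offers to the normal
result page, de-duplicating on the unique key.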

On Thu, 2011-09-22 at 09:43 +0100, Doug McKenzie wrote:
> Could you not just do your normal search with and add a filter query on? 
> fq=topoffer:true
> 
> That would then return only results with top offer : true and then use 
> whatever shuffling / randomising you like in your application. 
> Alternately you could even add sorting on relevance to show the top 5 
> closest matches to the query rows=5&sort=score desc
> 
> 
> 
> On 21/09/2011 21:26, Sujit Pal wrote:
> > Hi MOuli,
> >
> > AFAIK (and I don't know that much about Solr), this feature does not
> > exist out of the box in Solr. One way to achieve this could be to
> > construct a DocSet with topoffer:true and intersect it with your result
> > DocSet, then select the first 5 off the intersection, randomly shuffle
> > them, sublist [0:5], and move the sublist to the top of the results like
> > QueryElevationComponent does. Actually you may want to take a look at
> > QueryElevationComponent code for inspiration (this is where I would have
> > looked if I had to implement something similar).
> >
> > -sujit
> >
> > On Wed, 2011-09-21 at 06:54 -0700, MOuli wrote:
> >> Hey Community.
> >>
> >> I got a Lucene/Solr Index with many offers. Some of them are marked by a
> >> flag field "topoffer" that they are top offers. Now I want so sort randomly
> >> 5 of this offers on the top.
> >>
> >> For Example
> >> HTC Sensation
> >>   - topoffer = true
> >> HTC Desire
> >>   - topoffer = false
> >> Samsung Galaxy S2
> >>   - topoffer = true
> >> IPhone 4
> >>   - topoffer = true
> >> ...
> >>
> >> When i search for a Handy then i want that first 3 offers are HTC 
> >> Sensation,
> >> Samsung Galaxy S2 and the iPhone 4.
> >>
> >>
> >> Does anyone have an idea?
> >>
> >> PS.: I hope my english is not to bad
> >>
> >> --
> >> View this message in context: 
> >> http://lucene.472066.n3.nabble.com/Sort-five-random-Top-Offers-to-the-top-tp3355469p3355469.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sort five random "Top Offers" to the top

2011-09-22 Thread Sujit Pal
I have a few blog posts on this...
http://sujitpal.blogspot.com/2011/04/custom-solr-search-components-2-dev.html 
http://sujitpal.blogspot.com/2011/04/more-fun-with-solr-component.html 
http://sujitpal.blogspot.com/2011/02/solr-custom-search-requesthandler.html 

but it's quite simple, just look at some of the ones already in there.

If you need books, check out the Apache Solr 3.1 Cookbook - it has a
chapter on how to do this.
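
For orientation, a bare-bones skeleton (Solr 3.x API; the class name and
comments are illustrative, and the SolrInfoMBean methods are stubbed out):

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class TopOffersComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // runs before the query is executed; a place to inspect/tweak params
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // runs after the query; rb.getResults().docList holds the hits to re-order
  }

  @Override
  public String getDescription() { return "moves shuffled top offers first"; }
  @Override
  public String getSource() { return null; }
  @Override
  public String getSourceId() { return null; }
  @Override
  public String getVersion() { return null; }
}

It is then registered in solrconfig.xml with
<searchComponent name="topoffers" class="com.example.TopOffersComponent"/>
and listed in a handler's last-components array.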

-sujit

On Thu, 2011-09-22 at 02:13 -0700, MOuli wrote:
> Hmm is it possible for me to write my own search component?
> 
> I just downloaded the Solr sources and need some information on how the search
> components work. Is there anything out there which can help me?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Sort-five-random-Top-Offers-to-the-top-tp3355469p3358152.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: JdbcDataSource and threads

2011-09-22 Thread Rahul Warawdekar
Hi,

Have you applied the patch that is provided with the Jira you mentioned ?
https://issues.apache.org/jira/browse/SOLR-2233

Please apply the patch and check if you are getting the same exceptions.
It has worked well for me till now.

On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) <
maria.vazq...@dexone.com> wrote:

> Hi!
>
> So as of 3.4 JdbcDataSource doesn't work with threads, correct?
>
>
>
> https://issues.apache.org/jira/browse/SOLR-2233
>
>
>
> I'm using Microsoft SQL Server, my data-config.xml has a lot of very
> complex SQL queries and it takes a long time to index.
>
> I'm migrating from Lucene to Solr and the Lucene code uses threads so it
> takes little time to index, now in Solr if I add threads=xx to my
> rootEntity I get lots of errors about connections being closed.
>
>
>
> Thanks a lot,
>
> Maria
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Sort five random "Top Offers" to the top

2011-09-22 Thread Sujit Pal
Sorry, hit send too soon. Personally, given the use case, I think I would
still prefer the two-query approach. It seems way too much work to do a
handler (unless you want to learn how to do it) to support this.

On Thu, 2011-09-22 at 12:31 -0700, Sujit Pal wrote:
> I have a few blog posts on this...
> http://sujitpal.blogspot.com/2011/04/custom-solr-search-components-2-dev.html 
> http://sujitpal.blogspot.com/2011/04/more-fun-with-solr-component.html 
> http://sujitpal.blogspot.com/2011/02/solr-custom-search-requesthandler.html 
> 
> but its quite simple, just look at some of the ones already in there.
> 
> If you need books, check out the Apache Solr 3.1 Cookbook - it has a
> chapter on how to do this.
> 
> -sujit
> 
> On Thu, 2011-09-22 at 02:13 -0700, MOuli wrote:
> > Hmm is it possible for me to write my own search component?
> > 
> > I just downloaded the Solr sources and need some information on how the search
> > components work. Is there anything out there which can help me?
> > 
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/Sort-five-random-Top-Offers-to-the-top-tp3355469p3358152.html
> > Sent from the Solr - User mailing list archive at Nabble.com.



ScriptTransformer question

2011-09-22 Thread Pulkit Singhal
Hello,

I'm using DIH in the trunk version and I have placed breakpoints in
the Solr code.
I can see that the value for a row being fed into the
ScriptTransformer instance is:
{buybackPlans.buybackPlan.type=[PSP-PRP],
buybackPlans.buybackPlan.name=[2-Year Buy Back Plan],
buybackPlans.buybackPlan.sku=[2490748],
$forEach=/products/product/buybackPlans/buybackPlan,
buybackPlans.buybackPlan.price=[]}

Now, the price field cannot be left empty, because Solr will complain, so the
following script should be removing it, but it doesn't do anything!!!
Can anyone spot the issue here?
function skipEmptyFieldsInBuybackPlans(row) {
var buybackPlans_buybackPlan_price = row.get(
'buybackPlans.buybackPlan.price' );
if ( buybackPlans_buybackPlan_price == null ||
 buybackPlans_buybackPlan_price == '' ||
 buybackPlans_buybackPlan_price.length == 0)
{
row.remove( 'buybackPlans.buybackPlan.price' );
}
return row;
}
I would hate to have to get the Rhino JavaScript engine source code
and step through that.
I'm sure I'm being really dumb and am hoping that someone on the Solr
mailing list can help me spot the issue :)

Thanks!
- Pulkit
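
One likely culprit, judging from the row dump above: DIH hands the script
java.util.List values (note the [] brackets), so row.get() returns an empty
ArrayList, which is neither null nor equal to '', and which has no JavaScript
length property under Rhino. A hedged rewrite that tests the list's own
emptiness should make the branch fire:

function skipEmptyFieldsInBuybackPlans(row) {
    var price = row.get('buybackPlans.buybackPlan.price');
    // price is a java.util.List under Rhino, so use its own isEmpty()
    if (price == null || price.isEmpty()) {
        row.remove('buybackPlans.buybackPlan.price');
    }
    return row;
}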


public constructor for KStemmer

2011-09-22 Thread Ofer Fort
Hi all,
I was very happy to see that the KStemmer implementation was added to Lucene.
I was wondering why the constructor is not public?
I have a case where I want to create an analyzer that uses the stemmer
itself, and in order to construct a new instance, it has to be in the same
package and be loaded by the same classloader.
I know I can just change the source file, or add my jar to the solr.war, but
I was wondering if there is a reason why this constructor was not made
public.

thanks
ofer


Re: Snippets and Boundaryscanner in Highlighter

2011-09-22 Thread Koji Sekiguchi

(11/09/22 23:09), O. Klein wrote:

I'm testing the new BoundaryScanner in the highlighter, but I can't get it to
show more than 1 snippet.

hl.snippets=2

Bug or am I doing something wrong?


I think your content_text is too short to get more than one snippet?

Try the following with solr example (I'm using trunk):

1.
http://localhost:8983/solr/select?q=SD+AND+battery&fq=&fl=includes&hl=on&hl.fl=includes&hl.useFastVectorHighlighter=true&hl.snippets=2

you request 2 snippets, but Solr will return 1 snippet:


  32MB SD card, USB cable, AV cable, battery


2.
http://localhost:8983/solr/select?q=SD+AND+battery&fq=&fl=includes&hl=on&hl.fl=includes&hl.useFastVectorHighlighter=true&hl.snippets=2&hl.fragsize=18

now you request 2 snippets with shorter fragsize option, then Solr can return 2 
snippets:


  32MB SD card, USB cable
  cable, battery


koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Re: SolrCloud state

2011-09-22 Thread Otis Gospodnetic
Hi,

We just published a post on this topic the other day, have a look:

http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



>
>From: Miguel Coxo 
>To: solr-user@lucene.apache.org
>Sent: Wednesday, September 21, 2011 1:45 PM
>Subject: SolrCloud state
>
>Hi there.
>
>I'm starting a new project using solr and i would like to know if solr is
>able to setup a cluster with fault tolerance.
>
>I'm setting up an environment with two shards. Each shard should have a
>replica.
>
>What i would like to know is if a shard master fails will the replica be
>"promoted" to a master. Or will it remain search only and only recover when
>a new master is setup.
>
>Also how is the document indexing distributed by the shards? Can i add a new
>shard dynamically?
>
>All the best, Miguel Coxo.
>
>
>

Re: Snippets and Boundaryscanner in Highlighter

2011-09-22 Thread O. Klein
Thanks for your answer, but you are not using the BoundaryScanner.

hl.boundaryScanner=breakIterator
hl.bs.type=LINE

was the config I used, and with

hl.snippets=2

I expect to see 2 lines, but I only see one.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Snippets-and-Boundaryscanner-in-Highlighter-tp3358898p3360398.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Snippets and Boundaryscanner in Highlighter

2011-09-22 Thread Koji Sekiguchi

(11/09/23 7:59), O. Klein wrote:

Thanks for your answer, but you are not using the BoundaryScanner.


No. Regardless of whether you specify a BoundaryScanner or not, one is used
implicitly, because BaseFragmentsBuilder always uses it (SimpleBoundaryScanner
is the default).

Try to index a long text and highlight the first and the last of the text:

q=A B


A ... very looong text ... B


koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


strategy for post-processing answer set

2011-09-22 Thread Fred Zimmerman
Hi,


I would like to take the HTML documents that are the result of a Solr search
and combine them into a single HTML document that combines the body text of
each individual document.  What is a good strategy for this? I am crawling
with Nutch and Carrot2 for clustering.
Fred


Re: strategy for post-processing answer set

2011-09-22 Thread Markus Jelsma
Hi,

Solr supports the Velocity template engine, and the integration works very well. 
It's ideal for generating properly formatted output from the search engine. There's a 
clustering example, and it's easy to format documents indexed by Nutch.

http://wiki.apache.org/solr/VelocityResponseWriter
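
A minimal sketch of the flow (template and field names illustrative): with the
writer registered in solrconfig.xml, request wt=velocity and point v.template
at a file under conf/velocity,

http://localhost:8983/solr/select?q=*:*&wt=velocity&v.template=combined

where combined.vm concatenates the stored body of each hit:

#foreach($doc in $response.results)
  <div>$doc.getFieldValue('content')</div>
#end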

Cheers

> > Hi,
> 
> I would like to take the HTML documents that are the result of a Solr
> search and combine them into a single HTML document that combines the body
> text of each individual document.  What is a good strategy for this? I am
> crawling with Nutch and Carrot2 for clustering.
> Fred


Re: Snippets and Boundaryscanner in Highlighter

2011-09-22 Thread O. Klein
The content_text field is filled with text from PDFs, so this is not the
problem. Besides, the regex fragmenter gives back multiple snippets as
expected.

Have you tested to see if a boundaryscanner of type LINE gives back multiple
snippets with your content?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Snippets-and-Boundaryscanner-in-Highlighter-tp3358898p3360499.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: strategy for post-processing answer set

2011-09-22 Thread Fred Zimmerman
Can you say a bit more about this? I see Velocity and will download it and
start playing around, but I am not quite sure I understand all the steps that
you are suggesting.  Fred


On Thu, Sep 22, 2011 at 19:51, Markus Jelsma wrote:

> Hi,
>
> Solr support the Velocity template engine and has veyr good support. Ideal
> for
> generating properly formatted output from the search engine. There's a
> clustering example and it's easy to format documents indexed by Nutch.
>
> http://wiki.apache.org/solr/VelocityResponseWriter
>
> Cheers
>
> > > Hi,
> >
> > I would like to take the HTML documents that are the result of a Solr
> > search and combine them into a single HTML document that combines the
> body
> > text of each individual document.  What is a good strategy for this? I am
> > crawling with Nutch and Carrot2 for clustering.
> > Fred
>


Re: Snippets and Boundaryscanner in Highlighter

2011-09-22 Thread Koji Sekiguchi

(11/09/23 8:57), O. Klein wrote:

The content_text field is filled with text from pdf's. So this is not the
problem. Besides the regex fragmenter gives back multiple snippets like
expected.


This doesn't show that the BoundaryScanner has a bug. The Highlighter's
fragmenter and the FVH FragmentsBuilder are totally different.


Have you tested to see if a boundaryscanner of type LINE gives back multiple
snippets with your content?


No, I haven't. Do you mean LINE type causes the problem? Can you get two 
snippets
if you use WORD type BreakIteratorBoundaryScanner?

You can implement your own BoundaryScanner instead, if you think
LINE BreakIterator doesn't work as you expected.

koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Re: DIH error when nested db datasource and file data source

2011-09-22 Thread abhayd
Any help?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-error-when-nested-db-datasource-and-file-data-source-tp3345664p3360637.html
Sent from the Solr - User mailing list archive at Nabble.com.


autosuggest combination of data from documents and popular queries

2011-09-22 Thread abhayd
hi 
we already have autosuggest working using solr based on popular search
terms.
we use following approach..
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

Now we want to also use the data indexed in Solr for autosuggest, with popular
search terms having higher priority.

Can we just copy the field containing the doc text to an autosuggest field which
does edge-ngram analysis?
Also, we have around 100K docs in the index, so would performance be a
concern?

Any help is really appreciated 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/autosuggest-combination-of-data-from-documents-and-popular-queries-tp3360657p3360657.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Slow autocomplete(terms)

2011-09-22 Thread abhayd
Not sure if you have already seen this, but it may be useful:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3360663.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: autosuggest combination of data from documents and popular queries

2011-09-22 Thread Otis Gospodnetic
Hello,

>hi 
>we already have autosuggest working using solr based on popular search
>terms.

Just terms, or whole queries?  I assume the latter.

>we use following approach..
>http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>
>Now we want to use data indexed in solr also for autosuggest. with popular
>search terms to have higher priority.
>
>can we just copy field containing doc text to a auto suggest filed which
>does edgengram analysis?

Something doesn't feel right here.  Using data from the index for suggestions 
makes sense - we do that on http://search-lucene.com/ for example.
Popular search terms having high priority and doc text, how does that work?
Oh, you mean if you have a doc with a body field whose value is "foo bar baz", 
then, assuming the term "bar" is one of those popular search terms, you would 
want "bar" to come up as a suggestion?

That's doable with some coding, yes, but I don't think this would create a very 
good search experience.

Here are some thoughts:
* instead of suggesting popular query terms, suggest popular query strings
* suggest phrases such as query strings, titles from a title field if you have 
it, author names from an author name field if you have it, and other fields of 
that nature
* ...

>also we have around 100 K docs in index so performance would be be a
>concern?


I think that depends on the implementation.  For example, suggestions you see 
on search-lucene.com are powered 
by http://sematext.com/products/autocomplete/index.html and that solution works 
well with millions of suggestions.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


Re: Slow autocomplete(terms)

2011-09-22 Thread Otis Gospodnetic
Hi Roy,

Try edge ngram instead.

See also: http://sematext.com/products/autocomplete/index.html (comes with a 
nice UI, a bunch of configurable things, etc.)
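
A sketch of the edge-ngram variant (field and attribute values illustrative;
KeywordTokenizer keeps the whole value as one token, so "manchester" yields
ma, man, manc, ... rather than every internal substring):

<fieldType name="autocomplete_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The query side is left un-grammed, so typing "manch" matches "manchester"
exactly once instead of producing the gram explosion quoted below.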


Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


- Original Message -
> From: roySolr 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Thursday, September 22, 2011 5:02 AM
> Subject: Re: Slow autocomplete(terms)
> 
> Hello Erick,
> 
> Thanks for your answer but i have some problems with the ngramfilter.
> 
> My conf looks like this (the fieldType XML was partly lost in the archive;
> the surviving fragments show a TextField with positionIncrementGap="100"
> whose index analyzer ends in an NGram filter):
> 
>   <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="8"/>
> 
> I see this in the analysis:
> 
> "manchester"
> 
> ma    an    nc    ch    he    es    st    te    er    man    anc    nch    
> che    hes    est    ste    ter    manc    anch    nche
> ches    hest    este    ster    manch    anche    nches    chest    heste    
> ester    manche    anches    nchest
> cheste    hester    manches    anchest    ncheste    chester    manchest    
> ancheste    nchester
> 
> When I use terms I see all these results back in the response. So I type
> "ches" and I get this:
> 
> ches
> nches
> anches
> nchest
> ncheste
> 
> I want one suggestion with the complete keyword: "manchester". Is this 
> possible?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3358126.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Multi-word searches in multi-valued fields

2011-09-22 Thread Otis Gospodnetic
Ron,

Try "smith trucking"~N  where N is a number like 1 or 2 or 3 ... it's called 
phrase 
slop: http://search-lucene.com/?q=phrase+slop&fc_project=Lucene&fc_project=Solr
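
One detail that makes this work on a multiValued field: each value is separated
by the field's positionIncrementGap (100 in the stock schema), so as long as
the slop stays well below that gap, a sloppy phrase such as (field name
illustrative)

company_name:"smith trucking"~3

can only match when both words sit inside the same value, never across two
values.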

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


- Original Message -
> From: "Olson, Ron" 
> To: "solr-user@lucene.apache.org" 
> Cc: 
> Sent: Thursday, September 22, 2011 10:49 AM
> Subject: Multi-word searches in multi-valued fields
> 
> Hi all-
> 
> I'm not clear on how to allow a user to search a multi-valued field with 
> multiple words and return only those documents where all the words are 
> together 
> in one value, and not spread over multiple values.
> 
> If I do a literal search on the "company name" field for "smith 
> trucking" (with the quotes), then it works because it's looking for 
> only "smith trucking", and it finds it, great. However, if I put in 
> "trucking smith", then I get no results. If I try using something like 
> (+trucking +smith), then I get documents where one document might have 
> "joe's trucking" and "bob smith" in the resulting array 
> of names.
> 
> So I guess what I need is an exact match, regardless of word positioning 
> (i.e. 
> "smith trucking" and "trucking smith" should find only those 
> documents that have that those two words in one value of the resulting array).
> 
> I've been going through the wiki and it seems like this is probably a 
> super-simple thing, but I'm clearly just not getting it; I just can't 
> figure out the right syntax to make this work.
> 
> Thanks for any info.
> 
> Ron
> 


RE: OutOfMemoryError coming from TermVectorsReader

2011-09-22 Thread Anand.Nigam
Hi,

I am trying to index application log files and some database tables. The log 
files range in size from 1 MB to 100 MB. The database tables also have a few 
thousand rows.

I have used termvector highlighter for the content of the log files as 
mentioned below:

Heap size : 10 GB 
OS: Linux, 64 bit
Solr version : 3.4.0

Thanks & Regards
Anand



Anand Nigam
RBS Global Banking & Markets
Office: +91 124 492 5506   

-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com] 
Sent: 19 September 2011 16:52
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemoryError coming from TermVectorsReader

Please include information about your heap size, (and other Java command line 
arguments) as well a platform OS (version, swap size, etc), Java version, 
underlying hardware (RAM, etc) for us to better help you.

From the information you have given, increasing your heap size should help.

Thanks,
Glen

http://zzzoot.blogspot.com/


On Mon, Sep 19, 2011 at 1:34 AM,   wrote:
> Hi,
>
> I am new to Solr. I am trying to index text documents of large size. On 
> searching from indexed documents I am getting following OutOfMemoryError. 
> Please help me in resolving this issue.
>
> The field which stores file content is configured in schema.xml as below:
>
>
>  omitNorms="true" termVectors="true" termPositions="true" 
> termOffsets="true" />
>
> and Highlighting is configured as below:
>
>
> on
>
> ${all.fields.list}
>
> 500
>
> true
>
>
>
> 2011-09-16 09:38:45.763 [http-thread-pool-9091(5)] ERROR - 
> java.lang.OutOfMemoryError: Java heap space
>        at 
> org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsRe
> ader.java:503)
>        at 
> org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:2
> 63)
>        at 
> org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:2
> 84)
>        at 
> org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.
> java:759)
>        at 
> org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryRea
> der.java:510)
>        at 
> org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexRead
> er.java:234)
>        at 
> org.apache.lucene.search.vectorhighlight.FieldTermStack.(FieldTe
> rmStack.java:83)
>        at 
> org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFiel
> dFragList(FastVectorHighlighter.java:175)
>        at 
> org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBest
> Fragments(FastVectorHighlighter.java:166)
>        at 
> org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastV
> ectorHighlighter(DefaultSolrHighlighter.java:509)
>        at 
> org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(Defaul
> tSolrHighlighter.java:376)
>        at 
> org.apache.solr.handler.component.HighlightComponent.process(Highlight
> Component.java:116)
>        at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(Sear
> chHandler.java:194)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> rBase.java:129)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
> java:356)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> .java:252)
>        at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
> cationFilterChain.java:256)
>        at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
> lterChain.java:215)
>        at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
> lve.java:279)
>        at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa
> lve.java:175)
>        at 
> org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.ja
> va:655)
>        at 
> org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java
> :595)
>        at 
> com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:98)
>        at 
> com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessi
> onLockingStandardPipeline.java:91)
>        at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja
> va:162)
>        at 
> org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.ja
> va:326)
>        at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
> :227)
>        at 
> com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerM
> apper.java:170)
>        at 
> com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:82
> 2)
>        at 
> com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:719)
>        at 
> com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1013)
>
> Thanks & Regards
> Anand Nigam
> Developer
>
>

Re: autocomplete with popularity

2011-09-22 Thread Otis Gospodnetic
Eugeny,

I think you want something more useful and less problematic as Wunder already 
pointed out.

Wouldn't you want your suggestions to be ordered by how close of match they 
are?  And do you really want them to be purely prefix-based like in your 
example?

What if people are searching for Michael Jackson a lot, and a person comes and 
starts typing "Jackso"? Would you not want to suggest Michael Jackson?  This 
is not to say you can't mix in popularity or some other factors that you know 
you can rely on.

Try the AutoComplete on http://search-lucene.com/ to see whether that feels 
like the right search experience.  For example, start typing the word "expert". 
 Because matching (sub)strings are bolded, you will easily see where in the suggested 
phrases this matches.

See also: http://sematext.com/products/autocomplete/index.html - I think one of 
the example configurations that this thing comes with actually does show how to 
mix in something like popularity.


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


- Original Message -
> From: Sentsov Eugeny 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Tuesday, September 20, 2011 1:05 PM
> Subject: autocomplete with popularity
> 
> hello,
> Is there autocomplete which counts requests and sorts suggestions according
> to this count? Ie if users request "redlands" 50 times and  reckless 
> 20
> times then suggestions for "re" should be
> "redlands"
> "reckless"
>


Re: Different Solr versions between Master and Slave(s)

2011-09-22 Thread Otis Gospodnetic
Tommaso,

We had a client (Italians, your countrymen, as a matter of fact) several months 
ago that we migrated from 1.4.* to 3.* if I recall correctly.  We used a tool 
that I think may still be just in JIRA to read docs from old Solr instance and 
index to the new Solr instance.  Ah, of course, all fields need to be stored 
for that to work.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


- Original Message -
> From: Tommaso Teofili 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Monday, September 19, 2011 12:17 PM
> Subject: Different Solr versions between Master and Slave(s)
> 
> Hi all,
> while thinking about a migration plan of a Solr 1.4.1 master / slave
> architecture (1 master with N slaves already in production) to Solr 3.x I
> imagined to go for a graceful migration, starting with migrating only
> one/two slaves, making the needed tests on those while still offering the
> indexing and searching capabilities on top of the 1.4.1 instances.
> I did a small test of this migration plan but I see that the 'javabin'
> format used by the replication handler has changed (version 1 in 1.4.1,
> version 2 in 3.x) so the slaves at 3.x seem not able to replicate from the
> master (at 1.4.1).
> Is it possible to use the older 'javabin' version in order to enable
> replication from the master at 1.4.1 towards the slave at 3.x ?
> Or is there a better migration approach that sounds better for the above
> scenario?
> Thanks in advance for your help.
> Cheers,
> Tommaso
>


Re: OutOfMemoryError coming from TermVectorsReader

2011-09-22 Thread Otis Gospodnetic
Anand,

But do you really want the whole log file to be a single Solr document (from a 
cursory look at the thread it seems that is the case)?  Why not break up a log 
file into multiple documents? E.g. each log message could be one Solr document. 
Not only will that solve your memory issues, but I think it also makes more 
sense if the intention is for a person to do a search and then look at the 
matched log messages - much easier if you point a person to a short log doc 
than to a giant one through which the person then has to do a manual find.
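
A minimal SolrJ sketch of that approach (Solr 3.x API; the field names id,
file and content are assumptions, and in practice the adds would be batched):

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LogLineIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    String line;
    int lineNo = 0;
    while ((line = in.readLine()) != null) {
      // one Solr document per log line instead of one per file
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", args[0] + ":" + (++lineNo));
      doc.addField("file", args[0]);
      doc.addField("content", line);
      server.add(doc);
    }
    in.close();
    server.commit();
  }
}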

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


- Original Message -
> From: "anand.ni...@rbs.com" 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Thursday, September 22, 2011 11:56 PM
> Subject: RE: OutOfMemoryError coming from TermVectorsReader
> 
> Hi,
> 
> I am trying to index application log files and some database tables. Size of 
> the 
> log files range from 1 MB to 100 MB. Database tables also have few thousands 
> of 
> rows.
> 
> I have used termvector highlighter for the content of the log files as 
> mentioned 
> below:
> 
> Heap size : 10 GB 
> OS: Linux, 64 bit
> Solr version : 3.4.0
> 
> Thanks & Regards
> Anand
> 
> 
> 
> Anand Nigam
> RBS Global Banking & Markets
> Office: +91 124 492 5506  
> 
> -Original Message-
> From: Glen Newton [mailto:glen.new...@gmail.com] 
> Sent: 19 September 2011 16:52
> To: solr-user@lucene.apache.org
> Subject: Re: OutOfMemoryError coming from TermVectorsReader
> 
> Please include information about your heap size, (and other Java command line 
> arguments) as well a platform OS (version, swap size, etc), Java version, 
> underlying hardware (RAM, etc) for us to better help you.
> 
> From the information you have given, increasing your heap size should help.
> 
> Thanks,
> Glen
> 
> http://zzzoot.blogspot.com/
> 
> 
> On Mon, Sep 19, 2011 at 1:34 AM,   wrote:
>>  Hi,
>> 
>>  I am new to Solr. I am trying to index text documents of large size. On 
> searching from indexed documents I am getting following OutOfMemoryError. 
> Please 
> help me in resolving this issue.
>> 
>>  The field which stores file content is configured in schema.xml as below:
>> 
>> 
>>   indexed="true" stored="true" 
>>  omitNorms="true" termVectors="true" 
> termPositions="true" 
>>  termOffsets="true" />
>> 
>>  and Highlighting is configured as below:
>> 
>> 
>>  on
>> 
>>  ${all.fields.list}
>> 
>>  500
>> 
>>   name="f.Content.hl.useFastVectorHighlighter">true
>> 
>> 
>> 
>>  2011-09-16 09:38:45.763 [http-thread-pool-9091(5)] ERROR - 
>>  java.lang.OutOfMemoryError: Java heap space
>>         at 
>>  org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsRe
>>  ader.java:503)
>>         at 
>>  org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:2
>>  63)
>>         at 
>>  org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:2
>>  84)
>>         at 
>>  org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.
>>  java:759)
>>         at 
>>  org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryRea
>>  der.java:510)
>>         at 
>>  org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexRead
>>  er.java:234)
>>         at 
>> 
> org.apache.lucene.search.vectorhighlight.FieldTermStack.(FieldTe
>>  rmStack.java:83)
>>         at 
>>  org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFiel
>>  dFragList(FastVectorHighlighter.java:175)
>>         at 
>>  org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBest
>>  Fragments(FastVectorHighlighter.java:166)
>>         at 
>>  org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastV
>>  ectorHighlighter(DefaultSolrHighlighter.java:509)
>>         at 
>>  org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(Defaul
>>  tSolrHighlighter.java:376)
>>         at 
>>  org.apache.solr.handler.component.HighlightComponent.process(Highlight
>>  Component.java:116)
>>         at 
>>  org.apache.solr.handler.component.SearchHandler.handleRequestBody(Sear
>>  chHandler.java:194)
>>         at 
>>  org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
>>  rBase.java:129)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
>>         at 
>>  org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
>>  java:356)
>>         at 
>>  org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
>>  .java:252)
>>         at 
>>  org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
>>  cationFilterChain.java:256)
>>         at 
>>  org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
>>  lterChain.java:215)
>>         at 
>>  org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
>>  lve.java:279)
>>         at 
>>  org.apache.catalina.core.StandardContextValve.invoke(Standard

Re: Production Issue: SolrJ client throwing - Element type must be followed by either attribute specifications, ">" or "/>".

2011-09-22 Thread roz dev
Wanted to update the list with our findings.

We reduced the number of documents being retrieved from Solr and
this error did not appear again.
It might be that, due to the high number of documents, Solr was returning
incomplete documents.

-Saroj


On Wed, Sep 21, 2011 at 12:13 PM, roz dev  wrote:

> Hi All
>
> We are getting this error in our Production Solr Setup.
>
> Message: Element type "t_sort" must be followed by either attribute 
> specifications, ">" or "/>".
> Solr version is 1.4.1
>
> Stack trace indicates that solr is returning malformed document.
>
>
> Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing 
> query
>   at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>   at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
>   at 
> com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
>   ... 15 more
> Caused by: org.apache.solr.common.SolrException: parsing error
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
>   at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
>   at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>   at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
>   ... 17 more
> Caused by: javax.xml.stream.XMLStreamException: ParseError at 
> [row,col]:[3,136974]
> Message: Element type "t_sort" must be followed by either attribute 
> specifications, ">" or "/>".
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
>   ... 21 more
>
>


Re: how work with rss in solr?

2011-09-22 Thread Gora Mohanty
On Fri, Sep 23, 2011 at 11:02 AM, nagarjuna  wrote:
> Hi everybody
>
> Can anybody please explain me how to work with rss in solr?
> what exactly i meant is, i have one blog and i need to get the updated posts
> details using solr?
> please provide me any samples or links.i have little bit knowledge about
> solr indexing and searching but i am not familiar with the rss so,please
> help me

Take a look at the configuration in example/example-DIH/solr/rss
in your Solr source tree.

Regards,
Gora


levenshtein ranked results

2011-09-22 Thread Roland Tollenaar

Hi,

I tried an internet search to find out how to query Solr to get the 
results ranked (ordered) by Levenshtein distance.

This appears to be possible, but I could not find a concrete example of 
how I would have to formulate the query, or, if it's a schema setting 
on a particular field, how to set up the schema.


I am new to solr, any help appreciated.

tia.

Roland.
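
Two hedged pointers, since both depend on the Solr version deployed: plain
fuzzy matching already uses Levenshtein similarity under the hood, e.g.
(field name illustrative)

q=name:roland~0.7

where the number is a minimum similarity, not a distance. Recent versions also
expose a strdist() function query, and Solr 3.1+ can sort by a function, which
together would allow something along the lines of
sort=strdist("roland",name,edit) desc; the FunctionQuery wiki page is worth
checking for availability before relying on either.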





Re: how work with rss in solr?

2011-09-22 Thread nagarjuna
I didn't expect this answer.
Thank you very much for your reply, Gora.

I already saw that sample; I posted this only after completing my initial
research. I know there is an RSS example in Solr; in that example they used the
URL http://rss.slashdot.org/Slashdot/slashdot, which gives an XML response, and
they get their results from that. But I don't know what that URL is, or how to
build a URL like that for my application. That is what I am asking...


Thank you

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-work-with-rss-in-solr-tp3360999p3361054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how work with rss in solr?

2011-09-22 Thread Gora Mohanty
On Fri, Sep 23, 2011 at 11:50 AM, nagarjuna  wrote:
> i didnt expect this answer .
> Thank u very much for ur reply Gora
>
> i already saw that sampleafter completing my first stage self research
> only i posted this post.i clearly know about that there is an example in
> solr about the RSS in the example they used the url
> http://rss.slashdot.org/Slashdot/slashdot
> http://rss.slashdot.org/Slashdot/slashdot  which will gives the xml response
> and they are getting results from thatbut i dont know what is that url
> and how can build an url like that for my applicationthat is what i
> am expecting...
[...]

That is off-topic for this list. You need to set up an RSS feed for
your blog. The procedure for doing this would depend on your
blog engine, and is usually quite easy. It would be best if you
asked this on a mailing list catering to the blog engine that you
use.

Regards,
Gora


Solr wildcard searching

2011-09-22 Thread jaystang
Hey guys,
Very new to solr.  I'm using the data import handler to pull customer data
out of my database and index it.  All works great so far.  Now I'm trying to
query against a specific field and I seem to be struggling with doing a
wildcard search. See below.

I have several indexed documents with a "customer_name" field containing
"John Doe".  I have a UI that contains a listing of this indexed data as
well has a keyword filter field (filter as you type).  So I would like when
the user starts typing "J", "John Doe will return, and "Jo", "John Doe" will
return, "Joh"... etc, etc...

I've tried the following:

Search: customer_name:Joh*
Returns: The correct "John Doe" Record"

Search: customer_name:John Do*
Returns: No results (nothing returns with 2 words since I don't have the
string in quotes.)

Search: customer_name:"Joh*"
Returns: No results

Search: customer_name:"John Do*"
Returns: No results

Search: customer_NAME:"John Doe*"
Returns: The correct "John Doe" Record"

I feel like I'm close; the only issue is when there are multiple words.

Any advice would be appreciated.

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-wildcard-searching-tp3360681p3360681.html
Sent from the Solr - User mailing list archive at Nabble.com.
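
Two parser behaviours explain most of the results above: the standard query
parser does not expand wildcards inside quoted phrases, so "Joh*" and
"John Do*" search for literal tokens (the last case works only because the
phrase "John Doe" already matches on its own), and an unquoted
customer_name:John Do* applies the field to the first clause only, sending Do*
to the default field. A sketch of a query that keeps both clauses on the
field:

customer_name:(John AND Do*)

For filter-as-you-type, though, the usual advice on this list is an edge-ngram
field rather than wildcards.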


Re: how work with rss in solr?

2011-09-22 Thread nagarjuna
Yes Gora, I set up an RSS feed for my blog, and I have the following URL for it:
http://nagarjunaavula.blogspot.com/feeds/posts/default?alt=rss
(you can check this URL). Now, how do I use this URL in my Solr application? I am
not sure about the changes needed in rss-data-config.xml. Can you please list the
changes I need to make in the schema, solrconfig, and rss-data-config files?



Thank u

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-work-with-rss-in-solr-tp3360999p3361070.html
Sent from the Solr - User mailing list archive at Nabble.com.
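
A sketch of an rss-data-config.xml adapted from the shipped example for a
standard RSS 2.0 feed like the one above (HttpDataSource as in the 3.x
example, later renamed URLDataSource; the column names must match fields
declared in schema.xml):

<dataConfig>
  <dataSource type="HttpDataSource"/>
  <document>
    <entity name="blog"
            url="http://nagarjunaavula.blogspot.com/feeds/posts/default?alt=rss"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item">
      <field column="title" xpath="/rss/channel/item/title"/>
      <field column="link" xpath="/rss/channel/item/link"/>
      <field column="description" xpath="/rss/channel/item/description"/>
    </entity>
  </document>
</dataConfig>

The slashdot example differs mainly in that its feed is RDF rather than
RSS 2.0, so its XPaths target the RDF structure; the XPaths simply have to
follow whatever XML the feed URL returns.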


Re: where is the SOLR_HOME ?

2011-09-22 Thread ahmad ajiloo
Thank you very much! It is working.

Regards

On Wed, Sep 14, 2011 at 4:14 PM, Juan Grande  wrote:

> Hi Ahmad,
>
> While Solr is starting it writes the path to SOLR_HOME to the log. The
> message looks something like:
>
> Sep 14, 2011 9:14:53 AM org.apache.solr.core.SolrResourceLoader 
> >
> INFO: Solr home set to 'solr/'
> >
>
> If you're running the example, SOLR_HOME is usually
> apache-solr-3.3.0/example/solr
>
> Solr also writes a line like the following in the log for every JAR file it
> loads:
>
> Sep 14, 2011 9:14:53 AM org.apache.solr.core.SolrResourceLoader
> > replaceClassLoader
> >
> INFO: Adding
> >
> 'file:/home/jgrande/apache-solr-3.3.0/contrib/extraction/lib/pdfbox-1.3.1.jar'
> > to classloader
> >
>
> With this information you should be able to determine which JAR files Solr
> is loading and I'm pretty sure that it's loading all the files you need.
> The
> problem may be that you must also include
> "apache-solr-analysis-extras-3.3.0.jar" from the "apache-solr-3.3.0/dist"
> directory.
>
> Regards,
>
> *Juan*
>
>
>
> On Wed, Sep 14, 2011 at 12:19 AM, ahmad ajiloo  >wrote:
>
> > Hi
> > In this page:
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
> > it said:
> > "Note: to use this filter, see solr/contrib/analysis-extras/README.txt
> for
> > instructions on which jars you need to add to your SOLR_HOME/lib "
> >  I can't find "SOLR_HOME/lib" !
> > 1- Is there: "apache-solr-3.3.0\example\solr" ? there is no directory
> which
> > name is lib
> > I created "example/solr/lib" directory and copied jar files to it and
> > tested
> > this expressions in solrconfig.xml :
> > 
> > 
> >  (for more assurance!!!)
> > but it doesn't work and still has following errors !
> >
> > 2- or: "apache-solr-3.3.0\" ? there is no directory which name is lib
> > 3- or : "apache-solr-3.3.0\example" ? there is a "lib" directory. I
> copied
> > 4
> > libraries exist in "solr/contrib/analysis-extras/
> > " to "apache-solr-3.3.0\example\lib" but some errors exist in loading
> page
> > "http://localhost:8983/solr/admin" :
> >
> > I use Nutch for crawling the web and fetching web pages. I send the data
> > from Nutch
> > to Solr for indexing. According to the Nutch tutorial (
> > http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch
> )
> > I
> > should copy schema.xml of Nutch to conf directory of Solr.
> > So I added all of my required Analyzers, like ICUNormalizer2FilterFactory,
> > to this new schema.xml.
> >
> >
> > this is schema.xml :
> >
> >
> > (I added the bold text to this file.)
> >
> > [The schema.xml listing was mangled in the archive. The surviving fragments
> > show the stock string and trie-numeric fieldTypes and the text analyzer
> > chains (StopFilterFactory, WordDelimiterFilterFactory, protwords.txt, etc.)
> > from the Nutch schema, plus the added ICU pieces: an
> > ICUCollationKeyFilterFactory with locale="en" strength="primary", an
> > ICUNormalizer2FilterFactory with name="nfkc_cf" mode="compose", and an
> > ICUTransformFilterFactory with id="Traditional-Simplified".]
> >