Re: Weird behaviour with phrase queries

2011-01-24 Thread Jerome Renard
Erick,

On Mon, Jan 24, 2011 at 9:57 PM, Erick Erickson wrote:

> Hmmm, I don't see any screen shots. Several things:
> 1> If your stopword file has comments, I'm not sure what the effect would
> be.
>

Ha, I thought comments were supported in stopwords.txt


> 2> Something's not right here, or I'm being fooled again. Your withresults
> xml has this line:
> +DisjunctionMaxQuery((meta_text:"ecol d
> ingenieur")~0.01) ()
> and your noresults has this line:
> +DisjunctionMaxQuery((meta_text:"academi
> charpenti")~0.01) DisjunctionMaxQuery((meta_text:"academi
> charpenti"~100)~0.01)
>
> the empty () in the first one often means you're NOT going to your
> configured dismax parser in solrconfig.xml. Yet that doesn't square with
> your custom qt, so I'm puzzled.
>
> Could we see your raw query string on the way in? It's almost as if you
> defined qt in one and defType in the other, which are not equivalent.
>

You are right I fixed this problem (my bad).

> 3> It may take 12 hours to index, but you could experiment with a smaller
> subset. You say you know that the noresults one should return documents;
> what proof do you have? If there's a single document that you know should
> match this, just index it and a few others, and you should be able to
> make many runs until you get to the bottom of this...
>
>
I could, but I always thought I had to fully re-index after updating
schema.xml. If I update only a few documents, will that take the changes
into account without breaking the rest?


> And obviously your stemming is happening on the query, are you sure it's
> happening at index time too?
>
>
Since you did not get the screenshots, you will find attached the full
output of the analysis for a phrase that works and for another that does
not.

Thanks for your support

Best Regards,

--
Jérôme


analysis-noresults.html.gz
Description: GNU Zip compressed data


analysis-withresults.html.gz
Description: GNU Zip compressed data


Re: old index files not deleted on slave

2011-01-24 Thread feedly team
Interestingly that worked. I deleted the slave index and restarted.
After the first replication I shut down the server, deleted the lock
file and started it again. It seems to be behaving itself now even
though a lock file seems to be recreated. Thanks a lot for the help.
This still seems like a bug though?

I don't have any writers open on the slaves, in fact one slave is only
doing replication right now (no reads) to try to isolate the problem.

On Sat, Jan 22, 2011 at 7:34 PM, Alexander Kanarsky
 wrote:
> I see the file
>
> -rw-rw-r-- 1 feeddo feeddo    0 Dec 15 01:19
> lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock
>
> was created on Dec. 15. At the end of the replication, as far as I
> remember, the SnapPuller tries to open the writer to ensure the old
> files are deleted, and in your case it cannot obtain a lock on the
> index folder on Dec 16, 17, 18. Can you reproduce the problem if you
> delete the lock file, restart the slave and try replication again? Do
> you have any other Writer(s) open for this folder outside of this core?
>
> -Alexander
>
> On Sat, Jan 22, 2011 at 3:52 PM, feedly team  wrote:
>> The file system checked out, I also tried creating a slave on a
>> different machine and could reproduce the issue. I logged SOLR-2329.
>>
>> On Sat, Dec 18, 2010 at 8:01 PM, Lance Norskog  wrote:
>>> This could be a quirk of the native locking feature. What's the file
>>> system? Can you fsck it?
>>>
>>> If this error keeps happening, please file this. It should not happen.
>>> Add the text above and also your solrconfigs if you can.
>>>
>>> One thing you could try is to change from the native locking policy to
>>> the simple locking policy - but only on the child.
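A sketch of that change in the slave's solrconfig.xml (the setting lives in
the <indexDefaults> and <mainIndex> sections):

  <lockType>simple</lockType>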
>>>
>>> On Sat, Dec 18, 2010 at 4:44 PM, feedly team  wrote:
 I have set up index replication (triggered on optimize). The problem I
 am having is the old index files are not being deleted on the slave.
 After each replication, I can see the old files still hanging around
 as well as the files that have just been pulled. This causes the data
 directory size to increase by the index size every replication until
 the disk fills up.

 Checking the logs, I see the following error:

 SEVERE: SnapPull failed
 org.apache.solr.common.SolrException: Index fetch failed :
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:265)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
 Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/solrhome/data/index/lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1065)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:954)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:192)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:99)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
        at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
        at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:471)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
        ... 11 more

 lsof reveals that the file is still opened from the java process.

 I am running 4.0 rev 993367 with patch SOLR-1316. Otherwise, the setup
 is pretty vanilla. The OS is linux, the indexes are on local
 directories, write permissions look ok, nothing unusual in the config
 (default deletion policy, etc.). Contents of the index data dir:

 master:
 -rw-rw-r-- 1 feeddo feeddo  191 Dec 14 01:06 _1lg.fnm
 -rw-rw-r

Re: SolrCloud Questions for MultiCore Setup

2011-01-24 Thread Em

Hi,

just wanted to push this topic again.

Thank you!


Em wrote:
> 
> By the way: although I am asking for SolrCloud explicitly again, I will
> take your advice and try distributed search first to understand the
> concept better.
> 
> Regards
> 
> 
> Em wrote:
>> 
>> Hi Lance,
>> 
>> thanks for your explanation.
>> 
>> As far as I know, in distributed search I have to tell Solr what other
>> shards it has to query. So, if I want to query a specific core, present
>> in all my shards, I could tell Solr this by using the shards-param plus
>> the specified core on each shard.
>> 
>> Using SolrCloud's distrib=true feature (it sets all the known shards
>> automatically?), a collection should consist of only one type of
>> core-schema, correct?
>> How does SolrCloud know that shard_x and shard_y are replicas of
>> each other (I took a look at the possibility to specify alternative
>> shards if one is not available)? If it does not know that they are
>> replicas of each other, I should use the syntax of specifying alternative
>> shards for failover due to performance reasons, because querying 2
>> identical and available cores seems to be wasted capacity, no? 
>> 
>> Thank you!
>> 
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Questions-for-MultiCore-Setup-tp2309443p2327089.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting started with writing parser

2011-01-24 Thread Dinesh

I don't even know whether the regex expression that I'm using for my log is
correct or not. I am very worried: I cannot proceed in my project, and
already 1/3 of the time is over. Please help. This is just the first stage;
after this I have to set up all the logs to be redirected to SYSLOG, and
from there I'll send them to the SOLR server. Then I have to analyse all
the data that I obtained from DNS, DHCP, WIFI, and SWITCHES, and prepare a
user-based report on each user's actions. Please help me, because the days
I have keep reducing, and my project leader is questioning me a lot.

-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2326917.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr suggester and spell checker

2011-01-24 Thread madhug

Hi, 
I am using the default example in the latest stable build
(apache-solr-4.0-2011-01-23_11-24-01). 

I read the wiki on http://wiki.apache.org/solr/Suggester and my expectation
is that suggester would correct terms in addition to completing terms.
The handler for suggest is configured with spellcheck set to true:

  <lst name="defaults">
    <str name="spellcheck">true</str>
    ..
  </lst>

However, the query http://localhost:8983/solr/suggest?q=belkn%20enc
returns "belkn encoded" (belkn is not corrected to
belkin).

The spellchecker component corrects belkn to belkin, though:
http://localhost:8983/solr/spell?q=belkn%20encoded&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
returns the collation "belkin encoded".

Would really appreciate any input on how the suggester can correct as well
as complete terms in the input.

Thanks
Madhu
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-suggester-and-spell-checker-tp2326907p2326907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: please help >>Problem with dataImportHandler

2011-01-24 Thread Dinesh

http://pastebin.com/tjCs5dHm

this is the log produced by the solr server

-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/please-help-Problem-with-dataImportHandler-tp2318585p2326659.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting started with writing parser

2011-01-24 Thread Dinesh

http://pastebin.com/CkxrEh6h

this is my sample log

-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2326646.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding weightage to the facets count

2011-01-24 Thread Chris Hostetter

: prod1 has a tag called “Light Weight” with weightage 20,
: prod2 has a tag called “Light Weight” with weightage 100,
: 
: If I get the facet for “Light Weight”, I will get Light Weight (2);
: here I need to take the weightage into account, and the result will be
: Light Weight (120) 
: 
: How can we achieve this? Any ideas are really helpful.


It's not really possible with Solr out of the box.  Faceting is fast and 
efficient in Solr because it's all done using set intersections (and most 
of the sets can be kept in RAM very compactly and reused).  For what you 
are describing you'd need to not only associate a weighted payload with 
every TermPosition, but also factor that weight in when doing the 
faceting, which means efficient set operations are now out the window.

If you know Java it would probably be possible to write a custom 
SolrPlugin (a SearchComponent) to do this type of faceting in special 
cases (assuming you indexed in a particular way) but I'm not sure off the 
top of my head how well it would scale -- the basic algo I'm thinking of 
is (after indexing each facet term with a weight payload) to iterate over 
the DocSet of all matching documents in parallel with an iteration over 
the TermPositions, skipping ahead to only the docs that match the query, 
and recording the sum of the payloads for each term.

Hmmm...

except TermPositions iterates over <term, doc, position> tuples, 
so you would have to iterate over every term, and for every term then 
loop over all matching docs ... like I said, not sure how efficient it 
would wind up being.

You might be happier all around if you just do some sampling -- store the 
tag+weight pairs so that they can be retrieved with each doc, and then 
when you get your top facet constraints back, look at the first page of 
results, and figure out what the sum "weight" is for each of those 
constraints based solely on the page#1 results.

I've had happy users using a similar approach in the past.
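For illustration, a minimal client-side sketch of that page-one sampling
(SolrJ; the tags_weighted field name and the "tag|weight" encoding are
hypothetical, not from the original question):

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class WeightedFacetSampler {
  // Sum tag weights over the first page of results only (the sampling idea).
  static Map<String, Long> sampleWeights(SolrDocumentList firstPage) {
    Map<String, Long> totals = new HashMap<String, Long>();
    for (SolrDocument doc : firstPage) {
      // hypothetical stored field holding "tag|weight" strings
      Collection<Object> pairs = doc.getFieldValues("tags_weighted");
      if (pairs == null) continue;
      for (Object o : pairs) {
        String[] p = o.toString().split("\\|"); // e.g. "Light Weight|20"
        long w = Long.parseLong(p[1]);
        Long sum = totals.get(p[0]);
        totals.put(p[0], sum == null ? w : sum + w);
      }
    }
    return totals; // e.g. {"Light Weight" -> 120}
  }
}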

-Hoss

Re: How call I make one request for all cores and get response classified by cores

2011-01-24 Thread Chris Hostetter

: I have a group of subindex, each of which is a core in my solr now. I want
: to make one query for some of them, how can I do that? And classify response
: doc by index, using facet search?

some background:

"multi core" is when you have multiple solr "cores" on one solr instance;
each "core" can have different configs.

"distributed search" is when you execute a search on a "core" and specify 
in the query a list of other cores on other solr instances to treat as 
"shards" and aggregate the results from all of them; each "shard" must 
have identicle schemas.

That said: you can to a distributed search, across a bunch of "shards" 
that are all on the same solr instance.  if you index a constant value in 
each one identifying which sub-indx it comes from, you should have what 
you're looking for.
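For example (hypothetical host, core and field names), one request that
fans out over two cores on the same instance and facets on the constant
marker field:

http://localhost:8983/solr/core0/select?q=foo&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&facet=true&facet.field=subindex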

-Hoss


synonyms file, and example cases

2011-01-24 Thread Cam Bazz
Hello,

I have been looking at the solr synonym file that was an example, I
did not understand some notation:

aaa => aaaa

bbb => bbbb1 bbbb2

ccc => ccc1,ccc2

a\=>a => b\=>b

a\,a => b\,b

fooaaa,baraaa,bazaaa

The first one says search for aaaa when the query is aaa. Am I correct?
The second one finds "bbbb1 bbbb2" when the query is bbb.
The third one finds ccc1 or ccc2 when the query is ccc.

The fourth and fifth ones I have not understood.

The last one, I assume, is a group: a bidirectional mapping between
fooaaa, baraaa and bazaaa.

I am especially interested in this last one: if I do aaa,bbb will it
find both aaa and bbb when either aaa or bbb is queried?

Am I correct in those assumptions?

Best regards,
C.B.


Re: Stemming for Finnish language

2011-01-24 Thread Chris Hostetter

: I tried following in my schema.xml, but I got
: org.apache.solr.common.SolrException: Error loading class
: 'solr.FinnishLightStemFilterFactory'

FinnishLightStemFilterFactory is a class that exists in SVN on the 3x and 
trunk branches, but does not exist in the Solr 1.4.1 release (it was added 
later).

If you are trying to use Solr 1.4.1, this won't work. If you are 
getting this error using a 3x or trunk development version, please 
elaborate on how you are installing/running Solr.
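On a 3x build, a minimal field type using it might look like the following
sketch (the surrounding tokenizer and filters are illustrative, not from
the original post):

<fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.FinnishLightStemFilterFactory"/>
  </analyzer>
</fieldType>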


-Hoss


Re: Solr set up issues with Magento

2011-01-24 Thread Markus Jelsma
Hi,

You haven't defined the field in Solr's schema.xml configuration so it needs to 
be added first. Perhaps following the tutorial might be a good idea.

http://lucene.apache.org/solr/tutorial.html
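A sketch of the missing declaration for schema.xml (the field type is a
guess; match it to what Magento actually sends):

<field name="in_stock" type="boolean" indexed="true" stored="true"/>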

Cheers.

> Hello Team:
> 
> 
>   I am in the process of setting up Solr 1.4 with Magento Enterprise
> Edition 1.9.
> 
> When I try to index the products I get the following error message.
> 
> Jan 24, 2011 3:30:14 PM org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {} 0 0
> Jan 24, 2011 3:30:14 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'in_stock'
>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>         at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
>         at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
>         at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:550)
>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:380)
>         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
>         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
>         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
>         at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 
> Jan 24, 2011 3:30:14 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={wt=json} status=400 QTime=0
> Jan 24, 2011 3:30:14 PM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: start rollback
> Jan 24, 2011 3:30:14 PM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: end_rollback
> Jan 24, 2011 3:30:14 PM org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {rollback=} 0 16
> Jan 24, 2011 3:30:14 PM org.apache.solr.core.SolrCore execute
> 
> I am new to both Magento and Solr. I could have done something stupid
> during installation. I really look forward to your help.
> 
> Thank you,
> Sandhya


Solr set up issues with Magento

2011-01-24 Thread solrEvaluation

Hello Team:


  I am in the process of setting up Solr 1.4 with Magento Enterprise
Edition 1.9. 

When I try to index the products I get the following error message.

Jan 24, 2011 3:30:14 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Jan 24, 2011 3:30:14 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'in_stock'
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:550)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:380)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Jan 24, 2011 3:30:14 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={wt=json} status=400 QTime=0
Jan 24, 2011 3:30:14 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jan 24, 2011 3:30:14 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
Jan 24, 2011 3:30:14 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {rollback=} 0 16
Jan 24, 2011 3:30:14 PM org.apache.solr.core.SolrCore execute

I am new to both Magento and Solr. I could have done something stupid
during installation. I really look forward to your help.

Thank you,
Sandhya
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-set-up-issues-with-Magento-tp2323858p2323858.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Specifying an AnalyzerFactory in the schema

2011-01-24 Thread Chris Hostetter

: I notice that in the schema, it is only possible to specify an Analyzer class,
: but not a Factory class as for the other elements (Tokenizer, Filter, etc.).
: This limits the use of this feature, as it is impossible to specify parameters
: for the Analyzer.
: I have looked at the IndexSchema implementation, and I think this requires a
: simple fix. Do I open an issue about it?

Support for constructing Analyzers directly is very crude, and primarily 
exists to make it easy for people with old indexes and analyzers to 
keep working.

Moving forward, Lucene/Solr eventually won't "ship" concrete Analyzer 
implementations at all (at least, that's the last consensus I remember), so 
enhancing support for loading Analyzers (or AnalyzerFactories) doesn't 
make much sense.

Practically speaking, if you have an existing Analyzer that you want to 
use in Solr, instead of writing an "AnalyzerFactory" for it, you could 
just write a "TokenizerFactory" that wraps it instead -- functionally that 
would let you achieve everything an AnalyzerFactory would, except that 
Solr would already handle letting the schema.xml specify the 
positionIncrementGap (which you could happily ignore if you wanted).


-Hoss


Re: Solr with Unknown Lucene Index?

2011-01-24 Thread Chris Hostetter

: Having found some code that searches a Lucene index, the only analyzers
: referenced are Lucene.Net.Analysis.Standard.StandardAnalyzer.
: 
: How can I map this is Solr? The example schema doesn't seem to mention this,
: and specifying 'text' or 'string' for every field doesn't seem to help.

1) that analyzer seems to be a Lucene.Net analyzer, so the java equivalent 
would be org.apache.lucene.analysis.standard.StandardAnalyzer

2) the example schema.xml demonstrates how to use an existing Analyzer 
implementation, e.g.:

<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>

3) I'm getting the sense from your comments that you aren't very familiar 
with lucene/solr in general.  An important thing to understand is that 
just because the code that created the index only ever uses 
"StandardAnalyzer" doesn't mean it will make sense to use that analyzer on 
every field when attempting to search that field from solr -- some fields 
may have been indexed w/o using any analysis, some may be numeric fields 
with special encoding, some may be compressed, etc...

trying to reverse engineer what the schema should look like to open any 
arbitrary index requires a lot of understanding about how that index was 
built -- it's easy to just "dump the terms" found in an index w/o knowing 
anything about where those terms came from (that's what Luke does) but that 
doesn't help you recognize things like "this list of X words were treated 
as stop words, and don't appear in the index, so my query analyzer needs 
to be configured with those same X words"

In short: you can easily make solr *read* the index (just like luke) but 
that won't necessarily help you *use* the index in a meaningful way.

-Hoss


Re: Weird behaviour with phrase queries

2011-01-24 Thread Erick Erickson
Hmmm, I don't see any screen shots. Several things:
1> If your stopword file has comments, I'm not sure what the effect would
be.
2> Something's not right here, or I'm being fooled again. Your withresults
xml has this line:
+DisjunctionMaxQuery((meta_text:"ecol d
ingenieur")~0.01) ()
and your noresults has this line:
+DisjunctionMaxQuery((meta_text:"academi
charpenti")~0.01) DisjunctionMaxQuery((meta_text:"academi
charpenti"~100)~0.01)

the empty () in the first one often means you're NOT going to your
configured dismax parser in solrconfig.xml. Yet that doesn't square with
your custom qt, so I'm puzzled.

Could we see your raw query string on the way in? It's almost as if you
defined qt in one and defType in the other, which are not equivalent.
3> It may take 12 hours to index, but you could experiment with a smaller
subset. You say you know that the noresults one should return documents;
what proof do you have? If there's a single document that you know should
match this, just index it and a few others and you should be able to make
many runs until you get to the bottom of this...

And obviously your stemming is happening on the query, are you sure it's
happening at index time too?

Best
Erick

On Mon, Jan 24, 2011 at 1:51 PM, Jerome Renard wrote:

> Hi Em, Erick
>
> thanks for your feedback.
>
> Em: yes. Here is the stopwords.txt I use:
> -
> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
>
> On Mon, Jan 24, 2011 at 6:58 PM, Erick Erickson 
> wrote:
>
>> Try submitting your query from the admin page with &debugQuery=on and see
>> if that helps. The output is pretty dense, so feel free to cut-paste the
>> results for help.
>>
>> Your stemmers have English as the language, which could also be
>> "interesting".
>>
>>
> Yes, I noticed that this will be fixed.
>
>
>> As Em says, the analysis page may help here, but I'd start by taking out
>> WordDelimiterFilterFactory, SnowballPorterFilterFactory and
>> StopFilterFactory
>> and build back up if you really need them. Although, again, the analysis
>> page
>> that's accessible from the admin page may help greatly (check "debug" in
>> both
>> index and query).
>>
>>
> You will find attached two xml files, one with no results
> (noresult.xml.gz) and one with a lot of results (withresults.xml.gz). You
> will also find attached two screenshots showing there is a highlighted
> section in the "Index analyzer" section when analysing text.
>
>
>> Oh, and you MUST re-index after changing your schema to have a true test.
>>
>>
> Yes, the problem is that reindexing takes around 12 hours, which makes
> testing really hard :/
>
>
> Thanks in advance for your feedback.
>
> Best Regards,
>
> --
> Jérôme
>


Re: searching based on grouping result

2011-01-24 Thread Chris Hostetter

: Subject: searching based on grouping result
: In-Reply-To: <913367.31366...@web121705.mail.ne1.yahoo.com>
: References: <913367.31366...@web121705.mail.ne1.yahoo.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: No system property or default value specified for...

2011-01-24 Thread Chris Hostetter

: I'm trying to dynamically add a core to a multi core system using the
: following command:
: 
: 
http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceDir=items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=true
: 
: the data-config.xml looks like this:
: 
: 

I think you are using the config param incorrectly -- it should be the 
solrconfig.xml file you want to use (assuming you don't want the one found 
in the "conf" directory of your instanceDir).

That's the reason you are getting errors about needing to specify system 
props or default values for all those variables: if that file were a 
solrconfig.xml file, they would have to be specified before the SolrCore 
could be initialized -- but for a DIH data config that's not necessary.


-Hoss


Re: please help >>Problem with dataImportHandler

2011-01-24 Thread Chris Hostetter
: this is the error that I'm getting.. no idea what it is..

Did you follow the instructions in the error message and look at your solr 
log file to see what the "severe errors in solr configuration" might be?

: SimplePostTool: FATAL: Solr returned an error: 
: 
Severe_errors_in_solr_configuration__Check_your_log_files_for_more_detailed_information_on_what_may_be_wrong
...

-Hoss


Re: EdgeNgram Auto suggest - doubles ignore

2011-01-24 Thread Erick Erickson
See below.

On Mon, Jan 24, 2011 at 1:51 PM, johnnyisrael wrote:

>
> Hi,
>
> I am trying out the auto suggest using EdgeNgram.
>
> Using the following tutorial as a reference.
>
>
> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>
> In the above tutorial, the below two lines have been clearly mentioned:
>
> "Note that it’s necessary to wrap the query in double-quotes as a phrase.
> Otherwise unpredictable and unwanted matches can occur."
>
> When I use double quotes as they said, it works perfectly fine. I just want
> to know the reason for this behavior.
>
> Can anyone explain me why it behaves like that?
>
>
The reason here is that if you *don't* make it a phrase, then
you're ORing (or ANDing) the grams. So if you were
searching for won, your search would become
w OR wo OR won, which would match n-grams from
all over the place without regard to whether they appeared
in order.
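To make that concrete (field name hypothetical, following the gram
analysis Erick describes):

  q=ac_field:"won"  -- the grams must match as a phrase, in order
  q=ac_field:(won)  -- analyzed to w OR wo OR won, so any document
                       containing any one of those grams matches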


> I tried the alternate method mentioned in the responses section of the same
> tutorial [StandardTokenizerFactory and LowerCaseFilterFactory combination],
> it does not work fine as expected[bringing unwanted matches].
>
>
Hmmm, I don't think the StandardTokenizer & LowerCase was
being applied as autosuggest; there was a copyField in there
that went to the EdgeNGram (note that I only scanned the article).

Best
Erick



> Is there a best way to overcome this?
>
> Thanks,
>
> Johnny
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/EdgeNgram-Auto-suggest-doubles-ignore-tp2321919p2321919.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Highlighting with/without Term Vectors

2011-01-24 Thread Salman Akram
Just to add one thing, in case it makes a difference.

The max document size on which highlighting needs to be done is a few
hundred KB (in the file system). In the index it's compressed, so it should
be much smaller. Total documents number more than 100 million.

On Tue, Jan 25, 2011 at 12:42 AM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> Hi,
>
> Does anyone have any benchmarks for how much highlighting speeds up with
> Term Vectors (compared to without them)? e.g. if highlighting on 20
> documents takes 1 sec with Term Vectors, any idea how long it will take
> without them?
>
> I need to know since the index used for highlighting has a TVF file of
> around 450GB (approx 65% of the total index size), so I am trying to see
> whether decreasing the index size by dropping TVF would be more helpful
> for performance (less RAM, should be good for I/O too I guess) or whether
> keeping it is still better.
>
> I know the best way is to try it out, but indexing takes a very long
> time, so I am trying to see whether it's even worthwhile or not.
>
> --
> Regards,
>
> Salman Akram
>
>


-- 
Regards,

Salman Akram


Highlighting with/without Term Vectors

2011-01-24 Thread Salman Akram
Hi,

Does anyone have any benchmarks for how much highlighting speeds up with
Term Vectors (compared to without them)? e.g. if highlighting on 20
documents takes 1 sec with Term Vectors, any idea how long it will take
without them?

I need to know since the index used for highlighting has a TVF file of
around 450GB (approx 65% of the total index size), so I am trying to see
whether decreasing the index size by dropping TVF would be more helpful
for performance (less RAM, should be good for I/O too I guess) or whether
keeping it is still better.

I know the best way is to try it out, but indexing takes a very long time,
so I am trying to see whether it's even worthwhile or not.
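For reference, term vectors are a per-field setting in schema.xml, so
dropping the TVF file amounts to turning attributes like these off and
re-indexing (field name hypothetical):

<field name="text" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>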

-- 
Regards,

Salman Akram


Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Simon Wistow
On Mon, Jan 24, 2011 at 10:55:59AM -0800, Em said:
> Could it be possible that your slaves have not finished replicating before
> the new replication-process starts?
> If so, there you got the OOM :).

This was one of my thoughts as well - we're currently running a slave 
which receives no queries at all, just to see if it exhibits similar 
behaviour.

My reasoning against it is that we're not seeing any 

PERFORMANCE WARNING: Overlapping onDeckSearchers=x

in the logs which is something I'd expect to see.

2 minutes doesn't seem like an unreasonable period of time either - the 
docs at http://wiki.apache.org/solr/SolrReplication suggest 20 seconds.
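(For reference, that poll frequency is the pollInterval setting in the
slave's ReplicationHandler config, e.g. <str name="pollInterval">00:02:00</str>
for a 2-minute poll.)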




Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Simon Wistow
On Mon, Jan 24, 2011 at 08:00:53PM +0100, Markus Jelsma said:
> Are you using 3rd-party plugins?

No third party plugins - this is actually pretty much stock tomcat6 + 
solr from Ubuntu. The only difference is that we've adapted the 
directory layout to fit in with our house style


Re: Getting started with writing parser

2011-01-24 Thread Gora Mohanty
On Mon, Jan 24, 2011 at 2:28 PM, Dinesh  wrote:
>
> my solrconfig.xml
>
> http://pastebin.com/XDg0L4di
>
> my schema.xml
>
> http://pastebin.com/3Vqvr3C0
>
> my try.xml
>
> http://pastebin.com/YWsB37ZW
[...]

OK, thanks for the above.

You also need to:
* Give us a sample of your log files (for crying out loud,
  this has got to be the fifth time that I have asked you
  for this).
* Tell us what happens when you run with the above
   configuration. From a cursory look at try.xml, you
   have not really understood how it works, or how to
   configure it for your needs.

Regards,
Gora


Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Markus Jelsma
Are you using 3rd-party plugins?

> We have two slaves replicating off one master every 2 minutes.
> 
> Both using the CMS + ParNew Garbage collector. Specifically
> 
> -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
> 
> but periodically they both get into a GC storm and just keel over.
> 
> Looking through the GC logs the amount of memory reclaimed in each GC
> run gets less and less until we get a concurrent mode failure and then
> Solr effectively dies.
> 
> Is it possible there's a memory leak? I note that later versions of
> Lucene have fixed a few leaks. Our current versions are relatively old
> 
>   Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17
> 18:06:42
> 
>   Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
> 
> so I'm wondering if upgrading to later version of Lucene might help (of
> course it might not but I'm trying to investigate all options at this
> point). If so what's the best way to go about this? Can I just grab the
> Lucene jars and drop them somewhere (or unpack and then repack the solr
> war file?). Or should I use a nightly solr 1.4?
> 
> Or am I barking up completely the wrong tree? I'm trawling through heap
> logs and gc logs at the moment trying to to see what other tuning I can
> do but any other hints, tips, tricks or cluebats gratefully received.
> Even if it's just "Yeah, we had that problem and we added more slaves
> and periodically restarted them"
> 
> thanks,
> 
> Simon


Re: DIH serialize

2011-01-24 Thread greggallen
UNSUBSCRIBE

On 1/23/11, Papp Richard  wrote:
> Hi all,
>
>   I wasted the last few hours trying to serialize some column values (from
> mysql) into a Solr column, but I just can't find such a function. I'll use
> the value in PHP - I don't know if it is possible to serialize in PHP
> style at all. This is what I tried and works with a given factor:
>
> in schema.xml:
>
> <field name="main_timetable" ... stored="true" multiValued="true" />
>
> in DIH xml:
>
> <script><![CDATA[
>   function my_serialize(row)
>   {
>     row.put('main_timetable', row.toString());
>     return row;
>   }
> ]]></script>
>
> .
>
> <entity ...
>   transformer="script:my_serialize"
> >
>
> .
>
>   Can I use java directly in script (

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Em

Hi Simon,

I got no experiences with a distributed environment.
However, what you are talking about reminds me on another post on the
mailing list.

Could it be possible that your slaves have not finished replicating before
the new replication-process starts?
If so, there you got the OOM :).

Just a thought, perhaps it helps.

Regards,
Em
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Possible-Memory-Leaks-Upgrading-to-a-Later-Version-of-Solr-or-Lucene-tp2321777p2321959.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: DIH serialize

2011-01-24 Thread Papp Richard
Hi Stefan,

  yes, this is exactly what I intend - I don't want to search in this field,
just quickly return the result in a serialized form (the search criteria
are on other fields). Well, if I could serialize the data exactly like
PHP's serialize() does, I would be maximally satisfied, but any other form
in which I could easily compact the data into one field would also do.
  Can anyone help me? I guess the <script> transformer is quite a good way,
but I don't know which function I should use there to compact the data so
that it is easily usable in PHP. Or is there any other method?
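One possible sketch with the DIH ScriptTransformer: build a delimited
string by hand (the "|" and "=" delimiters are arbitrary choices and
assume they never occur in the data; a pair of explode() calls rebuilds
the array on the PHP side):

<script><![CDATA[
  function my_serialize(row) {
    // row is a java.util.Map of column name -> value
    var keys = row.keySet().toArray();
    var parts = [];
    for (var i = 0; i < keys.length; i++) {
      parts.push(keys[i] + '=' + row.get(keys[i]));
    }
    row.put('main_timetable', parts.join('|'));
    return row;
  }
]]></script>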

thanks,
  Rich

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@googlemail.com] 
Sent: Monday, January 24, 2011 18:23
To: solr-user@lucene.apache.org
Subject: Re: DIH serialize

Hi Rich,

i'm a bit confused after reading your post .. what exactly are you trying
to achieve? Serializing (like http://php.net/serialize) your complete row
into one field? You don't want to search in them, just store and deliver
them in your results? Does that make sense? Sounds a bit strange :)

Regards
Stefan

On Mon, Jan 24, 2011 at 10:03 AM, Papp Richard  wrote:

> Hi Dennis,
>
>  thank you for your answer, but I didn't understand why you say it doesn't
> need serialization. I'm going with option "C".
>  But the main question is: how do I put the result of many fields
> ("SELECT * FROM") into one field?
>
> thanks,
>  Rich
>
> -Original Message-
> From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> Sent: Monday, January 24, 2011 02:07
> To: solr-user@lucene.apache.org
> Subject: Re: DIH serialize
>
> Depends on your process chain to the eventual viewer/consumer of the data.
>
> The questions to ask are:
>  A/ Is the data IN Solr going to be viewed or processed in its original
>     form?
>     -->set stored="true"
>     -->no serialization needed.
>  B/ If it's going to be analyzed and searched for separate from any other
>     field, the analyzing will put it into an unreadable form. If you need
>     to see it, then
>     -->set indexed="true" and stored="true"
>     -->no serialization needed.
>  C/ If it's NOT going to be viewed AS IS, and it's not going to be
>     searched for AS IS (i.e. other columns will be how the data is found),
>     and you have another, serializable format:
>     -->set indexed="false" and stored="true"
>     -->serialize AS PER THE INTENDED APPLICATION;
>        not sure that Solr can do that at all.
>  D/ If it's NOT going to be viewed AS IS, BUT it's going to be searched
>     for AS IS (this column will be how the data is found), and you have
>     another, serializable format:
>     -->you need to put it into TWO columns
>     -->A SERIALIZED FIELD
>        -->set indexed="false" and stored="true"
>     -->AN UNSERIALIZED FIELD
>        -->set indexed="true"
>        -->serialize the stored one AS PER THE INTENDED APPLICATION;
>           not sure that Solr can do that at all.
>
> Hope that helps!
>
>
> Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better
> idea to learn from others' mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
>
> - Original Message 
> From: Papp Richard 
> To: solr-user@lucene.apache.org
> Sent: Sun, January 23, 2011 2:02:05 PM
> Subject: DIH serialize
>
> Hi all,
>
>
>
>  I wasted the last few hours trying to serialize some column values (from
> mysql) into a Solr column, but I just can't find such a function. I'll use
> the value in PHP - I don't know if it is possible to serialize in PHP
> style at all. This is what I tried and works with a given factor:
>
>
>
> in schema.xml:
>
> <field name="main_timetable" ... stored="true" multiValued="true" />
>
> in DIH xml:
>
> <script><![CDATA[
>