Re: Configuring Replication

2011-12-26 Thread Ahmet Arslan
> I just configured the master server as specified on the
> Solr replication wiki page; nothing is indexed yet on the master
> or on the slave.
> The Solr replication wiki page says that after configuring
> the master server, if you open the following URL
> in a web browser
> 
> http://localhost:8983/solr/replication
> [http://localhost:8983/solr is the master server's URL]
> 
> you should get the response OK.
> 
> Unfortunately, I am getting a 404 error with the following
> message:
> 
> HTTP Status 404 - /solr/replication

Maybe you have a multi-core setup? Then you should add the core name to your URL,
e.g. http://localhost:8983/solr/coreName/replication

What happens when you hit the following URLs?

http://localhost:8983
http://localhost:8983/solr
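
The check suggested above can also be scripted. The following is only a minimal sketch using the Python standard library: the helper names `candidate_urls` and `probe` and the core name `core0` are made up for illustration, and the `/solr` context path is assumed from the URLs quoted in this thread.

```python
import urllib.error
import urllib.request


def candidate_urls(base="http://localhost:8983", core_names=()):
    """List the URLs worth probing: server root, webapp root, handlers."""
    urls = [base, base + "/solr", base + "/solr/replication"]
    # Assumption: in a multi-core setup the core name sits between
    # the /solr context path and the handler name.
    for core in core_names:
        urls.append("%s/solr/%s/replication" % (base, core))
    return urls


def probe(url, timeout=5):
    """Return the HTTP status code for url, or a short error description."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when nothing is mapped to the path
    except OSError as err:
        return "unreachable: %s" % err  # connection refused, timeout, ...
```

Calling `probe(url)` for each entry of `candidate_urls(core_names=["core0"])` shows which paths answer and which return 404, which helps tell a missing handler from a wrong context path or core name.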


Re: Custom Shingle Factory Filter Requirement

2011-12-26 Thread Ahmet Arslan
>   I'm trying to implement an advanced Auto-Suggest
> field. Consider an
> example input String:
> 
>    "Word1 Word2 Word3 Word4 Word5 Word6"
> 
>    I just want this field to auto-suggest
> content based on whatever I type
> (no matter whether I start typing from word1 or word4).


To achieve this behavior, you can use StandardTokenizerFactory,
LowerCaseFilterFactory, and EdgeNGramFilterFactory at index time.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
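
To illustrate why this chain matches typing that starts at any word, here is a plain-Python simulation of the idea. It is a sketch, not Lucene code: whitespace splitting stands in for the tokenizer, and the parameter names `min_gram` and `max_gram` (with defaults chosen only for this example) loosely mirror the factory's minGramSize and maxGramSize attributes.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the front-anchored prefixes of a single token."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram=1, max_gram=15):
    """Tokenize on whitespace, lowercase, then expand each token to prefixes."""
    terms = []
    for token in text.split():
        terms.extend(edge_ngrams(token.lower(), min_gram, max_gram))
    return terms


def suggests(text, typed):
    """True if the lowercased input equals one of the indexed prefixes."""
    return typed.lower() in index_terms(text)


title = "Word1 Word2 Word3 Word4 Word5 Word6"
print(suggests(title, "Wor"))    # prefix of every word
print(suggests(title, "Word4"))  # typing can start at the fourth word
print(suggests(title, "ord"))    # not a prefix of any word
```

Because every word contributes its own prefixes, the position of the word in the original string does not matter. The sketch only handles a single typed word; multi-word input would need the query side analyzed as well.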


Re: Configuring Replication

2011-12-26 Thread Ahson Iqbal
Hi Erick

Thanks a lot for your valued response.

I just configured the master server as specified on the Solr replication wiki
page; nothing is indexed yet on the master or on the slave.
The Solr replication wiki page says that after configuring the
master server, if you open the following URL in a web browser

http://localhost:8983/solr/replication [http://localhost:8983/solr is the master
server's URL]

you should get the response OK.

Unfortunately, I am getting a 404 error with the following message:

HTTP Status 404 - /solr/replication

Here is the configuration I have set up on the master server:



        

        startup
        commit
        optimize
        schema.xml,stopwords.txt,elevate.xml
        00:00:10
    


Now I want to know whether there is something wrong with this configuration.


Regards
Ahsan



 From: Erick Erickson 
To: solr-user@lucene.apache.org; Ahson Iqbal  
Sent: Friday, December 23, 2011 8:43 PM
Subject: Re: Configuring Replication
 
We need some more details. It might help to review:
http://wiki.apache.org/solr/UsingMailingLists
Without more details, I don't even know how to go about trying to help.

Are you clicking on the master? Slave? Have
you indexed any docs on the master? Details matter.

But to your second point, yes this is a common (and recommended)
configuration. Index on the master and search on the slaves.

Best
Erick

On Fri, Dec 23, 2011 at 6:56 AM, Ahson Iqbal  wrote:
> Hi
>
> I want to set up replication, and I have two questions about it.
>
> First, I am having trouble configuring replication. I have done everything as
> described at http://wiki.apache.org/solr/SolrReplication on the master server, but
> whenever I try to open this URL in a browser
>
> http://localhost:8983/solr/replication
>
> I get this error:
>
> HTTP Status 404 - /solr/replication
>
> but the tutorial says I should get an OK status.
>
> Second, I plan to do all indexing on the master server and all
> searches on the slave server. Am I doing it right?
>
> please help
>
> Regards
> Ahsan

Re: solr keep old docs

2011-12-26 Thread Mikhail Khludnev
On Tue, Dec 27, 2011 at 12:26 AM, Alexander Aristov <
alexander.aris...@gmail.com> wrote:

> Hi people,
>
> I urgently need your help!
>
> I have Solr 3.3 configured and running. I do incremental indexing 4 times a
> day using bulk updates. Some documents are identical to some extent and I
> wish to skip them rather than index them.
> The problem is that I could not find a way to tell Solr to ignore new
> duplicate docs and keep the old indexed docs. I don't care that a doc is new. Solr
> should just determine by ID that such a document is already in the index, and
> that's it.
>
> I use SolrJ for indexing. I have tried setting overwrite=false and the dedupe
> approach, but neither helped. Either the newer doc overwrites
> the old one or I get a duplicate.
>
> I think it's a very simple and basic feature and it must exist. What did I
> do wrong or fail to do?
>

I guess that is because the mainstream approach is delta-import, where you have
"updated" timestamps in your DB and a "last-import" timestamp stored
somewhere. You can check how it works in DIH.


>
> I tried Google but couldn't find a solution there, although many people
> have encountered this problem.
>
>
It definitely can be done by overriding
o.a.s.update.DirectUpdateHandler2.addDoc(AddUpdateCommand), but I suggest
starting by implementing your own
http://wiki.apache.org/solr/UpdateRequestProcessor - search for the PK and bypass
the chain call if it's found. Then, if you run into performance issues querying
your PKs one by one (but only after that), you can batch your searches;
there are a couple of optimization techniques for huge disjunction queries
like PK:(2 OR 4 OR 5 OR 6).
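
The batching idea can be sketched independently of Solr. The snippet below is a plain-Python illustration, not an UpdateRequestProcessor: `find_existing_ids` is a hypothetical callback that would run the query against the index and return the IDs it found, and the field name `PK` and the batch size are assumptions.

```python
def build_pk_query(field, ids):
    """Build one disjunction query such as PK:(2 OR 4 OR 5 OR 6)."""
    return "%s:(%s)" % (field, " OR ".join(str(i) for i in ids))


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def filter_new_docs(docs, find_existing_ids, pk_field="PK", batch_size=500):
    """Keep only the docs whose primary key is not yet in the index.

    One lookup is issued per batch instead of one per document.
    """
    kept = []
    for batch in chunked(docs, batch_size):
        ids = [doc[pk_field] for doc in batch]
        existing = set(find_existing_ids(build_pk_query(pk_field, ids)))
        kept.extend(doc for doc in batch if doc[pk_field] not in existing)
    return kept
```

The sketch does not escape special characters in IDs and does not remove duplicates that occur within the same upload; both would need handling in a real implementation.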


> I am starting to think that I must query the index to check whether a doc to be
> added is already in the index and, if so, not add it to the array, but I have so
> many docs that I am afraid it's not a good solution.
>
> Best Regards
> Alexander Aristov
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


solr keep old docs

2011-12-26 Thread Alexander Aristov
Hi people,

I urgently need your help!

I have Solr 3.3 configured and running. I do incremental indexing 4 times a
day using bulk updates. Some documents are identical to some extent and I
wish to skip them rather than index them.
The problem is that I could not find a way to tell Solr to ignore new
duplicate docs and keep the old indexed docs. I don't care that a doc is new. Solr
should just determine by ID that such a document is already in the index, and
that's it.

I use SolrJ for indexing. I have tried setting overwrite=false and the dedupe
approach, but neither helped. Either the newer doc overwrites
the old one or I get a duplicate.

I think it's a very simple and basic feature and it must exist. What did I
do wrong or fail to do?

I tried Google but couldn't find a solution there, although many people
have encountered this problem.

I am starting to think that I must query the index to check whether a doc to be
added is already in the index and, if so, not add it to the array, but I have so
many docs that I am afraid it's not a good solution.

Best Regards
Alexander Aristov


Custom Shingle Factory Filter Requirement

2011-12-26 Thread Vannia Rajan
Hi,

  I'm trying to implement an advanced Auto-Suggest field. Consider an
example input String:

   "Word1 Word2 Word3 Word4 Word5 Word6"

   I just want this field to auto-suggest content based on whatever I type
(no matter whether I start typing from word1 or word4). I tried using
ShingleFilterFactory, but it doesn't fully satisfy my requirements. With
the ShingleFilterFactory, I'm able to get combinations like:

  "Word1 Word2 Word3..", "Word2 Word3 Word4...", etc., working. But it
does not create combinations like "Word1 Word4 Word5...". So it works as
long as I use consecutive words as in the input, but does not work if I
skip any words in between.
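
To make the limitation concrete, here is a small plain-Python simulation of shingling. It is a sketch of the general technique, not Lucene's ShingleFilter: the function name `shingles` and its `min_size`/`max_size` parameters are illustrative stand-ins for the factory's minShingleSize and maxShingleSize attributes.

```python
def shingles(tokens, min_size=2, max_size=3):
    """Join every run of min_size..max_size adjacent tokens into one string."""
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result


words = "Word1 Word2 Word3 Word4 Word5 Word6".split()
combos = shingles(words)
print("Word1 Word2 Word3" in combos)  # True: the words are adjacent
print("Word1 Word4 Word5" in combos)  # False: Word2 and Word3 are skipped
```

Only adjacent tokens are ever joined, so no choice of sizes produces a combination that skips words.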

  I'm trying to write a custom FilterFactory to satisfy the requirement.
Though I'm a basic Java developer who can compile things (and have been a PHP
developer for several years), I haven't been able to find an "Example Filter" that I
could extend to create a new filter/plugin. The source of
ShingleFilterFactory.java included with Solr just has a reference to a Lucene
class.

 

  There is also another requirement: to prepend the values of another multi-valued
field, one after another, to this single-valued field to create
several other auto-suggest combinations.

  I hope someone can guide me in the right direction.

-- 
Thanks,
Vanniarajan


Re: solr.home

2011-12-26 Thread Thomas Fischer
Hi Shawn,

thanks for looking into this.
I am using a start-up script for Tomcat, and in that script there was actually 
the line

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home='/srv/solr'"

which most likely created the problem.
With

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/srv/solr"

I get

INFO: No /solr/home in JNDI
INFO: using system property solr.solr.home: /srv/solr

and everything seems to work fine, so there obviously was a tightening of the 
syntax somewhere between solr 1.4 and solr 3.5.
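
The difference between the two export lines comes down to shell quoting: single quotes inside a double-quoted string are ordinary characters, so they become part of the property value. The sketch below uses Python's `shlex`, which follows POSIX quoting rules but only approximates a real shell (it does not expand `$JAVA_OPTS`); the helper name `jvm_property_value` is made up for this example.

```python
import shlex


def jvm_property_value(assignment_rhs, name="solr.solr.home"):
    """Return the value of -D<name>=... after shell-style quote handling."""
    prefix = "-D%s=" % name
    # shlex.split applies POSIX quoting rules to the right-hand side.
    for word in shlex.split(assignment_rhs):
        # A later unquoted $JAVA_OPTS expansion splits on whitespace only
        # and does not remove quotes a second time.
        for part in word.split():
            if part.startswith(prefix):
                return part[len(prefix):]
    return None


quoted = "\"$JAVA_OPTS -Dsolr.solr.home='/srv/solr'\""
plain = "\"$JAVA_OPTS -Dsolr.solr.home=/srv/solr\""
print(jvm_property_value(quoted))  # the single quotes survive in the value
print(jvm_property_value(plain))   # the bare path
```

With the first form the JVM receives a path that literally starts and ends with a quote character, which matches the `'/srv/solr'` shown in the log lines quoted below.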

Thanks again
Thomas


On 22.12.2011 at 17:06, Shawn Heisey wrote:

> On 12/21/2011 4:13 AM, Thomas Fischer wrote:
>> I'm trying to move forward with my solr system from 1.4 to 3.5 and ran into 
>> some problems with solr home.
>> Is this a known problem?
>> 
>> My solr 1.4 gives me the following messages (amongst many many others…) in 
>> catalina.out:
>> 
>> INFO: No /solr/home in JNDI
>> INFO: using system property solr.solr.home: '/srv/solr'
>> INFO: looking for solr.xml: /'/srv/solr'/solr.xml
>> 
>> then finds the solr.xml and proceeds from there (this is multicore).
>> 
>> With solr 3.5 I get:
>> 
>> INFO: No /solr/home in JNDI
>> INFO: using system property solr.solr.home: '/srv/solr'
>> INFO: Solr home set to ''/srv/solr'/'
>> INFO: Solr home set to ''/srv/solr'/./'
>> SCHWERWIEGEND: java.lang.RuntimeException: Can't find resource '' in 
>> classpath or ''/srv/solr'/./conf/', cwd=/
>> 
>> After that solr is somehow started but not aware of the cores present.
>> 
>> This can be solved by putting a solr.xml file into 
>> $CATALINA_HOME/conf/Catalina/localhost/ with
>> > override="true" />
>> which results in
>> INFO: Using JNDI solr.home: /srv/solr
>> and everything seems to run smoothly afterwards, although solr.xml is never 
>> mentioned.
>> 
>> I would like to know when this changed and why, and why solr 3.5 is looking 
>> for solrconfig.xml instead of solr.xml in solr.home
>> 
>> (Am I the only one who finds it confusing to have the three names 
>> solr.solr.home (system property),  solr.home (JNDI), solr/home (Environment 
>> name) for the same object?)
> 
> Here's what I have as a commandline option when starting Jetty:
> 
> -Dsolr.solr.home=/index/solr
> 
> This is what my log from Solr 3.5.0 says at the very beginning.
> 
> Dec 14, 2011 8:42:28 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
> INFO: JNDI not configured for solr (NoInitialContextEx)
> Dec 14, 2011 8:42:28 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
> INFO: using system property solr.solr.home: /index/solr
> Dec 14, 2011 8:42:28 AM org.apache.solr.core.SolrResourceLoader 
> INFO: Solr home set to '/index/solr/'
> 
> Note that in my log it shows the system property without any kind of quotes, 
> but in yours, it is surrounded - '/srv/solr'.  I am guessing that wherever 
> you are defining solr.solr.home, you have included those quotes, and that 
> removing them would probably fix the problem.
> 
> If this is indeed the problem, the newer version is probably interpreting 
> input values much more literally. The old version probably ran the final path 
> value through a parser that took care of removing the quotes for you, but 
> that parser also removed certain characters that some users actually needed.  
> Notice that the quotes are interspersed in the full solr.xml path in your 1.4 
> log.
> 
> Thanks,
> Shawn
> 



Re: feature of FST version of SynonymFilter affects Highlighter

2011-12-26 Thread Robert Muir
On Mon, Dec 26, 2011 at 10:54 AM, Koji Sekiguchi  wrote:

> I don't have a JUnit test case. What I tried was:
>
> I have an index-time synonym definition:
>
> nhl, national hockey league
>
> and I indexed "I like national hockey league".
>
> Then I searched for nhl with hl=on and got an unwanted highlight snippet
> "I like national hockey league".
>
> But when I set luceneMatchVersion to LUCENE_33 and re-indexed,
> I got the expected result "I like national hockey league".
>

Thanks Koji, I'll see if I can create a test case for this later
today. SynonymFilter could have a bug with the offsets.

-- 
lucidimagination.com


Re: feature of FST version of SynonymFilter affects Highlighter

2011-12-26 Thread Koji Sekiguchi

(11/12/26 23:58), Robert Muir wrote:

> The old one didn't really handle this correctly either.


Sorry, I jumped the gun! I should have said offsets, not positions!
So please ignore what I quoted from the javadoc in my previous mail, sorry!


> Koji, what is the highlighting problem? Can we have a test case?


I don't have a JUnit test case. What I tried was:

I have an index-time synonym definition:

nhl, national hockey league

and I indexed "I like national hockey league".

Then I searched for nhl with hl=on and got an unwanted highlight snippet
"I like national hockey league".

But when I set luceneMatchVersion to LUCENE_33 and re-indexed,
I got the expected result "I like national hockey league".

koji
--
http://www.rondhuit.com/en/


Re: feature of FST version of SynonymFilter affects Highlighter

2011-12-26 Thread Robert Muir
The old one didn't really handle this correctly either.

Koji, what is the highlighting problem? Can we have a test case?

2011/12/26 Koji Sekiguchi :
> I found that SynonymFilter javadoc says:
>
> "Matches single or multi word synonyms in a token stream.
> This token stream cannot properly handle position increments != 1"
>
> I think that, due to this feature, the Highlighter doesn't work properly in some cases:
>
> http://www.lucidimagination.com/search/document/c3ed1e0a2b12ddfa#c3ed1e0a2b12ddfa
>
> https://issues.apache.org/jira/browse/SOLR-2845
>
> Can we remove the restriction in the future?
>
> If not, I'd propose we have an option to choose SlowSynonymFilterFactory 
> explicitly
> in schema.xml (we can choose it by setting luceneMatchVersion to 33,
> but it is global).
>
> koji
> --
> http://www.rondhuit.com/en/



-- 
lucidimagination.com


Re: PlainTextEntityProcessor and RegexTransformer in DataImport Handler

2011-12-26 Thread meghana
Thanks Matthew ,

It really helped a lot. I am almost done with this.



feature of FST version of SynonymFilter affects Highlighter

2011-12-26 Thread Koji Sekiguchi
I found that SynonymFilter javadoc says:

"Matches single or multi word synonyms in a token stream.
This token stream cannot properly handle position increments != 1"

I think that, due to this feature, the Highlighter doesn't work properly in some cases:

http://www.lucidimagination.com/search/document/c3ed1e0a2b12ddfa#c3ed1e0a2b12ddfa

https://issues.apache.org/jira/browse/SOLR-2845

Can we remove the restriction in the future?

If not, I'd propose we have an option to choose SlowSynonymFilterFactory 
explicitly
in schema.xml (we can choose it by setting luceneMatchVersion to 33,
but it is global).

koji
-- 
http://www.rondhuit.com/en/