Re: nested faceting ?

2011-02-01 Thread Grijesh

Another patch for hierarchical faceting is also available:

https://issues.apache.org/jira/browse/SOLR-64

You can take a look at it; it may solve your problem.

-
Thanx:
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/nested-faceting-tp2389841p2403601.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Grijesh

You can extract the solr.war using Java's jar -xvf solr.war command,

replace the lucene-2.9.jar with your lucene-3.0.3.jar in the WEB-INF/lib
directory,

then use jar -cvf solr.war * to pack the war again,

and deploy that war. Hope that works.
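If you script this often, the same unpack/replace/repack procedure can be sketched with Python's standard zipfile module instead of the jar tool (a rough sketch; the file and jar names just follow this thread):

```python
import os
import shutil
import tempfile
import zipfile

def replace_jar_in_war(war_path, old_jar, new_jar_path):
    """Unpack a .war, swap one jar in WEB-INF/lib, and repack it in place."""
    workdir = tempfile.mkdtemp()
    try:
        with zipfile.ZipFile(war_path) as war:
            war.extractall(workdir)
        lib_dir = os.path.join(workdir, "WEB-INF", "lib")
        os.remove(os.path.join(lib_dir, old_jar))   # e.g. lucene-2.9.jar
        shutil.copy(new_jar_path, lib_dir)          # e.g. lucene-3.0.3.jar
        with zipfile.ZipFile(war_path, "w", zipfile.ZIP_DEFLATED) as war:
            for root, _, files in os.walk(workdir):
                for name in files:
                    full = os.path.join(root, name)
                    war.write(full, os.path.relpath(full, workdir))
    finally:
        shutil.rmtree(workdir)
```

Deploy the rewritten war afterwards, as above.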

-
Thanx:
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-1-4-and-Lucene-3-0-3-index-problem-tp2396605p2403542.html


Re: Lucene 3.0.3 index cannot be read by Solr

2011-02-01 Thread Grijesh

Solr 1.4 is compatible with Lucene 2.9.
If your index was written with Lucene 3.0.3, it cannot be read by Lucene 2.9.

You can try replacing Solr's Lucene 2.9 jar with your Lucene 3.0.3 jar and
restarting your server.
Hope it may work.

-
Thanx:
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Lucene-3-0-3-index-cannot-be-read-by-Solr-tp2396649p2403161.html


Re: Lock obtain timed out: NativeFSLock

2011-02-01 Thread Chris Hostetter

: I'm going to go ahead and reply to myself since I solved my problem.  It
: seems I was doing one more update to the data at the end and wasn't doing a
: commit, so it then couldn't write to the other core.  Adding the last commit
: seems to have fixed everything.

Sending interleaved updates to multiple cores shouldn't have caused any 
locking-related problems -- unless for some reason both cores are using 
the same index (i.e.: do you have them configured to use the same data 
directory?)

That would be very bad.  (If you are doing this intentionally, please 
explain your goal, because I can't think of any reason why you might want 
two Solr cores in the same Solr instance to use the same index.)

-Hoss


best practice for solr-powered jsp?

2011-02-01 Thread Paul Libbrecht

Hello list,

this was asked again recently but I still see no answer.
What is the best practice for writing JSP pages that render, for example, 
Solr search results?

The only relevant thing I found is
http://www.ibm.com/developerworks/java/library/j-solr1/
"Search smarter with Apache Solr" at IBM developerWorks by Grant Ingersoll.

But that one is very old.

Imitating it, I would write a servlet that grabs the components, puts them 
into request attributes, and then dispatches to a JSP.

Is there a better way?
Is such code already part of a more widespread distribution?
I know there's Velocity, and that one works well, but testing in Velocity is 
really too much of a pain.

paul

Solr and Eclipse

2011-02-01 Thread Eric Grobler
Hi

I am a newbie and I am trying to run solr in eclipse.

From this URL
http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips
there is a subclipse example:

I use Team -> Share Project and this URL:
  http://svn.apache.org/repos/asf/lucene/dev/trunk

but I get an "access forbidden for unknown reason" error.

I assume that with read-only HTTP I do not need credentials?

Also, would it make more sense to check out the project with
the command-line svn client and then use
"Create project from existing source" in Eclipse?


Thanks
Ericz


Re: Lock obtain timed out: NativeFSLock

2011-02-01 Thread Alex Thurlow
I'm going to go ahead and reply to myself since I solved my problem.  
It seems I was doing one more update to the data at the end and wasn't 
doing a commit, so it then couldn't write to the other core.  Adding the 
last commit seems to have fixed everything.


On 2/1/2011 11:08 AM, Alex Thurlow wrote:
I recently added a second core to my solr setup, and I'm now running 
into this "Lock obtain timed out" error when I try to update one core 
after I've updated another core.


In my update process, I add/update 1000 documents at a time and commit 
in between.  Then at the end, I commit and optimize.  The update of 
the new core has about 150k documents.  If I try to update the old 
core any time after updating the new core (even a couple hours later), 
I get the below error.  I've tried switching to the simple lock, but 
that didn't change anything.  I've tried this on solr 1.4 and 1.4.1 
both with the spatial-solr-2.0-RC2 plugin loaded.


If I restart solr, I can then update the old core again.

Does anyone have any insight for me here?



Feb 1, 2011 10:59:57 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
NativeFSLock@./solr/data/index/lucene-088f283afa122cf05ce7eadb1b5ce07b-write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)







Re: Sending binary data as part of a query

2011-02-01 Thread Jay Luker
On Mon, Jan 31, 2011 at 9:22 PM, Chris Hostetter
 wrote:

> that class should probably have been named ContentStreamUpdateHandlerBase
> or something like that -- it tries to encapsulate the logic that most
> RequestHandlers using ContentStreams (for updating) need to worry about.
>
> Your QueryComponent (as used by SearchHandler) should be able to access
> the ContentStreams the same way that class does ... call
> req.getContentStreams().
>
> Sending a binary stream from a remote client depends on how the client is
> implemented -- you can do it via HTTP using the POST body (with or w/o
> multi-part MIME) in any language you want. If you are using SolrJ you may
> again run into an assumption that using ContentStreams means you are doing
> an "Update" but that's just a vernacular thing ... something like a
> ContentStreamUpdateRequest should work just as well for a query (as long
> as you set the necessary params and/or request handler path)

Thanks for the help. I was just about to reply to my own question for
the benefit of future googlers when I noticed your response. :)

I actually got this working, much the way you suggest. The client is
python. I created a gist with the script I used for testing [1].

On the Solr side my QueryComponent grabs the stream, uses
jzlib.ZInputStream to do the inflating, then translates the incoming
integers in the bitset (my Solr schema.xml integer ids) to the Lucene
ids and creates a docSetFilter with them.

Very relieved to get this working as it's the basis of a talk I'm
giving next week [2]. :-)

--jay

[1] https://gist.github.com/806397
[2] http://code4lib.org/conference/2011/luker
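For the future googlers mentioned above, here is a minimal sketch of the two halves of that scheme -- pack the integer ids, deflate them for the POST body, and inflate/unpack on the other side. The exact wire format here (big-endian 32-bit ints) is an illustrative assumption, not Jay's actual script (that's in the linked gist):

```python
import struct
import zlib

def encode_id_payload(doc_ids):
    """Pack ids as big-endian 32-bit ints, then zlib-deflate the result.
    The compressed bytes would be sent as the POST body of the query."""
    packed = b"".join(struct.pack(">i", i) for i in doc_ids)
    return zlib.compress(packed)

def decode_id_payload(payload):
    """Mirror of the server side (jzlib.ZInputStream there): inflate the
    body and unpack the 32-bit ids for translation to Lucene doc ids."""
    packed = zlib.decompress(payload)
    return [struct.unpack(">i", packed[i:i + 4])[0]
            for i in range(0, len(packed), 4)]
```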


Re: Malformed XML with exotic characters

2011-02-01 Thread Robert Muir
Hi, it might only be a problem with your XML tools (e.g. Firefox).
The problem here is characters outside of the Basic Multilingual Plane
(in this case Gothic).
XML tools typically fall apart on these portions of Unicode (in Lucene
we recently reverted to a patched/hacked copy of Xerces specifically
for this reason).

If you care about characters outside of the Basic Multilingual Plane
actually working, unfortunately you have to start being very, very
particular about what software you use... you can assume most
software/setups WON'T work.
For example, if you were to use MySQL's "utf8" character set, you would
find it doesn't actually support all of UTF-8! In this case you would
need to use the recent 'utf8mb4' character set instead, which really is
UTF-8.
That's just one example of a widely used piece of software that suffers
from issues like this; there are others.

It's for reasons like these that, if support for these languages is
important to you, I would stick with the simplest, most textual methods
for input and output: e.g. using things like CSV and JSON if you can.
I would also fully test every component/jar in your application
individually, and once you get it working, don't ever upgrade.

In any case, if you are having problems with characters outside of the
Basic Multilingual Plane and you suspect it's actually a bug in Solr,
please open a JIRA issue, especially if you can provide some way to
reproduce it.
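To make the BMP boundary concrete: a character outside the Basic Multilingual Plane (code point above U+FFFF) takes four bytes in UTF-8 and a surrogate pair in UTF-16, which is exactly what MySQL's legacy "utf8" (three bytes per character at most) cannot store. A quick check, using Python purely for illustration:

```python
# U+10330 GOTHIC LETTER AHSA -- outside the Basic Multilingual Plane
ch = "\U00010330"

print(len(ch.encode("utf-8")))      # 4 bytes: too wide for MySQL's old "utf8"
print(len(ch.encode("utf-16-be")))  # 4 bytes: a surrogate pair (2 x 16 bits)
```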

On Tue, Feb 1, 2011 at 10:43 AM, Markus Jelsma
 wrote:
> There is an issue with the XML response writer. It cannot cope with some very
> exotic characters or possibly the right-to-left writing systems. The issue can
> be reproduced by indexing the content of the home page of wikipedia as it
> contains a lot of exotic matter. The problem does not affect the JSON response
> writer.
>
> The problem is, i am unsure whether this is a bug in Solr or that perhaps
> Firefox itself trips over.
>
>
> Here's the output of the JSONResponseWriter for a query returning the home
> page:
> {
>  "responseHeader":{
>  "status":0,
>  "QTime":1,
>  "params":{
>        "fl":"url,content",
>        "indent":"true",
>        "wt":"json",
>        "q":"*:*",
>        "rows":"1"}},
>  "response":{"numFound":6744,"start":0,"docs":[
>        {
>         "url":"http://www.wikipedia.org/";,
>         "content":"Wikipedia English The Free Encyclopedia 3 543 000+ 
> articles 日
> 本語 フリー百科事典 730 000+ 記事 Deutsch Die freie Enzyklopädie 1 181 000+ Artikel
> Español La enciclopedia libre 710 000+ artículos Français L’encyclopédie libre
> 1 061 000+ articles Русский Свободная энциклопедия 654 000+ статей Italiano
> L’enciclopedia libera 768 000+ voci Português A enciclopédia livre 669 000+
> artigos Polski Wolna encyklopedia 769 000+ haseł Nederlands De vrije
> encyclopedie 668 000+ artikelen Search  • Suchen  • Rechercher  • Szukaj  •
> Ricerca  • 検索  • Buscar  • Busca  • Zoeken  • Поиск  • Sök  • 搜尋  • Cerca  •
> Søk  • Haku  • Пошук  • Hledání  • Keresés  • Căutare  • 찾기  • Tìm kiếm  • Ara
> • Cari  • Søg  • بحث  • Serĉu  • Претрага  • Paieška  • Hľadať  • Suk  • جستجو
> • חיפוש  • Търсене  • Poišči  • Cari  • Bilnga العربية Български Català Česky
> Dansk Deutsch English Español Esperanto فارسی Français 한국어 Bahasa Indonesia
> Italiano עברית Lietuvių Magyar Bahasa Melayu Nederlands 日本語 Norsk (bokmål)
> Polski Português Română Русский Slovenčina Slovenščina Српски / Srpski Suomi
> Svenska Türkçe Українська Tiếng Việt Volapük Winaray 中文   100 000+   العربية
> • Български  • Català  • Česky  • Dansk  • Deutsch  • English  • Español  •
> Esperanto  • فارسی  • Français  • 한국어  • Bahasa Indonesia  • Italiano  • עברית
> • Lietuvių  • Magyar  • Bahasa Melayu  • Nederlands  • 日本語  • Norsk (bokmål)
> • Polski  • Português  • Русский  • Română  • Slovenčina  • Slovenščina  •
> Српски / Srpski  • Suomi  • Svenska  • Türkçe  • Українська  • Tiếng Việt  •
> Volapük  • Winaray  • 中文   10 000+   Afrikaans  • Aragonés  • Armãneashce  •
> Asturianu  • Kreyòl Ayisyen  • Azərbaycan / آذربايجان ديلی  • বাংলা  • 
> Беларуская
> ( Акадэмічная  • Тарашкевiца )  • বিষ্ণুপ্রিযা় মণিপুরী  • Bosanski  • 
> Brezhoneg  • Чăваш
> • Cymraeg  • Eesti  • Ελληνικά  • Euskara  • Frysk  • Gaeilge  • Galego  •
> ગુજરાતી  • Հայերեն  • हिन्दी  • Hrvatski  • Ido  • Íslenska  • Basa Jawa  • 
> ಕನ್ನಡ  •
> ქართული  • Kurdî / كوردی  • Latina  • Latviešu  • Lëtzebuergesch  • Lumbaart
> • Македонски  • മലയാളം  • मराठी  • नेपाल भाषा  • नेपाली  • Norsk (nynorsk)  • 
> Nnapulitano
> • Occitan  • Piemontèis  • Plattdüütsch  • Ripoarisch  • Runa Simi  • شاہ مکھی
> پنجابی  • Shqip  • Sicilianu  • Simple English  • Sinugboanon  •
> Srpskohrvatski / Српскохрватски  • Basa Sunda  • Kiswahili  • Tagalog  • தமிழ்
> • తెలుగు  • ไทย  • اردو  • Walon  • Yorùbá  • 粵語  • Žemaitėška   1 000+   
> Bahsa
> Acèh  • Alemannisch  • አማርኛ  • Arpitan  • ܐܬܘܪܝܐ  • Avañe’ẽ  • Aymar Aru  •
> Bân-lâm-gú  • Bahasa Banjar  • Basa Banyumasan  • Башҡорт  • भोजपुरी  • Bikol
> Central  • Boarisch  • བོད་ཡིག  • Chav

Re: Solr Indexing Performance

2011-02-01 Thread Darx Oman
Thanx, Tomas.
I'll try with a different configuration.


SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!

2011-02-01 Thread Ravi Kiran
Hello,
  While reloading a core I got the following error; when does this
occur? Prior to this exception I do not see anything wrong in the logs.

[#|2011-02-01T13:02:36.697-0500|SEVERE|sun-appserver2.1|org.apache.solr.servlet.SolrDispatchFilter|_ThreadID=25;_ThreadName=httpWorkerThread-9001-5;_RequestID=450f6337-1f5c-42bc-a572-f0924de36b56;|org.apache.lucene.store.LockObtainFailedException:
Lock obtain timed out: NativeFSLock@
/data/solr/core/solr-data/index/lucene-7dc773a074342fa21d7d5ba09fc80678-write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1565)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1421)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:191)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:313)
        at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
        at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
        at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1096)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1096)
        at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:290)
        at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:647)
        at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:579)
        at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:831)
        at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:341)
        at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:263)
        at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:214)
        at com.sun.enterprise.web.connector.grizzly.TaskBase.run(TaskBase.java:265)
        at com.sun.enterprise.web.connector.grizzly.WorkerThreadImpl.run(WorkerThreadImpl.java:116)
|#]

[#|2011-02-01T13:02:40.330-0500|SEVERE|sun-appserver2.1|org.apache.solr.update.SolrIndexWriter|_ThreadID=82;_ThreadName=Finalizer;_RequestID=121fac59-7b08-46b9-acaa-5c5462418dc7;|SolrIndexWriter
was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
LEAK!!!|#]

[#|2011-02-01T13:02:40.330-0500|SEVERE|sun-appserver2.1|org.apache.solr.update.SolrIndexWriter|_ThreadID=82;_ThreadName=Finalizer;_RequestID=121fac59-7b08-46b9-acaa-5c5462418dc7;|SolrIndexWriter
was not closed prior to finalize()

Re: EdgeNgram Auto suggest - doubles ignore

2011-02-01 Thread johnnyisrael

Hi Erick,

I tried to use the terms component and ended up with the following problems.

Problem: 1

Custom Sort not working in terms component:

http://lucene.472066.n3.nabble.com/Term-component-sort-is-not-working-td1905059.html#a1909386

I want to sort using one of my custom fields [value_score]. I already set it
in my configuration, but it is not sorting properly.

The following is the configuration in solrconfig.xml:

  <requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <bool name="terms">true</bool>
      <str name="wt">json</str>
      <str name="terms.fl">name</str>
      <str name="sort">value_score desc</str>
      <bool name="terms.raw">true</bool>
    </lst>
    <arr name="components">
      <str>termsComponent</str>
    </arr>
  </requestHandler>

The Solr response is not returned sorted according to that parameter.

Problem: 2

Case-sensitivity problem: [I am searching for "Apple"]

http://localhost/solr/core1/terms?terms.fl=name&terms.prefix=apple <-- not
working

http://localhost/solr/core1/terms?terms.fl=name&terms.prefix=Apple <--
working

I tried a regex to work around the case-sensitivity problem: 

http://localhost/solr/core1/terms?terms.fl=name&terms.regex=Apple&terms.regex.flag=case_insensitive

Will this regex-based search help with my requirement?

It is returning irrelevant results. I am using the same syntax as
mentioned in the wiki.

http://wiki.apache.org/solr/TermsComponent

Am I going wrong anywhere?
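(For reference, the usual workaround for the case-sensitivity problem is to point the terms component at a lowercased copy of the field, since TermsComponent matches raw indexed terms and never analyzes the prefix. A sketch of the schema.xml pieces -- the field and type names below are made up for illustration:)

```xml
<!-- lowercase the whole value at index time so terms.prefix=apple matches "Apple" -->
<fieldType name="suggest_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_lc" type="suggest_lc" indexed="true" stored="false"/>
<copyField source="name" dest="name_lc"/>
```

Then query with terms.fl=name_lc and lowercase the prefix on the client before sending it.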

Please let me know if you need any more info.

Thanks,

Johnny
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/EdgeNgram-Auto-suggest-doubles-ignore-tp2321919p2399330.html


Re: Malformed XML with exotic characters

2011-02-01 Thread Sascha Szott

Hi Markus,

in my case the JSON response writer returns valid JSON. The same holds 
for the PHP response writer.


-Sascha

On 01.02.2011 18:44, Markus Jelsma wrote:

You can exclude the input's involvement by checking if other response writers
do work. For me, the JSONResponseWriter works perfectly with the same returned
data in some AJAX environment.

On Tuesday 01 February 2011 18:29:06 Sascha Szott wrote:

Hi folks,

I've made the same observation when working with Solr's
ExtractingRequestHandler on the command line (no browser interaction).

When issuing the following curl command

curl
'http://mysolrhost/solr/update/extract?extractOnly=true&extractFormat=text&;
wt=xml&resource.name=foo.pdf' --data-binary @foo.pdf -H
'Content-type:text/xml; charset=utf-8'>  foo.xml

Solr's XML response writer returns malformed xml, e.g., xmllint gives me:

foo.xml:21: parser error : Char 0xD835 out of allowed range
foo.xml:21: parser error : PCDATA invalid Char value 55349

I'm not totally sure, if this is an Tika/PDFBox issue. However, I would
expect in every case that the XML output produced by Solr is well-formed
even if the libraries used under the hood return "garbage".


-Sascha

p.s. I can provide the pdf file in question, if anybody would like to
see it in action.

On 01.02.2011 16:43, Markus Jelsma wrote:

There is an issue with the XML response writer. It cannot cope with some
very exotic characters or possibly the right-to-left writing systems.
The issue can be reproduced by indexing the content of the home page of
wikipedia as it contains a lot of exotic matter. The problem does not
affect the JSON response writer.

The problem is, i am unsure whether this is a bug in Solr or that perhaps
Firefox itself trips over.


Here's the output of the JSONResponeWriter for a query returning the home
page:
{

   "responseHeader":{

"status":0,
"QTime":1,
"params":{

"fl":"url,content",
"indent":"true",
"wt":"json",
"q":"*:*",
"rows":"1"}},

   "response":{"numFound":6744,"start":0,"docs":[

{

 "url":"http://www.wikipedia.org/";,
 "content":"Wikipedia English The Free Encyclopedia 3 543 000+ articles
 日

本語 フリー百科事典 730 000+ 記事 Deutsch Die freie Enzyklopädie 1 181 000+ Artikel
Español La enciclopedia libre 710 000+ artículos Français L’encyclopédie
libre 1 061 000+ articles Русский Свободная энциклопедия 654 000+ статей
Italiano L’enciclopedia libera 768 000+ voci Português A enciclopédia
livre 669 000+ artigos Polski Wolna encyklopedia 769 000+ haseł
Nederlands De vrije encyclopedie 668 000+ artikelen Search  • Suchen  •
Rechercher  • Szukaj  • Ricerca  • 検索  • Buscar  • Busca  • Zoeken  •
Поиск  • Sök  • 搜尋  • Cerca  • Søk  • Haku  • Пошук  • Hledání  •
Keresés  • Căutare  • 찾기  • Tìm kiếm  • Ara • Cari  • Søg  • بحث  •
Serĉu  • Претрага  • Paieška  • Hľadať  • Suk  • جستجو • חיפוש  •
Търсене  • Poišči  • Cari  • Bilnga العربية Български Català Česky Dansk
Deutsch English Español Esperanto فارسی Français 한국어 Bahasa Indonesia
Italiano עברית Lietuvių Magyar Bahasa Melayu Nederlands 日本語 Norsk
(bokmål) Polski Português Română Русский Slovenčina Slovenščina Српски /
Srpski Suomi Svenska Türkçe Українська Tiếng Việt Volapük Winaray 中文
100 000+   العربية • Български  • Català  • Česky  • Dansk  • Deutsch  •
English  • Español  • Esperanto  • فارسی  • Français  • 한국어  • Bahasa
Indonesia  • Italiano  • עברית • Lietuvių  • Magyar  • Bahasa Melayu  •
Nederlands  • 日本語  • Norsk (bokmål) • Polski  • Português  • Русский  •
Română  • Slovenčina  • Slovenščina  • Српски / Srpski  • Suomi  •
Svenska  • Türkçe  • Українська  • Tiếng Việt  • Volapük  • Winaray  •
中文   10 000+   Afrikaans  • Aragonés  • Armãneashce  • Asturianu  •
Kreyòl Ayisyen  • Azərbaycan / آذربايجان ديلی  • বাংলা  • Беларуская (
Акадэмічная  • Тарашкевiца )  • বিষ্ণুপ্রিযা় মণিপুরী  • Bosanski  •
Brezhoneg  • Чăваш • Cymraeg  • Eesti  • Ελληνικά  • Euskara  • Frysk  •
Gaeilge  • Galego  • ગુજરાતી  • Հայերեն  • हिन्दी  • Hrvatski  • Ido  •
Íslenska  • Basa Jawa  • ಕನ್ನಡ  • ქართული  • Kurdî / كوردی  • Latina  •
Latviešu  • Lëtzebuergesch  • Lumbaart • Македонски  • മലയാളം  • मराठी
• नेपाल भाषा  • नेपाली  • Norsk (nynorsk)  • Nnapulitano • Occitan  •
Piemontèis  • Plattdüütsch  • Ripoarisch  • Runa Simi  • شاہ مکھی پنجابی
  • Shqip  • Sicilianu  • Simple English  • Sinugboanon  • Srpskohrvatski
/ Српскохрватски  • Basa Sunda  • Kiswahili  • Tagalog  • தமிழ் • తెలుగు
  • ไทย  • اردو  • Walon  • Yorùbá  • 粵語  • Žemaitėška   1 000+   Bahsa
Acèh  • Alemannisch  • አማርኛ  • Arpitan  • ܐܬܘܪܝܐ  • Avañe’ẽ  • Aymar Aru
  • Bân-lâm-gú  • Bahasa Banjar  • Basa Banyumasan  • Башҡорт  • भोजपुरी
• Bikol Central  • Boarisch  • བོད་ཡིག  • Chavacano de Zamboanga  •
Corsu  • Deitsch  • ދިވެހި  • Diné Bizaad  • Eald Englisc  •
Emigliàn–Rumagnòl  • Эрзянь  • Estremeñu • Fiji Hindi  • Føroyskt  •
Furlan  • Gaelg  • Gàidhlig  • 贛語  • گیلکی  • Hak- kâ-fa / 客家話  • Хальмг
  • ʻ

Re: Malformed XML with exotic characters

2011-02-01 Thread Markus Jelsma
You can rule out the input's involvement by checking whether other response 
writers work. For me, the JSONResponseWriter works perfectly with the same 
returned data in an AJAX environment.

On Tuesday 01 February 2011 18:29:06 Sascha Szott wrote:
> Hi folks,
> 
> I've made the same observation when working with Solr's
> ExtractingRequestHandler on the command line (no browser interaction).
> 
> When issuing the following curl command
> 
> curl
> 'http://mysolrhost/solr/update/extract?extractOnly=true&extractFormat=text&;
> wt=xml&resource.name=foo.pdf' --data-binary @foo.pdf -H
> 'Content-type:text/xml; charset=utf-8' > foo.xml
> 
> Solr's XML response writer returns malformed xml, e.g., xmllint gives me:
> 
> foo.xml:21: parser error : Char 0xD835 out of allowed range
> foo.xml:21: parser error : PCDATA invalid Char value 55349
> 
> I'm not totally sure, if this is an Tika/PDFBox issue. However, I would
> expect in every case that the XML output produced by Solr is well-formed
> even if the libraries used under the hood return "garbage".
> 
> 
> -Sascha
> 
> p.s. I can provide the pdf file in question, if anybody would like to
> see it in action.
> 
> On 01.02.2011 16:43, Markus Jelsma wrote:
> > There is an issue with the XML response writer. It cannot cope with some
> > very exotic characters or possibly the right-to-left writing systems.
> > The issue can be reproduced by indexing the content of the home page of
> > wikipedia as it contains a lot of exotic matter. The problem does not
> > affect the JSON response writer.
> > 
> > The problem is, i am unsure whether this is a bug in Solr or that perhaps
> > Firefox itself trips over.
> > 
> > 

Re: one column indexed, the other isnt

2011-02-01 Thread PeterKerk

I solved it by altering my SQL statement to return a 'true' or 'false' value:
CASE WHEN c.varstatement='False' THEN 'false' ELSE 'true' END as
varstatement 

Thanks! 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/one-column-indexed-the-other-isnt-tp2389819p2399011.html


Re: Malformed XML with exotic characters

2011-02-01 Thread Sascha Szott

Hi folks,

I've made the same observation when working with Solr's 
ExtractingRequestHandler on the command line (no browser interaction).


When issuing the following curl command

curl 
'http://mysolrhost/solr/update/extract?extractOnly=true&extractFormat=text&wt=xml&resource.name=foo.pdf' 
--data-binary @foo.pdf -H 'Content-type:text/xml; charset=utf-8' > foo.xml


Solr's XML response writer returns malformed xml, e.g., xmllint gives me:

foo.xml:21: parser error : Char 0xD835 out of allowed range
foo.xml:21: parser error : PCDATA invalid Char value 55349

I'm not totally sure if this is a Tika/PDFBox issue. However, I would 
expect in any case that the XML output produced by Solr is well-formed 
even if the libraries used under the hood return "garbage".



-Sascha

p.s. I can provide the pdf file in question, if anybody would like to 
see it in action.
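The two xmllint errors describe the same character twice: 0xD835 (decimal 55349) is a lone UTF-16 high surrogate that leaked into the output, and XML's Char production excludes surrogate code points entirely. A small illustrative check (not Solr code) for spotting them in extracted text:

```python
def lone_surrogates(text):
    """Positions of UTF-16 surrogate code points (U+D800-U+DFFF); the XML
    spec forbids these as characters, so any of them in a response makes
    the document malformed."""
    return [i for i, ch in enumerate(text) if 0xD800 <= ord(ch) <= 0xDFFF]

# 0xD835 is the high half of a pair encoding e.g. mathematical alphanumerics
print(lone_surrogates("ok\ud835!"))  # [2]
```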



On 01.02.2011 16:43, Markus Jelsma wrote:

There is an issue with the XML response writer. It cannot cope with some very
exotic characters or possibly the right-to-left writing systems. The issue can
be reproduced by indexing the content of the home page of wikipedia as it
contains a lot of exotic matter. The problem does not affect the JSON response
writer.

The problem is, i am unsure whether this is a bug in Solr or that perhaps
Firefox itself trips over.


Here's the output of the JSONResponseWriter for a query returning the home 
page:
{
  "responseHeader":{
   "status":0,
   "QTime":1,
   "params":{
"fl":"url,content",
"indent":"true",
"wt":"json",
"q":"*:*",
"rows":"1"}},
  "response":{"numFound":6744,"start":0,"docs":[
{
 "url":"http://www.wikipedia.org/",
 "content":"Wikipedia English The Free Encyclopedia 3 543 000+ articles 
日
本語 フリー百科事典 730 000+ 記事 Deutsch Die freie Enzyklopädie 1 181 000+ Artikel
Español La enciclopedia libre 710 000+ artículos Français L’encyclopédie libre
1 061 000+ articles Русский Свободная энциклопедия 654 000+ статей Italiano
L’enciclopedia libera 768 000+ voci Português A enciclopédia livre 669 000+
artigos Polski Wolna encyklopedia 769 000+ haseł Nederlands De vrije
encyclopedie 668 000+ artikelen Search  • Suchen  • Rechercher  • Szukaj  •
Ricerca  • 検索  • Buscar  • Busca  • Zoeken  • Поиск  • Sök  • 搜尋  • Cerca  •
Søk  • Haku  • Пошук  • Hledání  • Keresés  • Căutare  • 찾기  • Tìm kiếm  • Ara
• Cari  • Søg  • بحث  • Serĉu  • Претрага  • Paieška  • Hľadať  • Suk  • جستجو
• חיפוש  • Търсене  • Poišči  • Cari  • Bilnga العربية Български Català Česky
Dansk Deutsch English Español Esperanto فارسی Français 한국어 Bahasa Indonesia
Italiano עברית Lietuvių Magyar Bahasa Melayu Nederlands 日本語 Norsk (bokmål)
Polski Português Română Русский Slovenčina Slovenščina Српски / Srpski Suomi
Svenska Türkçe Українська Tiếng Việt Volapük Winaray 中文   100 000+   العربية
• Български  • Català  • Česky  • Dansk  • Deutsch  • English  • Español  •
Esperanto  • فارسی  • Français  • 한국어  • Bahasa Indonesia  • Italiano  • עברית
• Lietuvių  • Magyar  • Bahasa Melayu  • Nederlands  • 日本語  • Norsk (bokmål)
• Polski  • Português  • Русский  • Română  • Slovenčina  • Slovenščina  •
Српски / Srpski  • Suomi  • Svenska  • Türkçe  • Українська  • Tiếng Việt  •
Volapük  • Winaray  • 中文   10 000+   Afrikaans  • Aragonés  • Armãneashce  •
Asturianu  • Kreyòl Ayisyen  • Azərbaycan / آذربايجان ديلی  • বাংলা  • 
Беларуская
( Акадэмічная  • Тарашкевiца )  • বিষ্ণুপ্রিযা় মণিপুরী  • Bosanski  • 
Brezhoneg  • Чăваш
• Cymraeg  • Eesti  • Ελληνικά  • Euskara  • Frysk  • Gaeilge  • Galego  •
ગુજરાતી  • Հայերեն  • हिन्दी  • Hrvatski  • Ido  • Íslenska  • Basa Jawa  • 
ಕನ್ನಡ  •
ქართული  • Kurdî / كوردی  • Latina  • Latviešu  • Lëtzebuergesch  • Lumbaart
• Македонски  • മലയാളം  • मराठी  • नेपाल भाषा  • नेपाली  • Norsk (nynorsk)  • 
Nnapulitano
• Occitan  • Piemontèis  • Plattdüütsch  • Ripoarisch  • Runa Simi  • شاہ مکھی
پنجابی  • Shqip  • Sicilianu  • Simple English  • Sinugboanon  •
Srpskohrvatski / Српскохрватски  • Basa Sunda  • Kiswahili  • Tagalog  • தமிழ்
• తెలుగు  • ไทย  • اردو  • Walon  • Yorùbá  • 粵語  • Žemaitėška   1 000+   Bahsa
Acèh  • Alemannisch  • አማርኛ  • Arpitan  • ܐܬܘܪܝܐ  • Avañe’ẽ  • Aymar Aru  •
Bân-lâm-gú  • Bahasa Banjar  • Basa Banyumasan  • Башҡорт  • भोजपुरी  • Bikol
Central  • Boarisch  • བོད་ཡིག  • Chavacano de Zamboanga  • Corsu  • Deitsch  •
ދިވެހި  • Diné Bizaad  • Eald Englisc  • Emigliàn–Rumagnòl  • Эрзянь  • 
Estremeñu
• Fiji Hindi  • Føroyskt  • Furlan  • Gaelg  • Gàidhlig  • 贛語  • گیلکی  • Hak-
kâ-fa / 客家話  • Хальмг  • ʻŌlelo Hawaiʻi  • Hornjoserbsce  • Ilokano  •
Interlingua  • Interlingue  • Ирон Æвзаг  • Kapampangan  • Kaszëbsczi  •
Kernewek  • ភាសាខ្មែរ  • Kinyarwanda  • Коми  • Кыргызча  • Ladino / לאדינו  •
Ligure  • Limburgs  • Lingála  • lojban  • Malagasy  • Malti  • 文言  • Māori  •
مصرى  • مازِرونی / Mäzeruni  • Монгол  • မြန်မာဘာသာ  • Nāhuatlahtōlli  •
Nedersaksisch  • Nouormand  • Novial  • Нохчийн  • Олык Марий  • O‘zbek  • पाऴि
• Pangasinán  • ਪੰਜਾਬੀ 

Re: chaning schema

2011-02-01 Thread Dennis Gearon
Cool, thanks for the tip, Erik :-)

There's so much to learn, and I haven't even got to tuning the thing for best 
results.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Erik Hatcher 
To: solr-user@lucene.apache.org
Sent: Tue, February 1, 2011 9:24:24 AM
Subject: Re: chaning schema

The trick is, you have to remove the data/ directory, not just the data/index 
subdirectory, and of course then restart Solr.

Or delete *:* with ?commit=true, depending on what's the best fit for your ops.

Erik

On Feb 1, 2011, at 11:41 , Dennis Gearon wrote:

> I tried removing the index directory once, and Tomcat refused to start up 
> because 
>
> it didn't have a segments file.
> 
> 
> 
> 
> - Original Message 
> From: Erick Erickson 
> To: solr-user@lucene.apache.org
> Sent: Tue, February 1, 2011 5:04:51 AM
> Subject: Re: chaning schema
> 
> That sounds right. You can cheat and just remove /data/index
> rather than delete *:* though (you should probably do that with the Solr
> instance stopped)
> 
> Make sure to remove the directory "index" as well.
> 
> Best
> Erick
> 
> On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon  wrote:
> 
>> Anyone got a great little script for changing a schema?
>> 
>> i.e., after changing:
>> database,
>> the view in the database for data import
>> the data-config.xml file
>> the schema.xml file
>> 
>> I BELIEVE that I have to run:
>> a delete command for the whole index *:*
>> a full import and optimize
>> 
>> This all sound right?
>> 
>> Dennis Gearon
>> 
>> 
>> Signature Warning
>> 
>> It is always a good idea to learn from your own mistakes. It is usually a
>> better
>> idea to learn from others’ mistakes, so you do not have to make them
>> yourself.
>> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>> 
>> 
>> EARTH has a Right To Life,
>> otherwise we all die.
>> 
>> 
>


Re: chaning schema

2011-02-01 Thread Erik Hatcher
The trick is, you have to remove the data/ directory, not just the data/index 
subdirectory, and of course then restart Solr.

Or delete *:* with ?commit=true, depending on what's the best fit for your ops.

Erik

On Feb 1, 2011, at 11:41 , Dennis Gearon wrote:

> I tried removing the index directory once, and Tomcat refused to start up 
> because 
> it didn't have a segments file.
> 
> 
> 
> 
> - Original Message 
> From: Erick Erickson 
> To: solr-user@lucene.apache.org
> Sent: Tue, February 1, 2011 5:04:51 AM
> Subject: Re: chaning schema
> 
> That sounds right. You can cheat and just remove /data/index
> rather than delete *:* though (you should probably do that with the Solr
> instance stopped)
> 
> Make sure to remove the directory "index" as well.
> 
> Best
> Erick
> 
> On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon  wrote:
> 
>> Anyone got a great little script for changing a schema?
>> 
>> i.e., after changing:
>> database,
>> the view in the database for data import
>> the data-config.xml file
>> the schema.xml file
>> 
>> I BELIEVE that I have to run:
>> a delete command for the whole index *:*
>> a full import and optimize
>> 
>> This all sound right?
>> 
>> Dennis Gearon
>> 
>> 
>> Signature Warning
>> 
>> It is always a good idea to learn from your own mistakes. It is usually a
>> better
>> idea to learn from others’ mistakes, so you do not have to make them
>> yourself.
>> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>> 
>> 
>> EARTH has a Right To Life,
>> otherwise we all die.
>> 
>> 
> 
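Erik's second option above (delete everything, then commit) can be scripted. The sketch below only builds the request; sending it is left to whatever HTTP client you use, and the base URL in the usage line is an assumption about a default single-core Solr 1.4 install:

```python
def delete_all_request(base_url: str):
    """Build the delete-*:*-and-commit request for a Solr 1.4 update handler.

    Returns (url, body, headers); POST the body to the url with the headers.
    """
    return (base_url.rstrip("/") + "/update?commit=true",
            "<delete><query>*:*</query></delete>",
            {"Content-Type": "text/xml"})

# Assumed default install; adjust host/port/core path for your setup.
url, body, headers = delete_all_request("http://localhost:8983/solr")
print(url)    # http://localhost:8983/solr/update?commit=true
print(body)   # <delete><query>*:*</query></delete>
```

Unlike removing data/, this keeps Solr running, but it does not shrink the index on disk until a subsequent optimize.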



Lock obtain timed out: NativeFSLock

2011-02-01 Thread Alex Thurlow
I recently added a second core to my solr setup, and I'm now running 
into this "Lock obtain timed out" error when I try to update one core 
after I've updated another core.


In my update process, I add/update 1000 documents at a time and commit 
in between.  Then at the end, I commit and optimize.  The update of the 
new core has about 150k documents.  If I try to update the old core any 
time after updating the new core (even a couple hours later), I get the 
below error.  I've tried switching to the simple lock, but that didn't 
change anything.  I've tried this on solr 1.4 and 1.4.1 both with the 
spatial-solr-2.0-RC2 plugin loaded.


If I restart solr, I can then update the old core again.

Does anyone have any insight for me here?



Feb 1, 2011 10:59:57 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain 
timed out: 
NativeFSLock@./solr/data/index/lucene-088f283afa122cf05ce7eadb1b5ce07b-write.lock

at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
at 
org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at 
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)

at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)

at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
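The batch-and-commit flow described above can be sketched as follows. The batching helper and the final commit are the important parts: in Solr 1.4 a commit closes the core's IndexWriter and releases its write lock, so an update left uncommitted on one core can keep that lock held. `post_update()` in the usage comments is a hypothetical placeholder for whatever HTTP client talks to /solr/<core>/update:

```python
def batched(docs, size=1000):
    """Yield documents in groups of at most `size`, as described above."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:              # don't drop the final partial batch
        yield batch

# Hypothetical driver loop, one commit per batch plus a final commit:
# for group in batched(all_docs):
#     post_update(core, add=group)
#     post_update(core, commit=True)
# post_update(core, commit=True, optimize=True)  # the final commit matters

print([len(b) for b in batched(range(2500))])   # [1000, 1000, 500]
```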





Re: chaning schema

2011-02-01 Thread Dennis Gearon
I tried removing the index directory once, and Tomcat refused to start up 
because 
it didn't have a segments file.

 


- Original Message 
From: Erick Erickson 
To: solr-user@lucene.apache.org
Sent: Tue, February 1, 2011 5:04:51 AM
Subject: Re: chaning schema

That sounds right. You can cheat and just remove /data/index
rather than delete *:* though (you should probably do that with the Solr
instance stopped)

Make sure to remove the directory "index" as well.

Best
Erick

On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon  wrote:

> Anyone got a great little script for changing a schema?
>
> i.e., after changing:
>  database,
>  the view in the database for data import
>  the data-config.xml file
>  the schema.xml file
>
> I BELIEVE that I have to run:
>  a delete command for the whole index *:*
>  a full import and optimize
>
> This all sound right?
>
>  Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better
> idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>



Re: Malformed XML with exotic characters

2011-02-01 Thread Markus Jelsma
Hi,

There are no typical encoding issues on my system. I can index, query and 
display English, German, Chinese, Vietnamese, etc.

Cheers

On Tuesday 01 February 2011 17:23:49 François Schiettecatte wrote:
> Markus
> 
> A few things to check, make sure whatever SOLR is hosted on is outputting
> utf-8 ( URIEncoding="UTF-8" in the Connector section in server.xml on
> Tomcat for example), which it looks like here, also make sure that
> whatever http header there is tells firefox that it is getting utf-8
> (otherwise it defaults to iso-8859-1/latin-1), finally make sure that
> whatever font you use in firefox has the 'exotic' characters you are
> expecting. There might also be some issues on your platform with mixing
> script direction but that is probably not likely.
> 
> Cheers
> 
> François
> 
> On Feb 1, 2011, at 10:43 AM, Markus Jelsma wrote:
> > There is an issue with the XML response writer. It cannot cope with some
> > very exotic characters or possibly the right-to-left writing systems.
> > The issue can be reproduced by indexing the content of the home page of
> > wikipedia as it contains a lot of exotic matter. The problem does not
> > affect the JSON response writer.
> > 
> > The problem is, i am unsure whether this is a bug in Solr or that perhaps
> > Firefox itself trips over.
> > 
> > 
> > Here's the output of the JSONResponseWriter for a query returning the home
> > page:
> > {
> > "responseHeader":{
> > 
> >  "status":0,
> >  "QTime":1,
> >  "params":{
> >  
> > "fl":"url,content",
> > "indent":"true",
> > "wt":"json",
> > "q":"*:*",
> > "rows":"1"}},
> > 
> > "response":{"numFound":6744,"start":0,"docs":[
> > 
> > {
> > 
> >  "url":"http://www.wikipedia.org/",
> >  "content":"Wikipedia English The Free Encyclopedia 3 543 000+ articles
> >  日
> > 
> > 本語 フリー百科事典 730 000+ 記事 Deutsch Die freie Enzyklopädie 1 181 000+ Artikel
> > Español La enciclopedia libre 710 000+ artículos Français L’encyclopédie
> > libre 1 061 000+ articles Русский Свободная энциклопедия 654 000+ статей
> > Italiano L’enciclopedia libera 768 000+ voci Português A enciclopédia
> > livre 669 000+ artigos Polski Wolna encyklopedia 769 000+ haseł
> > Nederlands De vrije encyclopedie 668 000+ artikelen Search  • Suchen  •
> > Rechercher  • Szukaj  • Ricerca  • 検索  • Buscar  • Busca  • Zoeken  •
> > Поиск  • Sök  • 搜尋  • Cerca  • Søk  • Haku  • Пошук  • Hledání  •
> > Keresés  • Căutare  • 찾기  • Tìm kiếm  • Ara • Cari  • Søg  • بحث  •
> > Serĉu  • Претрага  • Paieška  • Hľadať  • Suk  • جستجو • חיפוש  •
> > Търсене  • Poišči  • Cari  • Bilnga العربية Български Català Česky Dansk
> > Deutsch English Español Esperanto فارسی Français 한국어 Bahasa Indonesia
> > Italiano עברית Lietuvių Magyar Bahasa Melayu Nederlands 日本語 Norsk
> > (bokmål) Polski Português Română Русский Slovenčina Slovenščina Српски /
> > Srpski Suomi Svenska Türkçe Українська Tiếng Việt Volapük Winaray 中文  
> > 100 000+   العربية • Български  • Català  • Česky  • Dansk  • Deutsch  •
> > English  • Español  • Esperanto  • فارسی  • Français  • 한국어  • Bahasa
> > Indonesia  • Italiano  • עברית • Lietuvių  • Magyar  • Bahasa Melayu  •
> > Nederlands  • 日本語  • Norsk (bokmål) • Polski  • Português  • Русский  •
> > Română  • Slovenčina  • Slovenščina  • Српски / Srpski  • Suomi  •
> > Svenska  • Türkçe  • Українська  • Tiếng Việt  • Volapük  • Winaray  •
> > 中文   10 000+   Afrikaans  • Aragonés  • Armãneashce  • Asturianu  •
> > Kreyòl Ayisyen  • Azərbaycan / آذربايجان ديلی  • বাংলা  • Беларуская (
> > Акадэмічная  • Тарашкевiца )  • বিষ্ণুপ্রিযা় মণিপুরী  • Bosanski  •
> > Brezhoneg  • Чăваш • Cymraeg  • Eesti  • Ελληνικά  • Euskara  • Frysk  •
> > Gaeilge  • Galego  • ગુજરાતી  • Հայերեն  • हिन्दी  • Hrvatski  • Ido  •
> > Íslenska  • Basa Jawa  • ಕನ್ನಡ  • ქართული  • Kurdî / كوردی  • Latina  •
> > Latviešu  • Lëtzebuergesch  • Lumbaart • Македонски  • മലയാളം  • मराठी 
> > • नेपाल भाषा  • नेपाली  • Norsk (nynorsk)  • Nnapulitano • Occitan  •
> > Piemontèis  • Plattdüütsch  • Ripoarisch  • Runa Simi  • شاہ مکھی پنجابی
> >  • Shqip  • Sicilianu  • Simple English  • Sinugboanon  • Srpskohrvatski
> > / Српскохрватски  • Basa Sunda  • Kiswahili  • Tagalog  • தமிழ் • తెలుగు
> >  • ไทย  • اردو  • Walon  • Yorùbá  • 粵語  • Žemaitėška   1 000+   Bahsa
> > Acèh  • Alemannisch  • አማርኛ  • Arpitan  • ܐܬܘܪܝܐ  • Avañe’ẽ  • Aymar Aru
> >  • Bân-lâm-gú  • Bahasa Banjar  • Basa Banyumasan  • Башҡорт  • भोजपुरी 
> > • Bikol Central  • Boarisch  • བོད་ཡིག  • Chavacano de Zamboanga  •
> > Corsu  • Deitsch  • ދިވެހި  • Diné Bizaad  • Eald Englisc  •
> > Emigliàn–Rumagnòl  • Эрзянь  • Estremeñu • Fiji Hindi  • Føroyskt  •
> > Furlan  • Gaelg  • Gàidhlig  • 贛語  • گیلکی  • Hak- kâ-fa / 客家話  • Хальмг
> >  • ʻŌlelo Hawaiʻi  • Hornjoserbsce  • Ilokano  • Interlingua  •
> > Interlingue  • Ирон Æвзаг  • Kapampangan  • Kaszëbsczi  • Kernewek  •
> > ភាសាខ្មែរ  • Kinyarwanda  • Коми  • Кыргызча  • Ladino / לאדינו  •
> > Ligure  • Limburgs  • Lingála  • lojban  • Malaga

Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Churchill Nanje Mambe
So I should use 1.4.1, which is already built. What if I use Solr 4 from the
source code? Do you know of any tutorial I can use to learn how to build it
using the NetBeans IDE? I already have Ant installed.
Or do you advise I go with 1.4.1?

Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President,CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twitter.com/mambenanje



On Tue, Feb 1, 2011 at 5:18 PM, Koji Sekiguchi  wrote:

> (11/02/01 23:58), Churchill Nanje Mambe wrote:
>
>> I'm sorry, I downloaded the released Solr version, as I don't know how to
>> build Solr myself, but I wrote my crawler with Lucene 3.x.
>> Now I need Solr to search this index, so I tried the Solr 1.4 I downloaded
>> from the site as the most recent version, but I can't seem to read the
>> index. I considered writing my own RESTful servlet or SOAP web service,
>> but I would rather Solr worked so I don't go through the stress of
>> recreating what Solr already has.
>> So what am I to do? Do you have a newer version of Solr that uses
>> Lucene 3.x that I can download?
>>
>>
>
> If I remember correctly, Lucene 2.9.4 can read Lucene 3.0 index.
> So if your index is written by Lucene 3.0 program, you can use
> Solr 1.4.1 with Lucene 2.9.4 libraries.
>
> Or simply use branch_3x, it can be downloaded by using subversion:
>
> $ svn co http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x
>
> Koji
> --
> http://www.rondhuit.com/en/
>


Re: Malformed XML with exotic characters

2011-02-01 Thread Markus Jelsma
It's throwing out a lot of disturbing messages:

select.xml:17: parser error : Char 0xD800 out of allowed range
ki  • Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • 
   ^
select.xml:17: parser error : PCDATA invalid Char value 55296
ki  • Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • 
   ^
select.xml:17: parser error : Char 0xDF32 out of allowed range
 • Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • �
   ^
select.xml:17: parser error : PCDATA invalid Char value 57138
 • Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • �
   ^
select.xml:17: parser error : Char 0xD800 out of allowed range
�� Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • ��
   ^
select.xml:17: parser error : PCDATA invalid Char value 55296
�� Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • ��
   ^
select.xml:17: parser error : Char 0xDF3F out of allowed range
Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • ���
   ^
select.xml:17: parser error : PCDATA invalid Char value 57151
Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • ���
   ^
select.xml:17: parser error : Char 0xD800 out of allowed range
egbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • 
   ^
select.xml:17: parser error : PCDATA invalid Char value 55296
egbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • 
   ^
select.xml:17: parser error : Char 0xDF44 out of allowed range
e  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • �
   ^
select.xml:17: parser error : PCDATA invalid Char value 57156
e  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • �
   ^
select.xml:17: parser error : Char 0xD800 out of allowed range
�• Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • ��
   ^
select.xml:17: parser error : PCDATA invalid Char value 55296
�• Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • ��
   ^
select.xml:17: parser error : Char 0xDF39 out of allowed range
� Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • ���
   ^
select.xml:17: parser error : PCDATA invalid Char value 57145
� Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • ���
   ^
select.xml:17: parser error : Char 0xD800 out of allowed range
rasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • 
   ^
select.xml:17: parser error : PCDATA invalid Char value 55296
rasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • 
   ^
select.xml:17: parser error : Char 0xDF43 out of allowed range
ch  • Fulfulde  • Gagauz  • Gĩkũyũ  • �
   ^
select.xml:17: parser error : PCDATA invalid Char value 57155
ch  • Fulfulde  • Gagauz  • Gĩkũyũ  • �
   ^
select.xml:17: parser error : Char 0xD800 out of allowed range
 • Fulfulde  • Gagauz  • Gĩkũyũ  • ��
   ^
select.xml:17: parser error : PCDATA invalid Char value 55296
 • Fulfulde  • Gagauz  • Gĩkũyũ  • ��
   ^
select.xml:17: parser error : Char 0xDF3A out of allowed range
�� Fulfulde  • Gagauz  • Gĩkũyũ  • ���
   ^
select.xml:17: parser error : PCDATA invalid Char value 57146
�� Fulfulde  • Gagauz  • Gĩkũyũ  • ���


On Tuesday 01 February 2011 17:00:19 Stefan Matheis wrote:
> Hi Markus,
> 
> To verify that it's not a Firefox issue, can you try xmllint in your shell
> to check the given XML?
> 
> Regards
> Stefan
> 
> On Tue, Feb 1, 2011 at 4:43 PM, Markus Jelsma
> 
>  wrote:
> > There is an issue with the XML response writer. It cannot 
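The out-of-range code points in the parser errors above (0xD800, 0xDF32, and so on) are UTF-16 surrogate halves: characters outside the Basic Multilingual Plane are encoded in UTF-16 as a high/low surrogate pair, and a writer that emits each half as a standalone character produces output that is not well-formed XML 1.0, no matter how it is escaped. A minimal sketch of the Char production from the XML 1.0 spec makes this concrete (the helper name is mine, not a Solr or libxml API):

```python
def is_valid_xml_char(cp: int) -> bool:
    """XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    The surrogate range U+D800..U+DFFF is deliberately excluded."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# A supplementary-plane character is fine as a single code point...
print(is_valid_xml_char(0x10332))   # True
# ...but its UTF-16 surrogate halves, written separately, are not,
# which matches the xmllint errors quoted above:
print(is_valid_xml_char(0xD800))    # False ("Char 0xD800 out of allowed range")
print(is_valid_xml_char(0xDF32))    # False ("Char 0xDF32 out of allowed range")
# Decoding the pair 0xD800/0xDF32 gives the intended code point:
cp = 0x10000 + ((0xD800 - 0xD800) << 10) + (0xDF32 - 0xDC00)
print(hex(cp))                      # 0x10332
```

So the well-formed fix is to emit the surrogate pair's combined code point (here U+10332) as one character, not two.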

Re: Malformed XML with exotic characters

2011-02-01 Thread François Schiettecatte
Markus 

A few things to check: make sure whatever Solr is hosted on is outputting UTF-8 
(URIEncoding="UTF-8" in the Connector section of server.xml on Tomcat, for 
example), which it looks like it is here; also make sure the HTTP header tells 
Firefox that it is getting UTF-8 (otherwise it defaults to 
ISO-8859-1/Latin-1); and finally, make sure that whatever font you use in 
Firefox has the 'exotic' characters you are expecting. There might also be some 
issues on your platform with mixing script direction, but that is probably not 
likely.

Cheers

François

On Feb 1, 2011, at 10:43 AM, Markus Jelsma wrote:

> There is an issue with the XML response writer. It cannot cope with some very 
> exotic characters or possibly the right-to-left writing systems. The issue 
> can 
> be reproduced by indexing the content of the home page of wikipedia as it 
> contains a lot of exotic matter. The problem does not affect the JSON 
> response 
> writer.
> 
> The problem is, i am unsure whether this is a bug in Solr or that perhaps 
> Firefox itself trips over.
> 
> 
> Here's the output of the JSONResponseWriter for a query returning the home 
> page:
> {
> "responseHeader":{
>  "status":0,
>  "QTime":1,
>  "params":{
>   "fl":"url,content",
>   "indent":"true",
>   "wt":"json",
>   "q":"*:*",
>   "rows":"1"}},
> "response":{"numFound":6744,"start":0,"docs":[
>   {
>"url":"http://www.wikipedia.org/",
>"content":"Wikipedia English The Free Encyclopedia 3 543 000+ articles 
> 日
> 本語 フリー百科事典 730 000+ 記事 Deutsch Die freie Enzyklopädie 1 181 000+ Artikel 
> Español La enciclopedia libre 710 000+ artículos Français L’encyclopédie 
> libre 
> 1 061 000+ articles Русский Свободная энциклопедия 654 000+ статей Italiano 
> L’enciclopedia libera 768 000+ voci Português A enciclopédia livre 669 000+ 
> artigos Polski Wolna encyklopedia 769 000+ haseł Nederlands De vrije 
> encyclopedie 668 000+ artikelen Search  • Suchen  • Rechercher  • Szukaj  • 
> Ricerca  • 検索  • Buscar  • Busca  • Zoeken  • Поиск  • Sök  • 搜尋  • Cerca  • 
> Søk  • Haku  • Пошук  • Hledání  • Keresés  • Căutare  • 찾기  • Tìm kiếm  • 
> Ara  
> • Cari  • Søg  • بحث  • Serĉu  • Претрага  • Paieška  • Hľadať  • Suk  • 
> جستجو  
> • חיפוש  • Търсене  • Poišči  • Cari  • Bilnga العربية Български Català Česky 
> Dansk Deutsch English Español Esperanto فارسی Français 한국어 Bahasa Indonesia 
> Italiano עברית Lietuvių Magyar Bahasa Melayu Nederlands 日本語 Norsk (bokmål) 
> Polski Português Română Русский Slovenčina Slovenščina Српски / Srpski Suomi 
> Svenska Türkçe Українська Tiếng Việt Volapük Winaray 中文   100 000+   العربية  
> • Български  • Català  • Česky  • Dansk  • Deutsch  • English  • Español  • 
> Esperanto  • فارسی  • Français  • 한국어  • Bahasa Indonesia  • Italiano  • 
> עברית  
> • Lietuvių  • Magyar  • Bahasa Melayu  • Nederlands  • 日本語  • Norsk (bokmål)  
> • Polski  • Português  • Русский  • Română  • Slovenčina  • Slovenščina  • 
> Српски / Srpski  • Suomi  • Svenska  • Türkçe  • Українська  • Tiếng Việt  • 
> Volapük  • Winaray  • 中文   10 000+   Afrikaans  • Aragonés  • Armãneashce  • 
> Asturianu  • Kreyòl Ayisyen  • Azərbaycan / آذربايجان ديلی  • বাংলা  • 
> Беларуская 
> ( Акадэмічная  • Тарашкевiца )  • বিষ্ণুপ্রিযা় মণিপুরী  • Bosanski  • 
> Brezhoneg  • Чăваш  
> • Cymraeg  • Eesti  • Ελληνικά  • Euskara  • Frysk  • Gaeilge  • Galego  • 
> ગુજરાતી  • Հայերեն  • हिन्दी  • Hrvatski  • Ido  • Íslenska  • Basa Jawa  • 
> ಕನ್ನಡ  • 
> ქართული  • Kurdî / كوردی  • Latina  • Latviešu  • Lëtzebuergesch  • Lumbaart  
> • Македонски  • മലയാളം  • मराठी  • नेपाल भाषा  • नेपाली  • Norsk (nynorsk)  • 
> Nnapulitano  
> • Occitan  • Piemontèis  • Plattdüütsch  • Ripoarisch  • Runa Simi  • شاہ 
> مکھی 
> پنجابی  • Shqip  • Sicilianu  • Simple English  • Sinugboanon  • 
> Srpskohrvatski / Српскохрватски  • Basa Sunda  • Kiswahili  • Tagalog  • 
> தமிழ்  
> • తెలుగు  • ไทย  • اردو  • Walon  • Yorùbá  • 粵語  • Žemaitėška   1 000+   
> Bahsa 
> Acèh  • Alemannisch  • አማርኛ  • Arpitan  • ܐܬܘܪܝܐ  • Avañe’ẽ  • Aymar Aru  • 
> Bân-lâm-gú  • Bahasa Banjar  • Basa Banyumasan  • Башҡорт  • भोजपुरी  • Bikol 
> Central  • Boarisch  • བོད་ཡིག  • Chavacano de Zamboanga  • Corsu  • Deitsch  
> • 
> ދިވެހި  • Diné Bizaad  • Eald Englisc  • Emigliàn–Rumagnòl  • Эрзянь  • 
> Estremeñu  
> • Fiji Hindi  • Føroyskt  • Furlan  • Gaelg  • Gàidhlig  • 贛語  • گیلکی  • Hak-
> kâ-fa / 客家話  • Хальмг  • ʻŌlelo Hawaiʻi  • Hornjoserbsce  • Ilokano  • 
> Interlingua  • Interlingue  • Ирон Æвзаг  • Kapampangan  • Kaszëbsczi  • 
> Kernewek  • ភាសាខ្មែរ  • Kinyarwanda  • Коми  • Кыргызча  • Ladino / לאדינו  
> • 
> Ligure  • Limburgs  • Lingála  • lojban  • Malagasy  • Malti  • 文言  • Māori  
> • 
> مصرى  • مازِرونی / Mäzeruni  • Монгол  • မြန်မာဘာသာ  • Nāhuatlahtōlli  • 
> Nedersaksisch  • Nouormand  • Novial  • Нохчийн  • Олык Марий  • O‘zbek  • 
> पाऴि  
> • Pangasinán  • ਪੰਜਾਬੀ / پنجابی  • Papiamentu  • پښتو  • Picard  • Къарачай–
> Малкъар  • Қазақша  •

Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Peter Karich
Solr 1.4.x uses Lucene 2.9.x.

You could try the trunk, which uses Lucene 3.0.3 and should be compatible,
if I'm correct.

Regards,
Peter.
> I have the exact opposite problem where Luke won't even load the index but 
> Solr starts fine. I believe there are major differences between the two 
> indexes that are causing all these issues.
>
> Adam
>
>
>
> On Feb 1, 2011, at 6:28 AM, Churchill Nanje Mambe 
>  wrote:
>
>> Hi guys,
>> I have developed a Java crawler and integrated the Lucene 3.0.3 API into it,
>> so it creates a Lucene index.
>> Now I wish to search this Lucene index using Solr. I tried to configure
>> solrconfig.xml and schema.xml, and everything seems to be fine,
>> but Solr told me the index is corrupt, even though I can use Luke to
>> browse the index and perform searches and other operations on it.
>> Can someone tell me which Solr version can wrap around a Lucene 3.0.3 index?
>> Regards,
>>
>> Mambe Churchill Nanje
>> 237 33011349,
>> AfroVisioN Founder, President,CEO
>> http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
>> skypeID: mambenanje
>> www.twitter.com/mambenanje


-- 
http://jetwick.com open twitter search



Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Koji Sekiguchi

(11/02/01 23:58), Churchill Nanje Mambe wrote:

I'm sorry, I downloaded the released Solr version, as I don't know how to
build Solr myself, but I wrote my crawler with Lucene 3.x.
Now I need Solr to search this index, so I tried the Solr 1.4 I downloaded
from the site as the most recent version, but I can't seem to read the index.
I considered writing my own RESTful servlet or SOAP web service, but I would
rather Solr worked so I don't go through the stress of recreating what Solr
already has.
So what am I to do? Do you have a newer version of Solr that uses Lucene 3.x
that I can download?


If I remember correctly, Lucene 2.9.4 can read Lucene 3.0 index.
So if your index is written by Lucene 3.0 program, you can use
Solr 1.4.1 with Lucene 2.9.4 libraries.

Or simply use branch_3x; it can be downloaded using Subversion:

$ svn co http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

Koji
--
http://www.rondhuit.com/en/


Solr Distributed Search "start parameter" limitation

2011-02-01 Thread onlinespend...@gmail.com
If you look at the Solr wiki, one of the limitations it mentions for
distributed searching concerns the start parameter.
http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

"Makes it more inefficient to use a high "start" parameter. For example, if
you request start=500000&rows=25 on an index with 500,000+ docs per shard,
this will currently result in 500,000 records getting sent over the network
from the shard to the coordinating Solr instance. If you had a single-shard
index, in contrast, only 25 records would ever get sent over the network."

While I may not have a start parameter of 500,000, I could easily have one
of 50,000, and I am concerned about the performance hit I may take when using
such a high start parameter with distributed searching. I would use this if
the user had issued a search query that resulted in, say, 50,000+ matches. I
may only display 40 matches per web page, with the user having the ability
to "jump" to the end of the results. So specifying a high start parameter is
certainly likely, and I know this sort of scenario is common for a lot of
websites. Are there tricks that can be played to avoid the performance hit
associated with specifying a high start parameter when doing distributed
searching?

Thanks,
Ben
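The cost the wiki warns about comes down to simple arithmetic: to merge results correctly, each shard must return its top start+rows entries to the coordinator, so the data over the wire grows with both the start offset and the shard count. The function below just illustrates that formula; it is not a Solr API:

```python
def entries_transferred(shards: int, start: int, rows: int) -> int:
    """Entries (doc ids/scores) the coordinator receives for one query:
    each of `shards` shards ships its top (start + rows) entries so the
    coordinator can merge-sort them and keep only `rows` of them."""
    return shards * (start + rows)

# A first page is cheap even across many shards:
print(entries_transferred(10, 0, 40))       # 400
# Jumping deep into the results is not:
print(entries_transferred(10, 50_000, 40))  # 500400
```

This is why "jump to the end" is expensive under distributed search: the common workarounds are capping how deep users can page, or reversing the sort order for requests near the end of the result set so the deep offset becomes a shallow one.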


Re: Malformed XML with exotic characters

2011-02-01 Thread Stefan Matheis
Hi Markus,

To verify that it's not a Firefox issue, can you try xmllint in your shell to
check the given XML?

Regards
Stefan

On Tue, Feb 1, 2011 at 4:43 PM, Markus Jelsma
 wrote:
> There is an issue with the XML response writer. It cannot cope with some very
> exotic characters or possibly the right-to-left writing systems. The issue can
> be reproduced by indexing the content of the home page of wikipedia as it
> contains a lot of exotic matter. The problem does not affect the JSON response
> writer.
>
> The problem is, i am unsure whether this is a bug in Solr or that perhaps
> Firefox itself trips over.
>
>
> Here's the output of the JSONResponseWriter for a query returning the home
> page:
> {
>  "responseHeader":{
>  "status":0,
>  "QTime":1,
>  "params":{
>        "fl":"url,content",
>        "indent":"true",
>        "wt":"json",
>        "q":"*:*",
>        "rows":"1"}},
>  "response":{"numFound":6744,"start":0,"docs":[
>        {
>         "url":"http://www.wikipedia.org/";,
>         "content":"Wikipedia English The Free Encyclopedia 3 543 000+ 
> articles 日
> 本語 フリー百科事典 730 000+ 記事 Deutsch Die freie Enzyklopädie 1 181 000+ Artikel
> Español La enciclopedia libre 710 000+ artículos Français L’encyclopédie libre
> 1 061 000+ articles Русский Свободная энциклопедия 654 000+ статей Italiano
> L’enciclopedia libera 768 000+ voci Português A enciclopédia livre 669 000+
> artigos Polski Wolna encyklopedia 769 000+ haseł Nederlands De vrije
> encyclopedie 668 000+ artikelen Search  • Suchen  • Rechercher  • Szukaj  •
> Ricerca  • 検索  • Buscar  • Busca  • Zoeken  • Поиск  • Sök  • 搜尋  • Cerca  •
> Søk  • Haku  • Пошук  • Hledání  • Keresés  • Căutare  • 찾기  • Tìm kiếm  • Ara
> • Cari  • Søg  • بحث  • Serĉu  • Претрага  • Paieška  • Hľadať  • Suk  • جستجو
> • חיפוש  • Търсене  • Poišči  • Cari  • Bilnga العربية Български Català Česky
> Dansk Deutsch English Español Esperanto فارسی Français 한국어 Bahasa Indonesia
> Italiano עברית Lietuvių Magyar Bahasa Melayu Nederlands 日本語 Norsk (bokmål)
> Polski Português Română Русский Slovenčina Slovenščina Српски / Srpski Suomi
> Svenska Türkçe Українська Tiếng Việt Volapük Winaray 中文   100 000+   العربية
> • Български  • Català  • Česky  • Dansk  • Deutsch  • English  • Español  •
> Esperanto  • فارسی  • Français  • 한국어  • Bahasa Indonesia  • Italiano  • עברית
> • Lietuvių  • Magyar  • Bahasa Melayu  • Nederlands  • 日本語  • Norsk (bokmål)
> • Polski  • Português  • Русский  • Română  • Slovenčina  • Slovenščina  •
> Српски / Srpski  • Suomi  • Svenska  • Türkçe  • Українська  • Tiếng Việt  •
> Volapük  • Winaray  • 中文   10 000+   Afrikaans  • Aragonés  • Armãneashce  •
> Asturianu  • Kreyòl Ayisyen  • Azərbaycan / آذربايجان ديلی  • বাংলা  • 
> Беларуская
> ( Акадэмічная  • Тарашкевiца )  • বিষ্ণুপ্রিযা় মণিপুরী  • Bosanski  • 
> Brezhoneg  • Чăваш
> • Cymraeg  • Eesti  • Ελληνικά  • Euskara  • Frysk  • Gaeilge  • Galego  •
> ગુજરાતી  • Հայերեն  • हिन्दी  • Hrvatski  • Ido  • Íslenska  • Basa Jawa  • 
> ಕನ್ನಡ  •
> ქართული  • Kurdî / كوردی  • Latina  • Latviešu  • Lëtzebuergesch  • Lumbaart
> • Македонски  • മലയാളം  • मराठी  • नेपाल भाषा  • नेपाली  • Norsk (nynorsk)  • 
> Nnapulitano
> • Occitan  • Piemontèis  • Plattdüütsch  • Ripoarisch  • Runa Simi  • شاہ مکھی
> پنجابی  • Shqip  • Sicilianu  • Simple English  • Sinugboanon  •
> Srpskohrvatski / Српскохрватски  • Basa Sunda  • Kiswahili  • Tagalog  • தமிழ்
> • తెలుగు  • ไทย  • اردو  • Walon  • Yorùbá  • 粵語  • Žemaitėška   1 000+   
> Bahsa
> Acèh  • Alemannisch  • አማርኛ  • Arpitan  • ܐܬܘܪܝܐ  • Avañe’ẽ  • Aymar Aru  •
> Bân-lâm-gú  • Bahasa Banjar  • Basa Banyumasan  • Башҡорт  • भोजपुरी  • Bikol
> Central  • Boarisch  • བོད་ཡིག  • Chavacano de Zamboanga  • Corsu  • Deitsch  
> •
> ދިވެހި  • Diné Bizaad  • Eald Englisc  • Emigliàn–Rumagnòl  • Эрзянь  • 
> Estremeñu
> • Fiji Hindi  • Føroyskt  • Furlan  • Gaelg  • Gàidhlig  • 贛語  • گیلکی  • Hak-
> kâ-fa / 客家話  • Хальмг  • ʻŌlelo Hawaiʻi  • Hornjoserbsce  • Ilokano  •
> Interlingua  • Interlingue  • Ирон Æвзаг  • Kapampangan  • Kaszëbsczi  •
> Kernewek  • ភាសាខ្មែរ  • Kinyarwanda  • Коми  • Кыргызча  • Ladino / לאדינו  •
> Ligure  • Limburgs  • Lingála  • lojban  • Malagasy  • Malti  • 文言  • Māori  •
> مصرى  • مازِرونی / Mäzeruni  • Монгол  • မြန်မာဘာသာ  • Nāhuatlahtōlli  •
> Nedersaksisch  • Nouormand  • Novial  • Нохчийн  • Олык Марий  • O‘zbek  • 
> पाऴि
> • Pangasinán  • ਪੰਜਾਬੀ / پنجابی  • Papiamentu  • پښتو  • Picard  • Къарачай–
> Малкъар  • Қазақша  • Qırımtatarca  • Rumantsch  • Русиньскый Язык  • 
> संस्कृतम्  •
> Sámegiella  • Sardu  • Саха Тыла  • Scots  • Seeltersk  • සිංහල  • Ślůnski  • 
> Af
> Soomaali  • کوردی  • Tarandíne  • Татарча / Tatarça  • Тоҷикӣ  • Lea faka-
> Tonga  • Türkmen  • Удмурт  • ᨅᨔ ᨕᨙᨁᨗ  • Uyghur / ئۇيغۇرچه  • Vèneto  • Võro  
> •
> West-Vlams  • Wolof  • 吴语  • ייִדיש  • Zazaki   100+   Akan  • Аҧсуа  • Авар  
> •
> Bamanankan  • Bislama  • Буряад  • Chamoru  • Chichewa  • Cuengh  •
> Dolnoserbski  • Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩk

Malformed XML with exotic characters

2011-02-01 Thread Markus Jelsma
There is an issue with the XML response writer. It cannot cope with some very 
exotic characters or possibly the right-to-left writing systems. The issue can 
be reproduced by indexing the content of the home page of wikipedia as it 
contains a lot of exotic matter. The problem does not affect the JSON response 
writer.

The problem is, i am unsure whether this is a bug in Solr or that perhaps 
Firefox itself trips over.


Here's the output of the JSONResponeWriter for a query returning the home 
page:
{
 "responseHeader":{
  "status":0,
  "QTime":1,
  "params":{
"fl":"url,content",
"indent":"true",
"wt":"json",
"q":"*:*",
"rows":"1"}},
 "response":{"numFound":6744,"start":0,"docs":[
{
 "url":"http://www.wikipedia.org/";,
 "content":"Wikipedia English The Free Encyclopedia 3 543 000+ articles 
日
本語 フリー百科事典 730 000+ 記事 Deutsch Die freie Enzyklopädie 1 181 000+ Artikel 
Español La enciclopedia libre 710 000+ artículos Français L’encyclopédie libre 
1 061 000+ articles Русский Свободная энциклопедия 654 000+ статей Italiano 
L’enciclopedia libera 768 000+ voci Português A enciclopédia livre 669 000+ 
artigos Polski Wolna encyklopedia 769 000+ haseł Nederlands De vrije 
encyclopedie 668 000+ artikelen Search  • Suchen  • Rechercher  • Szukaj  • 
Ricerca  • 検索  • Buscar  • Busca  • Zoeken  • Поиск  • Sök  • 搜尋  • Cerca  • 
Søk  • Haku  • Пошук  • Hledání  • Keresés  • Căutare  • 찾기  • Tìm kiếm  • Ara  
• Cari  • Søg  • بحث  • Serĉu  • Претрага  • Paieška  • Hľadať  • Suk  • جستجو  
• חיפוש  • Търсене  • Poišči  • Cari  • Bilnga العربية Български Català Česky 
Dansk Deutsch English Español Esperanto فارسی Français 한국어 Bahasa Indonesia 
Italiano עברית Lietuvių Magyar Bahasa Melayu Nederlands 日本語 Norsk (bokmål) 
Polski Português Română Русский Slovenčina Slovenščina Српски / Srpski Suomi 
Svenska Türkçe Українська Tiếng Việt Volapük Winaray 中文   100 000+   العربية  
• Български  • Català  • Česky  • Dansk  • Deutsch  • English  • Español  • 
Esperanto  • فارسی  • Français  • 한국어  • Bahasa Indonesia  • Italiano  • עברית  
• Lietuvių  • Magyar  • Bahasa Melayu  • Nederlands  • 日本語  • Norsk (bokmål)  
• Polski  • Português  • Русский  • Română  • Slovenčina  • Slovenščina  • 
Српски / Srpski  • Suomi  • Svenska  • Türkçe  • Українська  • Tiếng Việt  • 
Volapük  • Winaray  • 中文   10 000+   Afrikaans  • Aragonés  • Armãneashce  • 
Asturianu  • Kreyòl Ayisyen  • Azərbaycan / آذربايجان ديلی  • বাংলা  • 
Беларуская 
( Акадэмічная  • Тарашкевiца )  • বিষ্ণুপ্রিযা় মণিপুরী  • Bosanski  • 
Brezhoneg  • Чăваш  
• Cymraeg  • Eesti  • Ελληνικά  • Euskara  • Frysk  • Gaeilge  • Galego  • 
ગુજરાતી  • Հայերեն  • हिन्दी  • Hrvatski  • Ido  • Íslenska  • Basa Jawa  • 
ಕನ್ನಡ  • 
ქართული  • Kurdî / كوردی  • Latina  • Latviešu  • Lëtzebuergesch  • Lumbaart  
• Македонски  • മലയാളം  • मराठी  • नेपाल भाषा  • नेपाली  • Norsk (nynorsk)  • 
Nnapulitano  
• Occitan  • Piemontèis  • Plattdüütsch  • Ripoarisch  • Runa Simi  • شاہ مکھی 
پنجابی  • Shqip  • Sicilianu  • Simple English  • Sinugboanon  • 
Srpskohrvatski / Српскохрватски  • Basa Sunda  • Kiswahili  • Tagalog  • தமிழ்  
• తెలుగు  • ไทย  • اردو  • Walon  • Yorùbá  • 粵語  • Žemaitėška   1 000+   Bahsa 
Acèh  • Alemannisch  • አማርኛ  • Arpitan  • ܐܬܘܪܝܐ  • Avañe’ẽ  • Aymar Aru  • 
Bân-lâm-gú  • Bahasa Banjar  • Basa Banyumasan  • Башҡорт  • भोजपुरी  • Bikol 
Central  • Boarisch  • བོད་ཡིག  • Chavacano de Zamboanga  • Corsu  • Deitsch  • 
ދިވެހި  • Diné Bizaad  • Eald Englisc  • Emigliàn–Rumagnòl  • Эрзянь  • 
Estremeñu  
• Fiji Hindi  • Føroyskt  • Furlan  • Gaelg  • Gàidhlig  • 贛語  • گیلکی  • Hak-
kâ-fa / 客家話  • Хальмг  • ʻŌlelo Hawaiʻi  • Hornjoserbsce  • Ilokano  • 
Interlingua  • Interlingue  • Ирон Æвзаг  • Kapampangan  • Kaszëbsczi  • 
Kernewek  • ភាសាខ្មែរ  • Kinyarwanda  • Коми  • Кыргызча  • Ladino / לאדינו  • 
Ligure  • Limburgs  • Lingála  • lojban  • Malagasy  • Malti  • 文言  • Māori  • 
مصرى  • مازِرونی / Mäzeruni  • Монгол  • မြန်မာဘာသာ  • Nāhuatlahtōlli  • 
Nedersaksisch  • Nouormand  • Novial  • Нохчийн  • Олык Марий  • O‘zbek  • पाऴि 
 
• Pangasinán  • ਪੰਜਾਬੀ / پنجابی  • Papiamentu  • پښتو  • Picard  • Къарачай–
Малкъар  • Қазақша  • Qırımtatarca  • Rumantsch  • Русиньскый Язык  • संस्कृतम् 
 • 
Sámegiella  • Sardu  • Саха Тыла  • Scots  • Seeltersk  • සිංහල  • Ślůnski  • 
Af 
Soomaali  • کوردی  • Tarandíne  • Татарча / Tatarça  • Тоҷикӣ  • Lea faka-
Tonga  • Türkmen  • Удмурт  • ᨅᨔ ᨕᨙᨁᨗ  • Uyghur / ئۇيغۇرچه  • Vèneto  • Võro  • 
West-Vlams  • Wolof  • 吴语  • ייִדיש  • Zazaki   100+   Akan  • Аҧсуа  • Авар  • 
Bamanankan  • Bislama  • Буряад  • Chamoru  • Chichewa  • Cuengh  • 
Dolnoserbski  • Eʋegbe  • Frasch  • Fulfulde  • Gagauz  • Gĩkũyũ  • 
  • Hausa / هَوُسَا  • Igbo  • ᐃᓄᒃᑎᑐᑦ / Inuktitut  • Iñupiak  • 
Kalaallisut  • कश्मीरी / كشميري  • Kongo  • Кырык Мары  • ພາສາລາວ  • Лакку  • 
Luganda  • Mìng-dĕ̤ng-ngṳ̄  • Mirandés  • Мокшень  • Молдовеняскэ  • Na Vosa 
Vaka-Viti  • Dorerin Naoero  • Nēhiya

Next steps in loading plug-in

2011-02-01 Thread McGibbney, Lewis John
Hi list,

Having had a thorough look at the wiki over the weekend and doing some testing 
myself I have some additional questions regarding loading my plug-in to Solr. 
Taking the 'Old Way' to loading plug-ins, I have JARred up the relevant classes 
and added the JAR to the web app WEB-INF/lib dir. I am unsure of next steps to 
take as my plug-in has extension properties (which specify web-based OWL files 
which I wish to use whenever the plug-in is invoked). My main question would be 
where I would include these config properties? My initial thoughts are that 
they would be included within  WEB-INF/web.xml but I am unsure as to how to 
include them. I have had a good look at web.xml and think that they could be 
included as 's but this is solely due to my lack of knowledge in 
this situation.
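For comparison, a common alternative to web.xml is to pass such properties as init attributes in the Solr configuration itself, so the plug-in receives them in the args map of its init(...) method. The sketch below is hypothetical (the factory class name and the owlFile attribute are invented for illustration, not an existing plug-in):

```xml
<!-- hypothetical schema.xml fragment: class name and owlFile are invented -->
<filter class="com.example.OwlFilterFactory"
        owlFile="http://example.org/ontology.owl"/>
```

A factory extending one of Solr's base factory classes would then read args.get("owlFile") in init(), which keeps the configuration next to the rest of the Solr config instead of in web.xml.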

Thank you

Lewis


Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education's Widening Participation Initiative of the Year 
2009 and Herald Society's Education Initiative of the Year 2009
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html


Re: Solr for noSQL

2011-02-01 Thread openvictor Open
Hi all, I don't know if it answers any of your questions, but if you are
interested, check out:

Lucandra (Cassandra + Lucene)



2011/2/1 Steven Noels 

> On Tue, Feb 1, 2011 at 11:52 AM, Upayavira  wrote:
>
>
> >
> > Apologies if my "nothing funky" sounded like you weren't doing cool
> > stuff.
>
>
> No offense whatsoever. I think my longer reply paints a more accurate light
> on what Lily means in terms of "SOLR for NoSQL", and it was your reaction
> who triggered this additional explanation.
>
>
> > I was merely attempting to say that I very much doubt you were
> > doing anything funky like putting HBase underneath Solr as a replacement
> > of FSDirectory.
>
>
> There are some initiatives in the context of Cassandra IIRC, as well as a
> project which stores Lucene index files in HBase tables, but frankly they
> seem more experimentation, and also I think the nature of how Lucene/SOLR
> works + what HBase does on top of Hadoop FS somehow is in conflict with
> each
> other. Too many layers of indirection will kill performance on every layer.
>
>
>
> > I was trying to imply that, likely your integration with
> > Solr was relatively conventional (interacting with its REST interface),
> >
>
>
> Yep. We figured that was the wiser road to walk, and leaves a clear-defined
> interface and possible area of improvement against a too-low level of
> integration.
>
>
> > and the "funky" stuff that you are doing sits outside of that space.
> >
> > Hope that's a clearer (and more accurate?) attempt at what I was trying
> > to say.
> >
> > Upayavira (who finds the Lily project interesting, and would love to
> > find the time to play with it)
> >
>
> Anytime, Upayavira. Anytime! ;-)
>
> Steven.
> --
> Steven Noels
> http://outerthought.org/
> Scalable Smart Data
> Makers of Kauri, Daisy CMS and Lily
>


Re: Terms and termscomponent questions

2011-02-01 Thread openvictor Open
Dear Erick,

Thank you for your answer, here is my fieldtype definition. I took the
standard one because I don't need a better one for this field



















Now my field :



But I have a doubt now... Do I really put a space between words or is it
just a comma... If I only put a comma, then is the whole process going to be
impacted? What I don't really understand is that I find the separate words,
but also their concatenation (but again in one direction only). Let me
explain: if I have "man" "bear" "pig" I will find:
"manbearpig" "bearpig" but never "pigman" or any other combination in a
different order.

Thank you very much
Best Regards,
Victor

2011/2/1 Erick Erickson 

> Nope, this isn't what I'd expect. There are a couple of possibilities:
> 1> check out what WordDelimiterFilterFactory is doing, although
> if you're really sending spaces that's probably not it.
> 2> Let's see the  and  definitions for the field
> in question. type="text" doesn't say anything about analysis,
> and that's where I'd expect you're having trouble. In particular
> if your analysis chain uses KeywordTokenizerFactory for instance.
> 3> Look at the admin/schema browse page, look at your field and
> see what the actual tokens are. That'll tell you what TermsComponents
> is returning, perhaps the concatenation is happening somewhere
> else.
>
> Bottom line: Solr will not concatenate terms like this unless you tell it
> to,
> so I suspect you're telling it to, you just don't realize it ...
>
> Best
> Erick
>
> On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open  >wrote:
>
> > Dear Solr users,
> >
> > I am currently using SolR and TermsComponents to make an auto suggest for
> > my
> > website.
> >
> > I have a field called p_field indexed and stored with type="text" in the
> > schema xml. Nothing out of the usual.
> > I feed to Solr a set of words separated by a comma and a space such as
> (for
> > two documents) :
> >
> > Document 1:
> > word11, word12, word13. word14
> >
> > Document 2:
> > word21, word22, word23. word24
> >
> >
> > When I use my newly designed field I get things for the prefix "word1" :
> > word11, word12, word13. word14 word11word12 word11word13 etc...
> > Is it normal to have the concatenation of words and not only the words
> > indexed ? Did I miss something about Terms ?
> >
> > Thank you very much,
> > Best regards all,
> > Victor
> >
>


Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Churchill Nanje Mambe
I am sorry. I downloaded the Solr release version, as I don't know how to
build Solr myself, but I wrote my crawler with Lucene 3.x.
Now I need Solr to search this index, so I tried to use the Solr 1.4 I
downloaded from the site as the most recent version, but I can't seem to
read the index. I considered writing my own RESTful servlet or SOAP web
service, but I would rather get Solr to work so I don't go through the
stress of recreating what Solr already has.
So what am I to do? Do you have a newer version of Solr that uses Lucene
3.x that I can download?
regards

Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President,CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twitter.com/mambenanje



On Tue, Feb 1, 2011 at 3:53 PM, Upayavira  wrote:

> What problem are you trying to solve by using a Lucene 3.x index within
> a Solr 1.4 system?
>
> Upayavira
>
> On Tue, 01 Feb 2011 14:59 +0100, "Churchill Nanje Mambe"
>  wrote:
> > is there any way I can change the lucene version wrapped in side solr 1.4
> > from lucene 2.x to lucene 3.x.
> >  any tutorials as I am guessing thats where the index data doesnt match.
> >  something I also found out is that solr 1.4 expects the index to be
> > luce_index_folder/index while lucene 3.x index is just the folder
> > lucene_index_folder
> >  in my case its crawl_data/ for lucene but solr 1.4 is expect
> > crawl_data/index and when I point to this in solrconfig.xml it auto
> > creates
> > crawl_data/index
> >
> > I badly need this help
> >
> > Mambe Churchill Nanje
> > 237 33011349,
> > AfroVisioN Founder, President,CEO
> > http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
> > skypeID: mambenanje
> > www.twitter.com/mambenanje
> >
> >
> >
> > On Tue, Feb 1, 2011 at 2:52 PM, Estrada Groups <
> > estrada.adam.gro...@gmail.com> wrote:
> >
> > > I have the exact opposite problem where Luke won't even load the index
> but
> > > Solr starts fine. I believe there are major differences between the two
> > > indexes that are causing all these issues.
> > >
> > > Adam
> > >
> > >
> > >
> > > On Feb 1, 2011, at 6:28 AM, Churchill Nanje Mambe <
> > > mambena...@afrovisiongroup.com> wrote:
> > >
> > > > hi guys
> > > > I have developed a java crawler and integrated the lucene 3.0.3 API
> into
> > > it
> > > > so it creates a Lucene.
> > > > now I wish to search this lucene index using solr, I tried to
> configure
> > > the
> > > > solrconfig.xml and schema.xml, everything seems to be fine
> > > > but then solr told me the index is corrupt but I use luke and I am
> able
> > > to
> > > > browse the index and perform searches and other things on it
> > > > can someone help me which solr can wrap around a lucene 3.0.3 index
> ??
> > > > regards
> > > >
> > > > Mambe Churchill Nanje
> > > > 237 33011349,
> > > > AfroVisioN Founder, President,CEO
> > > > http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
> > > > skypeID: mambenanje
> > > > www.twitter.com/mambenanje
> > >
> >
> ---
> Enterprise Search Consultant at Sourcesense UK,
> Making Sense of Open Source
>
>


Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Upayavira
What problem are you trying to solve by using a Lucene 3.x index within
a Solr 1.4 system?

Upayavira

On Tue, 01 Feb 2011 14:59 +0100, "Churchill Nanje Mambe"
 wrote:
> is there any way I can change the lucene version wrapped in side solr 1.4
> from lucene 2.x to lucene 3.x.
>  any tutorials as I am guessing thats where the index data doesnt match.
>  something I also found out is that solr 1.4 expects the index to be
> luce_index_folder/index while lucene 3.x index is just the folder
> lucene_index_folder
>  in my case its crawl_data/ for lucene but solr 1.4 is expect
> crawl_data/index and when I point to this in solrconfig.xml it auto
> creates
> crawl_data/index
> 
> I badly need this help
> 
> Mambe Churchill Nanje
> 237 33011349,
> AfroVisioN Founder, President,CEO
> http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
> skypeID: mambenanje
> www.twitter.com/mambenanje
> 
> 
> 
> On Tue, Feb 1, 2011 at 2:52 PM, Estrada Groups <
> estrada.adam.gro...@gmail.com> wrote:
> 
> > I have the exact opposite problem where Luke won't even load the index but
> > Solr starts fine. I believe there are major differences between the two
> > indexes that are causing all these issues.
> >
> > Adam
> >
> >
> >
> > On Feb 1, 2011, at 6:28 AM, Churchill Nanje Mambe <
> > mambena...@afrovisiongroup.com> wrote:
> >
> > > hi guys
> > > I have developed a java crawler and integrated the lucene 3.0.3 API into
> > it
> > > so it creates a Lucene.
> > > now I wish to search this lucene index using solr, I tried to configure
> > the
> > > solrconfig.xml and schema.xml, everything seems to be fine
> > > but then solr told me the index is corrupt but I use luke and I am able
> > to
> > > browse the index and perform searches and other things on it
> > > can someone help me which solr can wrap around a lucene 3.0.3 index ??
> > > regards
> > >
> > > Mambe Churchill Nanje
> > > 237 33011349,
> > > AfroVisioN Founder, President,CEO
> > > http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
> > > skypeID: mambenanje
> > > www.twitter.com/mambenanje
> >
> 
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source



Re: escaping parenthesis in search query don't work...

2011-02-01 Thread shan2812

Hi,

I think you can search without the escape sequence, as it's not necessary.
Instead just try (term) and it should work.
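For cases where escaping does turn out to be needed, here is a standalone sketch of escaping the query parser's special characters (Lucene's QueryParser.escape does this for you; this hand-rolled version is only an illustration of the idea, not the library method):

```java
public class QueryEscape {
    // Prefix each character the Lucene query parser treats specially
    // with a backslash; other characters pass through unchanged.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("(term)")); // prints \(term\)
    }
}
```

(Note that the two-character operators && and || would also need handling, which this sketch ignores.)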

Regards
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Http-Connection-is-hanging-while-deleteByQuery-tp2367405p2397455.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Churchill Nanje Mambe
Is there any way I can change the Lucene version wrapped inside Solr 1.4
from Lucene 2.x to Lucene 3.x? Any tutorials? I am guessing that's where
the index data doesn't match.
Something I also found out is that Solr 1.4 expects the index to be
lucene_index_folder/index, while a Lucene 3.x index is just the folder
lucene_index_folder.
In my case it's crawl_data/ for Lucene, but Solr 1.4 expects
crawl_data/index, and when I point to this in solrconfig.xml it auto-creates
crawl_data/index.

I badly need this help

Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President,CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twitter.com/mambenanje



On Tue, Feb 1, 2011 at 2:52 PM, Estrada Groups <
estrada.adam.gro...@gmail.com> wrote:

> I have the exact opposite problem where Luke won't even load the index but
> Solr starts fine. I believe there are major differences between the two
> indexes that are causing all these issues.
>
> Adam
>
>
>
> On Feb 1, 2011, at 6:28 AM, Churchill Nanje Mambe <
> mambena...@afrovisiongroup.com> wrote:
>
> > hi guys
> > I have developed a java crawler and integrated the lucene 3.0.3 API into
> it
> > so it creates a Lucene.
> > now I wish to search this lucene index using solr, I tried to configure
> the
> > solrconfig.xml and schema.xml, everything seems to be fine
> > but then solr told me the index is corrupt but I use luke and I am able
> to
> > browse the index and perform searches and other things on it
> > can someone help me which solr can wrap around a lucene 3.0.3 index ??
> > regards
> >
> > Mambe Churchill Nanje
> > 237 33011349,
> > AfroVisioN Founder, President,CEO
> > http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
> > skypeID: mambenanje
> > www.twitter.com/mambenanje
>


Re: CUSTOM JSP FOR APACHE SOLR

2011-02-01 Thread Estrada Groups
Has anyone noticed the Rails application that installs with Solr 4.0? I am
interested to hear some feedback on that one...

Adam


On Jan 31, 2011, at 4:25 PM, Paul Libbrecht  wrote:

> Tomas,
> 
> I also know velocity can be used and works well.
> I would be interested in a simpler way to have the objects of SOLR available 
> in a jsp than writing a custom jsp processor as a request handler; indeed, this 
> seems to be the way solrj is expected to be used in the wiki page.
> 
> Actually I migrated to velocity (which I like less than jsp) just because I 
> did not find a response to this question.
> 
> paul
> 
> 
> Le 31 janv. 2011 à 21:53, Tomás Fernández Löbbe a écrit :
> 
>> Hi John, you can use whatever you want for building your application, using
>> Solr on the backend (JSP included). You should find all the information you
>> need on Solr's wiki page:
>> http://wiki.apache.org/solr/
>> 
>> including some client libraries to easily
>> integrate your application with Solr:
>> http://wiki.apache.org/solr/IntegratingSolr
>> 
>> for fast prototyping you could
>> use Velocity:
>> http://wiki.apache.org/solr/VelocityResponseWriter
>> 
>> Anyway, I recommend you
>> to start with Solr's tutorial:
>> http://lucene.apache.org/solr/tutorial.html
>> 
>> 
>> Good luck,
>> Tomás
>> 
>> 2011/1/31 JOHN JAIRO GÓMEZ LAVERDE 
>> 
>>> 
>>> 
>>> SOLR LUCENE
>>> DEVELOPERS
>>> 
>>> Hi i am new to solr and i like to make a custom search page for enterprise
>>> users
>>> in JSP that takes the results of Apache Solr.
>>> 
>>> - Where i can find some useful examples for that topic ?
>>> - Is JSP the correct approach to solve mi requirement ?
>>> - If not what is the best solution to build a customize search page for my
>>> users?
>>> 
>>> Thanks
>>> from South America
>>> 
>>> JOHN JAIRO GOMEZ LAVERDE
>>> Bogotá - Colombia
>>> 
> 


Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Estrada Groups
I have the exact opposite problem where Luke won't even load the index but Solr 
starts fine. I believe there are major differences between the two indexes that 
are causing all these issues.

Adam



On Feb 1, 2011, at 6:28 AM, Churchill Nanje Mambe 
 wrote:

> hi guys
> I have developed a java crawler and integrated the lucene 3.0.3 API into it
> so it creates a Lucene.
> now I wish to search this lucene index using solr, I tried to configure the
> solrconfig.xml and schema.xml, everything seems to be fine
> but then solr told me the index is corrupt but I use luke and I am able to
> browse the index and perform searches and other things on it
> can someone help me which solr can wrap around a lucene 3.0.3 index ??
> regards
> 
> Mambe Churchill Nanje
> 237 33011349,
> AfroVisioN Founder, President,CEO
> http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
> skypeID: mambenanje
> www.twitter.com/mambenanje


escaping parenthesis in search query don't work...

2011-02-01 Thread Pierre-Yves LANDRON

Hello! I've seen that in order to search a term with parentheses, those have
to be escaped, as in title:\(term\). But it doesn't seem to work: parentheses
aren't taken into account. Here is the field type I'm using to index these data:


   
 
  

 
 
 
 
How can I search parentheses within my query?
Thanks,
P.

Re: chaning schema

2011-02-01 Thread Stefan Matheis
From http://wiki.apache.org/solr/DataImportHandler#Commands

> The handler exposes all its API as http requests . The following are the 
> possible operations
> [..]
> clean : (default 'true'). Tells whether to clean up the index before the 
> indexing is started

so, no need for an (additional) delete *:*, right?

On Tue, Feb 1, 2011 at 2:04 PM, Erick Erickson  wrote:
> That sounds right. You can cheat and just remove /data/index
> rather than delete *:* though (you should probably do that with the Solr
> instance stopped)
>
> Make sure to remove the directory "index" as well.
>
> Best
> Erick
>
> On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon  wrote:
>
>> Anyone got a great little script for changing a schema?
>>
>> i.e., after changing:
>>  database,
>>  the view in the database for data import
>>  the data-config.xml file
>>  the schema.xml file
>>
>> I BELIEVE that I have to run:
>>  a delete command for the whole index *:*
>>  a full import and optimize
>>
>> This all sound right?
>>
>>  Dennis Gearon
>>
>>
>> Signature Warning
>> 
>> It is always a good idea to learn from your own mistakes. It is usually a
>> better
>> idea to learn from others’ mistakes, so you do not have to make them
>> yourself.
>> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>>
>>
>> EARTH has a Right To Life,
>> otherwise we all die.
>>
>>
>


Re: Terms and termscomponent questions

2011-02-01 Thread Erick Erickson
Nope, this isn't what I'd expect. There are a couple of possibilities:
1> check out what WordDelimiterFilterFactory is doing, although
 if you're really sending spaces that's probably not it.
2> Let's see the  and  definitions for the field
 in question. type="text" doesn't say anything about analysis,
 and that's where I'd expect you're having trouble. In particular
 if your analysis chain uses KeywordTokenizerFactory for instance.
3> Look at the admin/schema browse page, look at your field and
 see what the actual tokens are. That'll tell you what TermsComponents
 is returning, perhaps the concatenation is happening somewhere
 else.

Bottom line: Solr will not concatenate terms like this unless you tell it
to,
so I suspect you're telling it to, you just don't realize it ...
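For illustration only (this chain is hypothetical, not taken from the poster's schema), an analysis chain like the following would reproduce exactly the reported behaviour: KeywordTokenizerFactory keeps the whole comma-separated value as a single token, and WordDelimiterFilterFactory then emits both the individual word parts and their catenation:

```xml
<!-- hypothetical fieldType that yields "man", "bear", "pig" AND "manbearpig" -->
<fieldType name="text_concat" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateAll="1"/>
  </analyzer>
</fieldType>
```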

Best
Erick

On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open wrote:

> Dear Solr users,
>
> I am currently using SolR and TermsComponents to make an auto suggest for
> my
> website.
>
> I have a field called p_field indexed and stored with type="text" in the
> schema xml. Nothing out of the usual.
> I feed to Solr a set of words separated by a comma and a space such as (for
> two documents) :
>
> Document 1:
> word11, word12, word13. word14
>
> Document 2:
> word21, word22, word23. word24
>
>
> When I use my newly designed field I get things for the prefix "word1" :
> word11, word12, word13. word14 word11word12 word11word13 etc...
> Is it normal to have the concatenation of words and not only the words
> indexed ? Did I miss something about Terms ?
>
> Thank you very much,
> Best regards all,
> Victor
>


Re: chaning schema

2011-02-01 Thread Erick Erickson
That sounds right. You can cheat and just remove /data/index
rather than delete *:* though (you should probably do that with the Solr
instance stopped)

Make sure to remove the directory "index" as well.

Best
Erick

On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon  wrote:

> Anyone got a great little script for changing a schema?
>
> i.e., after changing:
>  database,
>  the view in the database for data import
>  the data-config.xml file
>  the schema.xml file
>
> I BELIEVE that I have to run:
>  a delete command for the whole index *:*
>  a full import and optimize
>
> This all sound right?
>
>  Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better
> idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>


Re: Solr for noSQL

2011-02-01 Thread Steven Noels
On Tue, Feb 1, 2011 at 11:52 AM, Upayavira  wrote:


>
> Apologies if my "nothing funky" sounded like you weren't doing cool
> stuff.


No offense whatsoever. I think my longer reply paints a more accurate picture
of what Lily means in terms of "SOLR for NoSQL", and it was your reaction
that triggered this additional explanation.


> I was merely attempting to say that I very much doubt you were
> doing anything funky like putting HBase underneath Solr as a replacement
> of FSDirectory.


There are some initiatives in the context of Cassandra IIRC, as well as a
project which stores Lucene index files in HBase tables, but frankly they
seem more like experiments, and also I think the nature of how Lucene/SOLR
works + what HBase does on top of Hadoop FS somehow is in conflict with each
other. Too many layers of indirection will kill performance on every layer.



> I was trying to imply that, likely your integration with
> Solr was relatively conventional (interacting with its REST interface),
>


Yep. We figured that was the wiser road to walk, and leaves a clear-defined
interface and possible area of improvement against a too-low level of
integration.


> and the "funky" stuff that you are doing sits outside of that space.
>
> Hope that's a clearer (and more accurate?) attempt at what I was trying
> to say.
>
> Upayavira (who finds the Lily project interesting, and would love to
> find the time to play with it)
>

Anytime, Upayavira. Anytime! ;-)

Steven.
-- 
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily


Lucene 3.0.3 index cannot be read by Solr

2011-02-01 Thread Churchill Nanje Mambe
Hello I need help,
 I am trying to configure Solr 1.4 to read my Lucene 3.0.3-based index,
but it says they are not compatible. Can someone help me, as I
don't know what to do?

-- 
Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President,CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twitter.com/mambenanje


SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Churchill Nanje Mambe
Hi guys,
I have developed a Java crawler and integrated the Lucene 3.0.3 API into it
so that it creates a Lucene index.
Now I wish to search this Lucene index using Solr. I tried to configure
solrconfig.xml and schema.xml, and everything seems to be fine,
but then Solr told me the index is corrupt. Yet with Luke I am able to
browse the index, perform searches and do other things with it.
Can someone tell me which Solr version can wrap around a Lucene 3.0.3 index?
Regards,

Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President,CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twitter.com/mambenanje


Re: Solr for noSQL

2011-02-01 Thread Upayavira


On Tue, 01 Feb 2011 07:22 +0100, "Steven Noels"
 wrote:
> On Mon, Jan 31, 2011 at 9:38 PM, Upayavira  wrote:
> 
> >
> >
> > On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups"
> >  wrote:
> > > What are the advantages of using something like HBase over your standard
> > > Lucene index with Solr? It would seem to me like you'd be losing a lot of
> > > what Lucene has to offer!?!
> >
> > I think Steven is saying that he has an indexer app that reads from
> > HBase and writes to a standard Solr by hitting its Rest API.
> >
> > So, nothing funky, just a little app that reads from HBase and posts to
> > Solr.
> >
> 
> 
> We're doing something like offering a relational-database-like experience
> (i.e. a schema language, storing typed data instead of byte[]s, secondary
> indexing facilities), with some content management features (versioning,
> blob storage), combined with SOLR as a search index (with mapping between
> our schema and that of SOLR), the index being maintained incrementally
> and
> through map/reduce (for reindexing). We keep multiple versions of the
> index
> if you want, with state management and we do text extraction with Tika.
> All
> this happens fully distributed, so you can play with different boxes
> serving
> as HBase datanode, or index feeder, SOLR search node, etc etc.
> 
> All that sits behind a Java API that uses Avro underneath, and a REST
> interface as well (searches go directly to SOLR). For future versions, we
> will integrate a recommendation engine and some analytics tools as well.
> 
> So yes, we do more (or rather: different things) than what Lucene/SOLR
> does,
> as we offer a full-featured data storage environment, stuffing your data
> in
> HBase (which scales better than MySQL), and make it searchable through
> SOLR.
> 
> The 'funky app' you're referring at now sits at about 3 manyears of
> fulltime
> development, BTW. ;-)

Apologies if my "nothing funky" sounded like you weren't doing cool
stuff. I was merely attempting to say that I very much doubt you were
doing anything funky like putting HBase underneath Solr as a replacement
of FSDirectory. I was trying to imply that, likely your integration with
Solr was relatively conventional (interacting with its REST interface),
and the "funky" stuff that you are doing sits outside of that space.

Hope that's a clearer (and more accurate?) attempt at what I was trying
to say.

Upayavira (who finds the Lily project interesting, and would love to
find the time to play with it)
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source



Re: SolrJ (Trunk) Invalid version or the data in not in 'javabin' format

2011-02-01 Thread Em

Hi,

sorry for the late feedback. Everything seems to be fine now.

Thank you!


Koji Sekiguchi wrote:
> 
> (11/01/31 3:11), Em wrote:
>>
>> Hello list,
>>
>> I build an application that uses SolrJ to communicate with Solr.
>>
>> What did I do?
>> Well, I deleted all the solrj-lib stuff from my application's
>> Webcontent-directory and inserted the solrj-lib from the freshly compiled
>> solr 4.0 - trunk.
>> However, when trying to query Solr 4.0 it shows me a
>> RuntimeException:
>> Invalid version or the data in not in 'javabin' format
> 
> I've just committed a small change so that you can see the version
> difference
> (I'll open the JIRA issue later because it is in maintenance now):
> 
> Index: solr/src/common/org/apache/solr/common/util/JavaBinCodec.java
> ===
> --- solr/src/common/org/apache/solr/common/util/JavaBinCodec.java
> (revision 1065245)
> +++ solr/src/common/org/apache/solr/common/util/JavaBinCodec.java (working
> copy)
> @@ -96,7 +96,8 @@
>   FastInputStream dis = FastInputStream.wrap(is);
>   version = dis.readByte();
>   if (version != VERSION) {
> -  throw new RuntimeException("Invalid version or the data in not in
> 'javabin' format");
> +  throw new RuntimeException("Invalid version (expected " + VERSION +
> +  ", but " + version + ") or the data in not in 'javabin'
> format");
>   }
>   return readVal(dis);
> }
> 
> Can you try the latest trunk and see the version difference?
> 
> Koji
> -- 
> http://www.rondhuit.com/en/
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Trunk-Invalid-version-or-the-data-in-not-in-javabin-format-tp2384421p2396195.html
Sent from the Solr - User mailing list archive at Nabble.com.
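
The improved error message in Koji's patch can be exercised in isolation. Below is a minimal, self-contained sketch of the same version-check idea; the VERSION value, class name, and in-memory payloads are illustrative assumptions, not SolrJ's actual constants or API:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class VersionCheckDemo {
    // Illustrative value only; SolrJ's real JavaBinCodec constant may differ.
    static final byte EXPECTED_VERSION = 2;

    static byte readAndCheck(byte[] payload) {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(payload));
        try {
            byte version = dis.readByte();
            if (version != EXPECTED_VERSION) {
                // Mirrors the committed change: report both expected and received versions,
                // instead of the old message that named neither.
                throw new RuntimeException("Invalid version (expected " + EXPECTED_VERSION
                        + ", but " + version + ") or the data is not in 'javabin' format");
            }
            return version;
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for an in-memory stream
        }
    }

    public static void main(String[] args) {
        System.out.println(readAndCheck(new byte[]{2})); // matching version, prints 2
        try {
            readAndCheck(new byte[]{1}); // e.g. an older client talking to a newer server
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the version numbers in the message, a mismatched solrj-lib (as in Em's original report) is immediately distinguishable from a genuinely corrupt response.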


RE: data-config.xml: delta-import unclear behaviour pre/postDeleteImportQuery with clean

2011-02-01 Thread Charton, Andre
Hi Manu,

As of 1.4.1, it is invoked if "postImportDeleteQuery" is not null and clean is
true; see the code:

...
String delQuery = e.allAttributes.get("preImportDeleteQuery");
  if (dataImporter.getStatus() == DataImporter.Status.RUNNING_DELTA_DUMP) {
cleanByQuery(delQuery, fullCleanDone);
doDelta();
delQuery = e.allAttributes.get("postImportDeleteQuery");
if (delQuery != null) {
  fullCleanDone.set(false);
  cleanByQuery(delQuery, fullCleanDone);
}
  }
...


private void cleanByQuery(String delQuery, AtomicBoolean completeCleanDone) {
delQuery = getVariableResolver().replaceTokens(delQuery);
if (requestParameters.clean) {
  if (delQuery == null && !completeCleanDone.get()) {
writer.doDeleteAll();
completeCleanDone.set(true);
  } else if (delQuery != null) {
writer.deleteByQuery(delQuery);
  }
}
  }

André
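
Based on the code path above, here is a hedged data-config.xml sketch of the workaround discussed in this thread: duplicating the delete query into preImportDeleteQuery keeps clean=true from wiping the whole index before the delta run. The entity name, SQL, and the deleted:true marker query are all illustrative assumptions:

```xml
<!-- Illustrative sketch only: entity name, SQL, and delete queries are assumptions. -->
<entity name="item"
        pk="id"
        query="SELECT id, title FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE updated &gt; '${dataimporter.last_index_time}'"
        preImportDeleteQuery="deleted:true"
        postImportDeleteQuery="deleted:true">
  <field column="id" name="id"/>
  <field column="title" name="title"/>
</entity>
```

Per cleanByQuery above, an omitted preImportDeleteQuery with clean=true triggers writer.doDeleteAll(), so supplying the same query in both attributes is what prevents the full wipe.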



-Original Message-
From: manuel aldana [mailto:ald...@gmx.de] 
Sent: Montag, 31. Januar 2011 09:40
To: solr-user@lucene.apache.org
Subject: data-config.xml: delta-import unclear behaviour 
pre/postDeleteImportQuery with clean

I am seeing some unclear behaviour when using clean with
pre/postImportDeleteQuery for delta-imports. The docs under
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml
are not clear enough.

My observations are:
- preImportDeleteQuery is only executed if clean=true is set
- postImportDeleteQuery is only executed if clean=true is set
- if preImportDeleteQuery is omitted and clean=true, then the whole
index is cleaned
=> a config with only postImportDeleteQuery won't work

Is the above correct?

I don't need preImportDeleteQuery; only post is necessary. But to make
post work I am duplicating the post query into pre, so that clean=true
doesn't delete the whole index. This looks more like a workaround than
wanted behaviour.

solr version is 1.4.1

thanks.

-- 
  manuel aldana
  mail: ald...@gmx.de | man...@aldana-online.de
  blog: www.aldana-online.de