Re: DIH - Example of using $nextUrl and $hasMore

2009-02-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Currently the initial counter is not set, so the value becomes an empty string:
http://subdomain.site.com/boards.rss?page=${blogs.n}
becomes
http://subdomain.site.com/boards.rss?page=

We need to fix this. Unfortunately, the transformer is invoked only
after the first chunk is fetched.

The best bet is to keep the url as
http://subdomain.site.com/boards.rss?page=1

create $nextUrl in the transformer and return it in the row,

so that the url is ignored from the second chunk onwards and the value of
$nextUrl is used instead.
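A rough sketch of such a transformer (everything here except the $hasMore/$nextUrl field names is an assumption — the class name, the constructor, and tracking the page counter inside the transformer itself; a real DIH transformer is also invoked once per row rather than once per chunk):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical paging transformer sketch. data-config.xml would
 * hard-code url=".../boards.rss?page=1", and this class would hand
 * the processor the URL for each subsequent chunk via $nextUrl.
 */
public class PagingTransformer {
    private int page = 1;              // page already fetched via the static url
    private final int lastPage;        // e.g. 56, if the page count is known up front
    private final String urlTemplate;  // e.g. "http://host/boards.rss?page="

    public PagingTransformer(String urlTemplate, int lastPage) {
        this.urlTemplate = urlTemplate;
        this.lastPage = lastPage;
    }

    /** DIH discovers a transformRow(Map) method reflectively on custom transformers. */
    public Map<String, Object> transformRow(Map<String, Object> row) {
        if (page < lastPage) {
            page++;
            row.put("$nextUrl", urlTemplate + page); // complete URL for the next call
            row.put("$hasMore", "true");             // ask the processor to fetch again
        }
        // no $hasMore on the last page, so the processor stops
        return row;
    }
}
```

The entity would then keep url="...page=1" and list this class in its transformer attribute.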




On Tue, Feb 3, 2009 at 12:13 AM, Jon Baer  wrote:
> See I think I'm just misunderstanding how this entity is supposed to be set
> up ... for example, using the patch on 1.3 I ended up in a loop where .n is
> never set ...
>
> Feb 2, 2009 1:31:02 PM org.apache.solr.handler.dataimport.HttpDataSource
> getData
> INFO: Created URL to: http://subdomain.site.com/feed.rss?page=
>
> <entity url="http://subdomain.site.com/boards.rss?page=${blogs.n}" chunkSize="50"
> name="docs" pk="link" processor="XPathEntityProcessor"
> forEach="/rss/channel/item" transformer="RegexTransformer,
> com.nhl.solr.DateFormatTransformer, TemplateTransformer,
> com.nhl.solr.EnumeratedEntityTransformer">
>
> I guess what Im looking for is that snippet which shows how it is setup (the
> initial counter) ...
>
> - Jon
>
> On Mon, Feb 2, 2009 at 12:39 PM, Noble Paul നോബിള്‍ नोब्ळ् <
> noble.p...@gmail.com> wrote:
>
>> On Mon, Feb 2, 2009 at 11:01 PM, Jon Baer  wrote:
>> > Yes I think what Jared mentions in the JIRA is what I was thinking about
>> > when it is recommended to always return true for $hasMore ...
>> >
>> > "The transformer must know somehow when $hasMore should be true. If the
>> > transformer always gives $hasMore a value "true", will there be infinite
>> > requests made or will it stop on the first empty request? Using the
>> > EnumeratedEntityTransformer, a user can specify from the config xml when
>> > $hasMore should be true using the chunkSize attribute. This solves a
>> > general case of "request N rows at a time until no more are available".
>> > I agree, a combination of 'rowsFetchedCount' and a
>> > HasMoreUntilEmptyTransformer would also make this doable from the
>> > configuration"
>> why can't a Transformer put $hasMore=false?
>> >
>> > This makes sense.
>> >
>> > - Jon
>> > Jared Flatow - 28/Jan/09 09:16 PM:
>> > The transformer must know somehow when $hasMore should be true. If
>> > the transformer always gives $hasMore a value "true", will there be
>> > infinite requests made or will it stop on the first empty request?
>> > Using the EnumeratedEntityTransformer, a user can specify from the
>> > config xml when $hasMore should be true using the chunkSize attribute.
>> > This solves a general case of "request N rows at a time until no more
>> > are available". I agree, a combination of 'rowsFetchedCount' and a
>> > HasMoreUntilEmptyTransformer would also make this doable from the
>> > configuration.
>> >
>> > On Mon, Feb 2, 2009 at 11:53 AM, Shalin Shekhar Mangar <
>> > shalinman...@gmail.com> wrote:
>> >
>> >> On Mon, Feb 2, 2009 at 9:20 PM, Jon Baer  wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > Sorry I know this exists ...
>> >> >
>> >> > "If an API supports chunking (when the dataset is too large) multiple
>> >> > calls need to be made to complete the process. XPathEntityProcessor
>> >> > supports this with a transformer. If the transformer returns a row
>> >> > which contains a field *$hasMore* with the value "true", the Processor
>> >> > makes another request with the same url template (the actual value is
>> >> > recomputed before invoking). A transformer can also pass a totally new
>> >> > url for the next call by returning a row which contains a field
>> >> > *$nextUrl* whose value must be the complete url for the next call."
>> >> >
>> >> > But is there a true example of its use somewhere? I'm trying to
>> >> > figure out, if I know before import that I have 56 "pages" to index,
>> >> > how to set this up properly. (And how to set it up if pages need to
>> >> > be determined by something in the feed, etc.)
>> >> >
>> >>
>> >> No, there is no example (yet). You'll put the url with variables for the
>> >> corresponding 'start' and 'count' parameters and a custom transformer
>> can
>> >> specify if another request needs to be made. I know it's not much to go
>> on.
>> >> I'll try to write some documentation on the wiki.
>> >>
>> >> SOLR-994 might be interesting to you. I haven't been able to look at the
>> >> patch though.
>> >>
>> >>  https://issues.apache.org/jira/browse/SOLR-994
>> >> --
>> >> Regards,
>> >> Shalin Shekhar Mangar.
>> >>
>> >
>>
>>
>>
>> --
>> --Noble Paul
>>
>



-- 
--Noble Paul


Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-04 Thread Lance Norskog
There are two XML library projects that do streaming XPath reads with full
expression evaluation: Nux and dom4j. Nux is from LBL under a "kinda like
BSD" license, and dom4j is BSD-licensed.

http://dom4j.org/dom4j-1.6.1/project-info.html
http://acs.lbl.gov/nux/

The licensing probably kills these, right?

Apache includes the Jaxen library, but I can't quite tell if it can stream
or not.

http://xml.apache.org/xalan-j/xpath_apis.html

On Tue, Feb 3, 2009 at 8:48 PM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.p...@gmail.com> wrote:

> On Wed, Feb 4, 2009 at 6:13 AM, Chris Hostetter
>  wrote:
> >
> > : > The solr data field is populated properly. So I guess that bit works.
> > : > I really wish I could use xpath="//para"
> >
> > : The limitation comes from streaming the XML instead of creating a DOM.
> > : XPathRecordReader is a custom streaming XPath parser implementation and
> > : streaming is easy only because we limit the syntax. You can use
> > : PlainTextEntityProcessor which gives the XML as a string to a  custom
> > : Transformer. This Transformer can create a DOM, run your XPath query
> and
> > : populate the fields. It's more expensive but it is an option.
> >
> > Maybe it's just me, but it seems like i'm noticing that as DIH gets used
> > more, many people are noting that the XPath processing in DIH doesn't
> work
> > the way they expect because it's a custom XPath parser/engine designed
> for
> > streaming.
> >
> > It seems like it would be helpful to have an alternate processor for
> > people who don't need the streaming support (ie: are dealing with small
> > enough docs that they can load the full DOM tree into memory) that would
> > use the default Java XPath engine (and have fewer caveats/surprises) ... I
> > would think it would probably even make sense for this new XPath processor
> > to be the one we suggest for new users, and only suggest the existing
> > (stream based) processor if they have really big xml docs to deal with.
> >
> I guess the current XPathEntityProcessor should be able to switch
> between the streaming XPath (XPathRecordReader) and the default Java
> XPath engine.
>
> I am just hoping that all the current syntax and semantics will be
> applicable to the Java XPath engine. If not, we will need a new
> EntityProcessor.
>
> I also would like to explore if the current XPathRecordReader can
> implement more XPath syntax with streaming.
>
> The Java XPath engine is not at all efficient for large-scale data
> processing.
>
>
> > (In hindsight XPathEntityProcessor and XPathRecordReader should probably
> > have been named StreamingXPathEntityProcessor and
> > StreamingXPathRecordReader)
>
> >
> > thoughts?
> >
> >
> > -Hoss
> >
> >
>
>
>
> --
> --Noble Paul
>



-- 
Lance Norskog
goks...@gmail.com
650-922-8831 (US)


Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Maybe I was not clear, but I am not able to find anything on the net.
Basically, if I had in my index millions of names starting with A*, I would
like to know how many distinct surnames are present in the result set
(similar to a DISTINCT SQL query).
I will attempt to have a look at the Solr sources to see if this is
possible to implement. Any hints on where to look would be great!

Thanks,

Bruno

2009/2/3 Bruno Aranda 

> But as far as I understand the total number of constraints is limited
> (there is a default value), so I cannot know the total if I don't set the
> facet.limit to a really big number and then the request takes a long time. I
> was wondering if there was a way to get the total number (e.g. 100.000
> constraints) to show it to the user, and then paginate using facet.offset
> and facet.limit until I reach that total.
> Does this make sense?
>
> Thanks!
>
> Bruno
>
> 2009/2/3 Markus Jelsma - Buyways B.V. 
>
> Hello,
>>
>>
>> Searching for ?q=*:* with faceting turned on gives me the total number
>> of available constraints, if that is what you mean.
>>
>>
>> Cheers,
>>
>>
>>
>> On Tue, 2009-02-03 at 16:03 +, Bruno Aranda wrote:
>>
>> > Hi,
>> >
>> > I would like to know if there is a way to get the total number of
>> > different facets returned by a faceted search? I see already that I can
>> > paginate through the facets with facet.offset and facet.limit, but is
>> > there a way to know how many facets are found in total?
>> >
>> > For instance,
>> >
>> > Name      Surname
>> >
>> > Peter Smith
>> > John  Smith
>> > Anne Baker
>> > Mary York
>> > ... 1 million records more with 100.000 distinct surnames
>> >
>> > For instance, now I search for people with names starting with A, and I
>> > retrieve 5000 results. I would like to know the distinct number of
>> surnames
>> > (facets) for the result set if possible, so I could show in my app
>> something
>> > like this:
>> >
>> > 5000 people found with 1440 distinct surnames.
>> >
>> > Any ideas? Is this possible to implement? Any pointers would be greatly
>> > appreciated,
>> >
>> > Thanks!
>> >
>> > Bruno
>>
>
>


Re: Total count of facets

2009-02-04 Thread Shalin Shekhar Mangar
On Wed, Feb 4, 2009 at 2:14 PM, Bruno Aranda  wrote:

> Maybe I am not clear, but I am not able to find anything on the net.
> Basically, if I had in my index millions of names starting with A* I would
> like to know how many distinct surnames are present in the resultset
> (similar to a distinct SQL query).
> I will attempt to have a look at the SOLR sources to try to see if this is
> possible to implement. Any hints where to look at would be great!
>

You can use facet.query=name:A* to get the count of names starting with A.

-- 
Regards,
Shalin Shekhar Mangar.


Re: New wiki pages

2009-02-04 Thread Lance Norskog
I've added them to http://wiki.apache.org/solr/FrontPage under "Search and
Indexing". I declare open season on them. That is, anyone can edit them for
any reason. I'm sure I got some things wrong in memory sizing and sorting.

These tips and opinions came from my experience on an index with hundreds of
millions of small records. These are not the final word on how to do
production Solr.

Enjoy,

Lance Norskog

On Mon, Feb 2, 2009 at 10:25 PM, Lance Norskog  wrote:

> http://wiki.apache.org/solr/SchemaDesign
> http://wiki.apache.org/solr/LargeIndexes
> http://wiki.apache.org/solr/UniqueKey
>
> These pages are based on my recent experience and some generalizations.
> They are intended for new users who want to use Solr for a major project.
> Please review them and send me comments.
>
> For example: "they are stupid",  "the wiki has no links to them and those
> links should be here", etc.
>
> --
> Lance Norskog
> goks...@gmail.com
> 650-922-8831 (US)
>
>


-- 
Lance Norskog
goks...@gmail.com
650-922-8831 (US)


Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Mmh, thanks for your answer but with that I get the count of names starting
with A*, but I would like to get the count of distinct surnames (or town
names, or any other field that is not the name...) for the people with name
starting with A*. Is that possible?

Thanks!

Bruno

2009/2/4 Shalin Shekhar Mangar 

> On Wed, Feb 4, 2009 at 2:14 PM, Bruno Aranda 
> wrote:
>
> > Maybe I am not clear, but I am not able to find anything on the net.
> > Basically, if I had in my index millions of names starting with A* I
> would
> > like to know how many distinct surnames are present in the resultset
> > (similar to a distinct SQL query).
> > I will attempt to have a look at the SOLR sources to try to see if this
> is
> > possible to implement. Any hints where to look at would be great!
> >
>
> You can use facet.query=name:A* to get the count of names starting with A.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Total count of facets

2009-02-04 Thread Shalin Shekhar Mangar
On Wed, Feb 4, 2009 at 2:53 PM, Bruno Aranda  wrote:

> Mmh, thanks for your answer but with that I get the count of names starting
> with A*, but I would like to get the count of distinct surnames (or town
> names, or any other field that is not the name...) for the people with name
> starting with A*. Is that possible?
>

It is possible. You can use fq=name:A* to filter people whose names start
with 'A'. Then you can use facet.field=surnames or facet.field=town or
whatever you want with facet.limit=-1 and count the number of results for
each facet. It may be slow for the first query but it is cached so
subsequent queries should be faster (make sure you size filterCache
appropriately).
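As a hedged illustration of the request described above (the host, core path, and field names such as `surname` are hypothetical):

```
http://localhost:8983/solr/select?q=*:*&rows=0&fq=name:A*&facet=true&facet.field=surname&facet.limit=-1&facet.mincount=1
```

Counting the facet entries returned for `surname` then gives the number of distinct surnames within the filtered result set.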

-- 
Regards,
Shalin Shekhar Mangar.


Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-04 Thread Fergus McMenemie
>: > The solr data field is populated properly. So I guess that bit works.
>: > I really wish I could use xpath="//para"
>
>: The limitation comes from streaming the XML instead of creating a DOM.
>: XPathRecordReader is a custom streaming XPath parser implementation and
>: streaming is easy only because we limit the syntax. You can use
>: PlainTextEntityProcessor which gives the XML as a string to a  custom
>: Transformer. This Transformer can create a DOM, run your XPath query and
>: populate the fields. It's more expensive but it is an option.
>
>Maybe it's just me, but it seems like i'm noticing that as DIH gets used 
>more, many people are noting that the XPath processing in DIH doesn't work 
>the way they expect because it's a custom XPath parser/engine designed for 
>streaming.  
>
>It seems like it would be helpful to have an alternate processor for 
>people who don't need the streaming support (ie: are dealing with small 
>enough docs that they can load the full DOM tree into memory) that would 
>use the default Java XPath engine (and have fewer caveats/surprises) ... I
>would think it would probably even make sense for this new XPath processor
>to be the one we suggest for new users, and only suggest the existing 
>(stream based) processor if they have really big xml docs to deal with.
>
>(In hindsight XPathEntityProcessor and XPathRecordReader should probably 
>have been named StreamingXPathEntityProcessor and 
>StreamingXPathRecordReader)
>
Four thoughts!

1) My use case involves a few million XML documents ranging in size
   from a few K to 500K. 95% of the documents are under 25 KBytes,
   5% of the documents are around 0.5 MBytes. So.. sod it, I think I
   need a streaming parser.

2) "streaming XPath parser"? I only half understand all this stuff,
   but, and this is based on the little bit of SAX stuff I have written,
   I would have thought that //para was trivial for any kind of
   streaming XML parser.

3) Much of the confusion may be arising because the DIH wiki page is
   not too clear on what is and is not allowed. We need better,
   more explicit examples. What seems to be allowed is:-
 


   I will add these to the wiki. Just to be sure, I tested 
   xpath="//para". It does not work!
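   For the record, here are hedged examples of the restricted shape the
   streaming reader does handle (the /doc/... element and attribute names
   are invented for illustration):

```xml
<!-- Illustrative DIH field mappings; the /doc/... paths are made up.
     Absolute paths, an attribute test on a step, and a trailing @attr
     selection are the kinds of expressions that work; //para does not. -->
<field column="para"  xpath="/doc/body/para"/>
<field column="title" xpath="/doc/head/title[@lang='en']"/>
<field column="id"    xpath="/doc/@id"/>
```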

4) XML documents are either well structured, with good separation of
   data and presentation, in which case absolute xpaths work fine;
   or older (in my case text) documents which have been forced into
   XML format with poor structure, where the data and presentation
   are all mixed up. I suspect that the addition of //para would
   cover many of the use cases, and what was left could be covered
   by a preceding XSLT transform.
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Unfortunately, after some tests, listing all the distinct surnames or other
fields is too slow and too memory-consuming with our current infrastructure.
Could someone confirm that if I wanted to add this functionality (just count
the total of distinct facets), what I should do is subclass the
SimpleFacets class and create an extended FacetComponent that returns the
size of the term counts list instead of the list itself?
I see that the FacetComponent is registered by default. Is it possible to
register an extended FacetComponent instead? Or is creating a new one
enough?

Sorry for asking so many questions today. I am new to Solr and I was very
excited until I found that I could not meet one of our requirements:
"counting the distinct surnames for names starting with A*", which is
possible with SQL but not with Solr out of the box...

Thanks!

Bruno

2009/2/4 Bruno Aranda 

> Thanks, I will try that though I am talking in my case about 100,000+
> distinct surnames/towns maximum per query and I just needed the count and
> not the whole list. In any case, this brute-force approach is still
> something I can try but I wonder how this will behave speed and memory wise
> when there are many different concurrent queries and so on...
>
> Cheers,
>
> Bruno
>
> 2009/2/4 Shalin Shekhar Mangar 
>
>> On Wed, Feb 4, 2009 at 2:53 PM, Bruno Aranda 
>> wrote:
>>
>>
>> > Mmh, thanks for your answer but with that I get the count of names
>> starting
>> > with A*, but I would like to get the count of distinct surnames (or town
>> > names, or any other field that is not the name...) for the people with
>> name
>> > starting with A*. Is that possible?
>> >
>>
>> It is possible. You can use fq=name:A* to filter people whose names start
>> with 'A'. Then you can use facet.field=surnames or facet.field=town or
>> whatever you want with facet.limit=-1 and count the number of results for
>> each facet. It may be slow for the first query but it is cached so
>> subsequent queries should be faster (make sure you size filterCache
>> appropriately).
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Hi,
I am trying to configure Solr on an Ubuntu server and I am getting the
following exception. It works fine on a Windows box.


message: Severe errors in solr configuration. Check your log files for more
detailed information on what may be wrong. If you want solr to continue after
configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
- 
java.security.AccessControlException: access denied (java.util.PropertyPermission user.dir read)
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:342)
    at java.security.AccessController.checkPermission(AccessController.java:553)
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
    at java.lang.SecurityManager.checkPropertyAccess(SecurityManager.java:1302)
    at java.lang.System.getProperty(System.java:669)
    at java.io.UnixFileSystem.resolve(UnixFileSystem.java:133)
    at java.io.File.getAbsolutePath(File.java:518)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:101)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.access$000(ContainerBase.java:123)
    at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:145)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:769)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:719)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:578)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177)

Please help me to fix this problem.

Thanks,
Anto Binish Kaspar,
Acting Team Lead,
E.C software Pvt. Ltd.



Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau


Am 04.02.2009 um 13:33 schrieb Anto Binish Kaspar:


Hi,
I am trying to configure solr on ubuntu server and I am getting the  
following exception. I can able work it on windows box.



Hi Anto.

Have you installed the solr package 1.2 from ubuntu?
Or the release 1.3 as war file?

Olivier

--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Hi Olivier

Thanks for your quick reply. I am using the release 1.3 as war file.

- Anto Binish Kaspar


-Original Message-
From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de] 
Sent: Wednesday, February 04, 2009 6:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Severe errors in solr configuration


Am 04.02.2009 um 13:33 schrieb Anto Binish Kaspar:

> Hi,
> I am trying to configure solr on ubuntu server and I am getting the  
> following exception. I can able work it on windows box.


Hi Anto.

Have you installed the solr package 1.2 from ubuntu?
Or the release 1.3 as war file?

Olivier

--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau


Am 04.02.2009 um 13:54 schrieb Anto Binish Kaspar:


Hi Olivier

Thanks for your quick reply. I am using the release 1.3 as war file.

- Anto Binish Kaspar


OK.
As far as I understood, you need to make sure that your Solr home is set.
This needs to be done as described below.

Quoting:

http://wiki.apache.org/solr/SolrTomcat

In addition to using the default behavior of relying on the Solr Home  
being in the current working directory (./solr) you can alternately  
add the solr.solr.home system property to your JVM settings before  
starting Tomcat...


export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/dir/"

...or use a Context file to configure the Solr Home using JNDI

A Tomcat context fragments can be used to configure the JNDI property  
needed to specify your Solr Home directory.


Just put a context fragment file under
$CATALINA_HOME/conf/Catalina/localhost that looks something like this...


$ cat /tomcat55/conf/Catalina/localhost/solr.xml
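(The XML fragment itself was stripped by the list archive; the context fragment on the SolrTomcat wiki page looks roughly like the sketch below, where docBase and the solr/home value are placeholder paths to adapt.)

```xml
<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="/my/custom/solr/home/dir" override="true"/>
</Context>
```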


   



Greetings,

Olivier

PS: Maybe it would be great if we could provide an Ubuntu dpkg for
1.3? Any takers?


--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
I am using a Context file; here is my solr.xml:

$ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml 





I changed the ownership of the folder (usr/local/solr/solr-1.3/solr)
from root:root to tomcat6:tomcat6.

Anything I am missing? 

- Anto Binish Kaspar


-Original Message-
From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de] 
Sent: Wednesday, February 04, 2009 6:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Severe errors in solr configuration


Am 04.02.2009 um 13:54 schrieb Anto Binish Kaspar:

> Hi Olivier
>
> Thanks for your quick reply. I am using the release 1.3 as war file.
>
> - Anto Binish Kaspar

OK.
As far a i understood you need to make sure that your solr home is set.
this needs to be done in

Quting:

http://wiki.apache.org/solr/SolrTomcat

In addition to using the default behavior of relying on the Solr Home  
being in the current working directory (./solr) you can alternately  
add the solr.solr.home system property to your JVM settings before  
starting Tomcat...

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/dir/"

...or use a Context file to configure the Solr Home using JNDI

A Tomcat context fragments can be used to configure the JNDI property  
needed to specify your Solr Home directory.

Just put a context fragment file under $CATALINA_HOME/conf/Catalina/ 
localhost that looks something like this...

$ cat /tomcat55/conf/Catalina/localhost/solr.xml





Greetings,

Olivier

PS: May be it would be great if we could provide an ubuntu dpkg with  
1.3 ? Any takers?

--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau

A slash? (usr/local/... is missing its leading "/".)

Olivier

Sent from my iPhone


Am 04.02.2009 um 14:06 schrieb Anto Binish Kaspar :


I am using Context file, here is my solr.xml

$ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml






I change the ownership of the folder (usr/local/solr/solr-1.3/solr)  
to tomcat6:tomcat6 from root:root


Anything I am missing?

- Anto Binish Kaspar


-Original Message-
From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de]
Sent: Wednesday, February 04, 2009 6:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Severe errors in solr configuration


Am 04.02.2009 um 13:54 schrieb Anto Binish Kaspar:


Hi Olivier

Thanks for your quick reply. I am using the release 1.3 as war file.

- Anto Binish Kaspar


OK.
As far a i understood you need to make sure that your solr home is  
set.

this needs to be done in

Quting:

http://wiki.apache.org/solr/SolrTomcat

In addition to using the default behavior of relying on the Solr Home
being in the current working directory (./solr) you can alternately
add the solr.solr.home system property to your JVM settings before
starting Tomcat...

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/ 
dir/"


...or use a Context file to configure the Solr Home using JNDI

A Tomcat context fragments can be used to configure the JNDI property
needed to specify your Solr Home directory.

Just put a context fragment file under $CATALINA_HOME/conf/Catalina/
localhost that looks something like this...

$ cat /tomcat55/conf/Catalina/localhost/solr.xml


   


Greetings,

Olivier

PS: May be it would be great if we could provide an ubuntu dpkg with
1.3 ? Any takers?

--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)



RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Now it's giving a different message:

Severe errors in solr configuration. Check your log files for more detailed
information on what may be wrong. If you want solr to continue after
configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
- 
java.security.AccessControlException: access denied (java.io.FilePermission /usr/local/solr/solr-1.3/solr/solr.xml read)
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:342)
    at java.security.AccessController.checkPermission(AccessController.java:553)
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
    at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
    at java.io.File.exists(File.java:748)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:103)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.access$000(ContainerBase.java:123)
    at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:145)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:769)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:719)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:578)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177)

Why is it trying to read solr.xml from /usr/local/solr/solr-1.3/solr/?

- Anto Binish Kaspar


-Original Message-
From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de] 
Sent: Wednesday, February 04, 2009 6:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Severe errors in solr configuration

A slash?

Olivier

Von meinem iPhone gesendet


Am 04.02.2009 um 14:06 schrieb Anto Binish Kaspar :

> I am using a Context file, here is my solr.xml
>
> $ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml
>
>  debug="0" crossContext="true" >
> 
> 
>
> I changed the ownership of the folder (/usr/local/solr/solr-1.3/solr)
> to tomcat6:tomcat6 from root:root
>
> Anything I am missing?
>
> - Anto Binish Kaspar
>
>
> -Original Message-
> From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de]
> Sent: Wednesday, February 04, 2009 6:30 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Severe errors in solr configuration
>
>
> Am 04.02.2009 um 13:54 schrieb Anto Binish Kaspar:
>
>> Hi Olivier
>>
>> Thanks for your quick reply. I am using the release 1.3 as war file.
>>
>> - Anto Binish Kaspar
>
> OK.
> As far as I understood, you need to make sure that your Solr home is
> set. This needs to be done in
>
> Quoting:
>
> http://wiki.apache.org/solr/SolrTomcat
>
> In addition to using the default behavior of relying on the Solr Home
> being in the current working directory (./solr) you can alternately
> add 

Re: Boost function

2009-02-04 Thread Erick Erickson
From Hossman...

<<>>


Search time boosts, as the name implies, factor into the scoring of
documents, increasing the score assigned to documents that match on the
boosted term, thus tending to score the entire document higher. So these
documents tend to be returned earlier in the results when sorting by score
(the default).
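For example, in the standard Lucene/Solr query syntax a caret attaches a search-time boost to a clause (the field names here are just illustrative):

```
title:solr^4 body:solr
```

Documents matching the boosted title clause receive a higher score and so tend to appear earlier in the default score-sorted results.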

See "Lucene in Action"

Best
Erick

On Wed, Feb 4, 2009 at 8:12 AM, Tushar_Gandhi <
tushar_gan...@neovasolutions.com> wrote:

>
> Hi,
>   I want to know about boosting. What is its use?
> How can we implement it, and how will it affect my search results?
>
> Thanks,
> Tushar
> --
> View this message in context:
> http://www.nabble.com/Boost-function-tp21829651p21829651.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Severe errors in solr configuration

2009-02-04 Thread Shalin Shekhar Mangar
According to http://wiki.apache.org/solr/SolrTomcat, the JNDI context should
be:

<Context docBase="/some/path/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/my/solr/home" override="true" />
</Context>


Notice that in the snippet you posted, the name was "/solr/home" (an extra
leading '/')

http://wiki.apache.org/solr/SolrTomcat#head-7036378fa48b79c0797cc8230a8aa0965412fb2e

On Wed, Feb 4, 2009 at 6:59 PM, Anto Binish Kaspar  wrote:

> Now it's giving a different message:
>
> Severe errors in solr configuration. Check your log files for more detailed
> information on what may be wrong. If you want solr to continue after
> configuration errors, change:
> <abortOnConfigurationError>false</abortOnConfigurationError> in null
> -
> java.security.AccessControlException: access denied (java.io.FilePermission
> /usr/local/solr/solr-1.3/solr/solr.xml read) at
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:342)
> at java.security.AccessController.checkPermission(AccessController.java:553)
> at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) at
> java.lang.SecurityManager.checkRead(SecurityManager.java:888) at
> java.io.File.exists(File.java:748) at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:103)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
> at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
> at
> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108)
> at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
> at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
> at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
> at org.apache.catalina.core.ContainerBase.access$000(ContainerBase.java:123)
> at
> org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:145)
> at java.security.AccessController.doPrivileged(Native Method) at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:769) at
> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525) at
> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627)
> at
> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
> at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at
> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149) at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
> at
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at
> org.apache.catalina.core.StandardHost.start(StandardHost.java:719) at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at
> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at
> org.apache.catalina.core.StandardService.start(StandardService.java:516) at
> org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at
> org.apache.catalina.startup.Catalina.start(Catalina.java:578) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616) at
> org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616) at
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177)
>
> Why is it trying to read solr.xml from /usr/local/solr/solr-1.3/solr/?
>
> - Anto Binish Kaspar
>
>
> -Original Message-
> From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de]
> Sent: Wednesday, February 04, 2009 6:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Severe errors in solr configuration
>
> A slash?
>
> Olivier
>
> Von meinem iPhone gesendet
>
>
> Am 04.02.2009 um 14:06 schrieb Anto Binish Kaspar :
>
> > I am using Context file, here is my solr.xml
> >
> > $ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml
> >
> >  > debug="0" crossContext="true" >
> > 
> > 
> >
> > I changed the ownership of the folder (/usr/local/solr/solr-1.3/solr)
> > to tomcat6:tomcat6 from root:root
> >
> > Anything I am missing?
> >
> > - Anto Binish Kaspar
> >
> >
> > -Original Message-
> > From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de]
> > Sent: Wednesday, February 04, 2009 6:30 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Severe errors in solr configuration
> >
> >
> > Am 04.02.2009 u

RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Yes, I removed it, but I still have the same issue. Any idea what the cause
of this issue may be?

- Anto Binish Kaspar


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, February 04, 2009 7:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Severe errors in solr configuration

According to http://wiki.apache.org/solr/SolrTomcat, the JNDI context should
be:

<Context docBase="/some/path/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/my/solr/home" override="true" />
</Context>


Notice that in the snippet you posted, the name was "/solr/home" (an extra
leading '/')

http://wiki.apache.org/solr/SolrTomcat#head-7036378fa48b79c0797cc8230a8aa0965412fb2e

On Wed, Feb 4, 2009 at 6:59 PM, Anto Binish Kaspar  wrote:

> Now it's giving a different message:
>
> Severe errors in solr configuration. Check your log files for more detailed
> information on what may be wrong. If you want solr to continue after
> configuration errors, change:
> <abortOnConfigurationError>false</abortOnConfigurationError> in null
> -
> java.security.AccessControlException: access denied (java.io.FilePermission
> /usr/local/solr/solr-1.3/solr/solr.xml read) at
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:342)
> at java.security.AccessController.checkPermission(AccessController.java:553)
> at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) at
> java.lang.SecurityManager.checkRead(SecurityManager.java:888) at
> java.io.File.exists(File.java:748) at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:103)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
> at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
> at
> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108)
> at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
> at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
> at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
> at org.apache.catalina.core.ContainerBase.access$000(ContainerBase.java:123)
> at
> org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:145)
> at java.security.AccessController.doPrivileged(Native Method) at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:769) at
> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525) at
> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627)
> at
> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
> at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at
> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149) at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
> at
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at
> org.apache.catalina.core.StandardHost.start(StandardHost.java:719) at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at
> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at
> org.apache.catalina.core.StandardService.start(StandardService.java:516) at
> org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at
> org.apache.catalina.startup.Catalina.start(Catalina.java:578) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616) at
> org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616) at
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177)
>
> Why is it trying to read solr.xml from /usr/local/solr/solr-1.3/solr/?
>
> - Anto Binish Kaspar
>
>
> -Original Message-
> From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de]
> Sent: Wednesday, February 04, 2009 6:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Severe errors in solr configuration
>
> A slash?
>
> Olivier
>
> Von meinem iPhone gesendet
>
>
> Am 04.02.2009 um 14:06 schrieb Anto Binish Kaspar :
>
> > I am using Context file, here is my solr.xml
> >
> > $ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml
> >
> >  > debug="0" crossContext="true" >
> > 
> > 
> >
> > I changed the ownership of the folder (/usr/local/solr/solr-1.3/solr)
> > to tomcat6:tomcat6 from root:root

Highlighting on Prefix-Search Bug/Workaround (Re: query with stemming, prefix and fuzzy?)

2009-02-04 Thread Gert Brinkmann
Mark Miller wrote:

>> Currently I think about dropping the stemming and only use
>> prefix-search. But as highlighting does not work with a prefix "house*"
>> this is a problem for me. The hint to use "house?*" instead does not
>> work here.
>>   
> Thats because wildcard queries are also not highlightable now. I
> actually have somewhat of a solution to this that I'll work on soon
> (I've gotten the ground work for it in or ready to be in Lucene). No
> guarantee on when or if it will be accepted in solr though.

As I am writing in Perl (using WebService::Solr), I found a workaround:
use the Search::Tools module for highlighting "manually" in those cases
where Solr does not return snippets. This seems to work fine, but the
drawback is that I need Solr to return the full data field in a query,
which can be expensive on larger documents. But I hope this is just a
temporary workaround until Solr 1.4...

Thanks,
Gert



Differences in output of spell checkers

2009-02-04 Thread Marcus Stratmann

Hello,

I'm trying to learn how to use the spell checkers of solr (1.3). I found 
out that FileBasedSpellChecker and IndexBasedSpellChecker produce 
different outputs.


IndexBasedSpellChecker says




1
0
4
0

85
game


false



whereas FileBasedSpellChecker returns




1
0
4

game





The differences are the markup used for the suggestions, the missing
frequencies, and the missing "correctlySpelled" in FileBasedSpellChecker.
Is that a bug or a feature? Or are there simply no universal rules for
the format of the output? The differences make parsing more difficult if
you use both IndexBasedSpellChecker and FileBasedSpellChecker.


Thanks,
Marcus


Boost function

2009-02-04 Thread Tushar_Gandhi

Hi,
   I want to know about boosting. What is its use?
How can we implement it, and how will it affect my search results?

Thanks,
Tushar
-- 
View this message in context: 
http://www.nabble.com/Boost-function-tp21829651p21829651.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Thanks, I will try that, though I am talking in my case about 100,000+
distinct surnames/towns maximum per query, and I just needed the count,
not the whole list. In any case, this brute-force approach is still
something I can try, but I wonder how it will behave speed- and memory-wise
when there are many different concurrent queries and so on...

Cheers,

Bruno

2009/2/4 Shalin Shekhar Mangar 

> On Wed, Feb 4, 2009 at 2:53 PM, Bruno Aranda 
> wrote:
>
> > Mmh, thanks for your answer but with that I get the count of names
> starting
> > with A*, but I would like to get the count of distinct surnames (or town
> > names, or any other field that is not the name...) for the people with
> name
> > starting with A*. Is that possible?
> >
>
> It is possible. You can use fq=name:A* to filter people whose names start
> with 'A'. Then you can use facet.field=surnames or facet.field=town or
> whatever you want with facet.limit=-1 and count the number of results for
> each facet. It may be slow for the first query but it is cached so
> subsequent queries should be faster (make sure you size filterCache
> appropriately).
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
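Shalin's recipe above can be finished off client-side; a minimal sketch that counts distinct values from the flat facet format returned with wt=json (the sample response data below is made up for illustration):

```python
# Count distinct facet values from a Solr wt=json response.
# Assumes the default "flat" facet format, where the list alternates
# value, count. The sample response is made up.
response = {
    "facet_counts": {
        "facet_fields": {
            "surname": ["smith", 120, "jones", 85, "brown", 3],
        }
    }
}

def distinct_facet_values(resp, field):
    flat = resp["facet_counts"]["facet_fields"][field]
    counts = flat[1::2]  # every second entry is a count
    return sum(1 for c in counts if c > 0)

print(distinct_facet_values(response, "surname"))  # 3
```

With facet.limit=-1 the full list still crosses the wire, which is exactly the speed/memory concern raised above; this only saves the parsing effort, not the transfer.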


Re: DIH, assigning multiple xpaths to the same solr field: solved

2009-02-04 Thread Fergus McMenemie
Thanks Shalin,

Using the following appears to work properly!
   
   
   
   

Regards Fergus

>On Wed, Feb 4, 2009 at 1:35 AM, Fergus McMenemie  wrote:
>
>>   >  dataSource="myfilereader"
>>  processor="XPathEntityProcessor"
>>  url="${jc.fileAbsolutePath}"
>>  stream="false"
>>  forEach="/record">
>>   
>>   
>>   
>>   
>>
>> Below is the line from my schema.xml
>>
>>   >  multiValued="true"/>
>>
>> Now a given document will only have one style of layout, and of course
>> the /a/b/c /d/e/f/g  stuff is made up. For a document that has a single
>> Hello world element I see search results as follows, the
>> one  string seems to have been entered into the index four times.
>> I only saw duplicate results before adding the extra made-up stuff.
>>
>>
>I think there is something fishy with the XPathEntityProcessor. For now, I
>think you can work around it by giving each field a different 'column' and
>the attribute 'name=para' on each of them.
>
>-- 
>Regards,
>Shalin Shekhar Mangar.

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Total count of facets

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda  wrote:
> Unfortunately, after some tests listing all the distinct surnames or other
> fields is too slow and too memory consuming with our current infrastructure.
> Could someone confirm that if I wanted to add this functionality (just count
> the total of different facets) what I should do is to subclass the
> SimpleFacets class and create an extended FacetComponent that returns the
> size of the term counts list instead of the list itself?

This wouldn't be too hard to do... and I think it's been requested in
the past at least a few times:
http://www.lucidimagination.com/search/document/7ab1d7fff1fb556e/numfound_for_facet_results

The slightly harder part is changing the response format in a backward
compatible way.

-Yonik


Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau


Am 04.02.2009 um 15:50 schrieb Anto Binish Kaspar:

Yes, I removed it, but I still have the same issue. Any idea what may be
the cause of this issue?



Have you solved your problem?

Olivier
--
Olivier Dobberkau



Re: exceeded limit of maxWarmingSearchers

2009-02-04 Thread Jon Drukman

Otis Gospodnetic wrote:

That should be fine (but apparently isn't), as long as you don't have some very
slow machine, or caches that are large and configured to copy a lot of
data on commit.



This is becoming more and more problematic. We have periods where we
get 10 of these exceptions in a 4-second period. How do I diagnose what
the cause is, or alternatively work around it?


When you say "copy", are you talking about copyFields or something else?

We commit on every update, but each update is very small... just a few
hundred bytes on average.




Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
The implementation assumed that most users have XML with a fixed
schema. In that case, giving an absolute path is not hard. This helps
us deal with a large subset of use cases rather easily.

We have not added all the features which are possible with a streaming
parser. It is wiser to piggyback on some real XPath engine for that,
because the demand for full XPath support will always be there.
--Noble

On Wed, Feb 4, 2009 at 5:15 PM, Fergus McMenemie  wrote:
>>: > The solr data field is populated properly. So I guess that bit works.
>>: > I really wish I could use xpath="//para"
>>
>>: The limitation comes from streaming the XML instead of creating a DOM.
>>: XPathRecordReader is a custom streaming XPath parser implementation and
>>: streaming is easy only because we limit the syntax. You can use
>>: PlainTextEntityProcessor which gives the XML as a string to a  custom
>>: Transformer. This Transformer can create a DOM, run your XPath query and
>>: populate the fields. It's more expensive but it is an option.
>>
>>Maybe it's just me, but it seems like i'm noticing that as DIH gets used
>>more, many people are noting that the XPath processing in DIH doesn't work
>>the way they expect because it's a custom XPath parser/engine designed for
>>streaming.
>>
>>It seems like it would be helpful to have an alternate processor for
>>people who don't need the streaming support (ie: are dealing with small
>>enough docs that they can load the full DOM tree into memory) that would
>>use the default Java XPath engine (and have fewer caveats/surprises) ... I
>>would think it would probably even make sense for this new XPath processor
>>to be the one we suggest for new users, and only suggest the existing
>>(stream based) processor if they have really big xml docs to deal with.
>>
>>(In hindsight XPathEntityProcessor and XPathRecordReader should probably
>>have been named StreamingXPathEntityProcessor and
>>StreamingXPathRecordReader)
>>
> Four thoughts!
>
> 1) My use case involves a few million XML documents ranging in size
>   from a few K to 500K. 95% of the documents are under 25KBytes,
>   5 of the documents are around 0.5Mbytes. So.. sod it, I think I
>   need a streaming parser.
>
> 2) "streaming XPath parser"? I only half understand all this stuff,
>   but, and this is based on the little bit of SAX stuff I have written,
>   I would have thought that //para was trivial for any kind of
>   streaming XML parser.
>
> 3) Much of the confusion may be arising because the DIH wiki page is
>   not too clear on what is and is not allowed. We need better,
>   more explicit examples. What seems to be allowed is:-
>
>
>
>   I will add these to the wiki. Just to be sure, I tested
>   xpath="//para". It does not work!
>
> 4) XML documents are either well structured with good separation of
>   data and presentation in which case absolute xpaths work fine.
>   Or older, in my case text documents, which have been forced into
>   XML format with poor structure where the data and presentation
>   is all mixed up. I suspect that the addition of //para would
>   cover many of the use cases, and what was left could be covered
>   by a preceding XSLT transform.
> --
>
> ===
> Fergus McMenemie   Email:fer...@twig.me.uk
> Techmore Ltd   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets Analyst Programmer
> ===
>



-- 
--Noble Paul
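As an aside: for documents small enough to load fully, the DOM route discussed in this thread does make //para trivial. A sketch outside DIH (illustrative only, not DIH's API), using the descendant-axis lookup of a stock XML library:

```python
import xml.etree.ElementTree as ET

# A made-up document mirroring the /a/b/c and /d/e/f/g example above.
doc = """<record>
  <a><b><c><para>Hello</para></c></b></a>
  <d><e><f><g><para>world</para></g></f></e></d>
</record>"""

root = ET.fromstring(doc)
# './/para' is ElementTree's spelling of the descendant axis, i.e. //para
paras = [p.text for p in root.findall(".//para")]
print(paras)  # ['Hello', 'world']
```

The streaming XPathRecordReader avoids building this in-memory tree, which is why it restricts the path syntax.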


Multiple uniqueKey problems

2009-02-04 Thread Bruno Mateus
Hello,

I'm facing some problems in generating a compound unique key. I'm
indexing some database tables not related to each other. In my
data-config.xml I have the following:












Column "alias" and "id" don't exist on the database. In my schema.xml
I have the following:

  
  

   
   

   id

When I do a full import I get the following error:

18:47:40,530 ERROR [STDERR] 4/Fev/2009 18:47:40
org.apache.solr.handler.dataimport.SolrWriter upload
WARNING: Error creating document :
SolrInputDocumnt[{node_nodeid=node_nodeid(1.0)={6706},
node_name=node_name(1.0)={CPE_106122644}}]
org.apache.solr.common.SolrException: Document [null] missing required field: id
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at 
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)


I suppose I'm missing some configuration. Is the way I'm generating
the id correct?

Thanks
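One common way to supply missing columns like "id" and "alias" is to have DIH generate them with the TemplateTransformer in data-config.xml. A sketch (the entity name and SQL are illustrative; adapt them to the real tables):

```xml
<entity name="node" transformer="TemplateTransformer"
        query="select nodeid, name from node">
  <!-- synthesize the missing uniqueKey from a constant prefix and the table's pk -->
  <field column="id" template="node-${node.nodeid}"/>
  <field column="alias" template="node"/>
</entity>
```

Using a per-table prefix in the template also keeps ids from different tables from colliding in the one index.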


Custom Sorting Algorithm

2009-02-04 Thread wojtekpia

Is there an easy way to choose/create an alternate sorting algorithm? I'm
frequently dealing with large result sets (a few million results) and I
might be able to benefit from domain knowledge in my sort.
-- 
View this message in context: 
http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21837721.html
Sent from the Solr - User mailing list archive at Nabble.com.



Spell checking not returning "full" terms

2009-02-04 Thread Rupert Fiasco
We are using Solr 1.3 and trying to get spell checking functionality.

FYI, our index contains a lot of medical terms (which might or might
not make a difference as they are not English-y words, if that makes
any sense?)

If I specify a spellcheck query of "spellcheck.q=diabtes"

I get suggestions of:

diabet
diabetogen
dilat
diamet
diatom
diastol
diactin
dialect

If I re-misspell Diabetes as "q=diabets" then I get no suggestions.

So first off two things:

1) Why would leaving out one "e" over the other affect the spelling
suggestions so substantially?
2) In the former list of suggestions, notice the first suggestion is
"diabet", which isnt all that helpful, it should return something like
"diabetes" or maybe even "diabetic".

Note that if I do a normal search against "diabetes" then I get a ton
of results, in other words, our index is filled with terms of
"diabetes".

My relevant solrconfig is:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text_t</str>
    <str name="spellcheckIndexDir">./spellchecker1</str>
    <str name="accuracy">0.1</str>
  </lst>

  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">text_t</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker2</str>
    <str name="accuracy">0.1</str>
  </lst>
</searchComponent>

and I have

spellcheck.count = 8

Notice that I severely bumped down the "accuracy" setting to get more
results. Bumping it higher yields fewer results (I'm not sure what the
setting really means, so I don't know in which direction I want to change
that value - I am guessing that a lower value allows for more
misspellings, i.e. it's more promiscuous).
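To illustrate why a low accuracy is "promiscuous": accuracy is a 0-1 similarity threshold against the query term, and a candidate only survives if it scores at least that value. A rough sketch using plain edit distance (illustrative only; Solr's actual scoring lives in its StringDistance implementations, such as the JaroWinklerDistance configured above):

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    # 1.0 for identical strings, lower as edits accumulate
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# both misspellings are a single edit away from "diabetes"
print(similarity("diabtes", "diabetes"))  # 0.875
print(similarity("diabets", "diabetes"))  # 0.875
```

A higher accuracy keeps only candidates scoring above the threshold, so raising it filters out more suggestions; lowering it lets more distant (more misspelled) candidates through.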

Our "text" and "text_t" fields are defined in schema.xml as:


and


Any help would be appreciated.

Thanks
-Rupert


Queued Requests during GC

2009-02-04 Thread wojtekpia

During full garbage collection, Solr doesn't acknowledge incoming requests.
Any requests that were received during the GC are timestamped the moment GC
finishes (at least that's what my logs show). Is there a limit to how many
requests can queue up during a full GC? This doesn't seem like a Solr
setting, but rather a container/OS setting (I'm using Tomcat on Linux).

Thanks.

Wojtek
-- 
View this message in context: 
http://www.nabble.com/Queued-Requests-during-GC-tp21837898p21837898.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Spell checking not returning "full" terms

2009-02-04 Thread Grant Ingersoll
I'm guessing the field you are checking against is being stemmed.  The  
field you spell check against should have minimal analysis done to it,  
i.e. tokenization and probably downcasing.  See http://wiki.apache.org/solr/SpellCheckComponent 
 and http://wiki.apache.org/solr/SpellCheckerRequestHandler for tips  
on how to handle analysis for spelling.
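A lightly-analyzed field type of the kind Grant describes might look like this (the fieldType name is illustrative; see the wiki pages above for the recommended analysis chain):

```xml
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- tokenize and lowercase only: no stemming, so the spell index
         holds whole words like "diabetes" rather than stems like "diabet" -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Copying the searchable content into a separate field of this type, and pointing the spellchecker's field at it, keeps stemming out of the suggestions while leaving normal search untouched.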


On Feb 4, 2009, at 2:33 PM, Rupert Fiasco wrote:


We are using Solr 1.3 and trying to get spell checking functionality.

FYI, our index contains a lot of medical terms (which might or might
not make a difference as they are not English-y words, if that makes
any sense?)

If I specify a spellcheck query of "spellcheck.q=diabtes"

I get suggestions of:

diabet
diabetogen
dilat
diamet
diatom
diastol
diactin
dialect

If I re-misspell Diabetes as "q=diabets" then I get no suggestions.

So first off two things:

1) Why would leaving out one "e" over the other affect the spelling
suggestions so substantially?
2) In the former list of suggestions, notice the first suggestion is
"diabet", which isnt all that helpful, it should return something like
"diabetes" or maybe even "diabetic".

Note that if I do a normal search against "diabetes" then I get a ton
of results, in other words, our index is filled with terms of
"diabetes".

My relevant solrconfig is:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text_t</str>
    <str name="spellcheckIndexDir">./spellchecker1</str>
    <str name="accuracy">0.1</str>
  </lst>

  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">text_t</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker2</str>
    <str name="accuracy">0.1</str>
  </lst>
</searchComponent>

and I have

spellcheck.count = 8

Notice that I severely bumped down the "accuracy" setting to get more
results. Bumping it up higher yields less results (not sure what
setting really meant so I dont know in what direction I want to change
that value - I am guessing that a lower value allows for more
mis-spellings, e.g. its more promiscuous).

Our "text" and "text_t" fields are defined in schema.xml as:


and


Any help would be appreciated.

Thanks
-Rupert


--
Grant Ingersoll
http://www.lucidimagination.com/

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ


Re: Queued Requests during GC

2009-02-04 Thread Sridhar Basam


That is the expected behaviour: all application threads are paused
during GC (the CMS collector being an exception; there are smaller pauses,
but the application threads continue to mostly run). The number of
connections that could end up being queued depends on your
acceptCount setting in the server.xml file, and also on the inbound request
rate and the time the GC takes to complete.


The OS will queue up to acceptCount requests before it begins to ignore
incoming TCP connection requests. So if your inbound request rate is 2
per second and a full GC takes 6 seconds to complete, you should have 12
(2x6) new requests waiting for you when the GC completes.
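The back-of-the-envelope arithmetic above can be sketched as follows (the acceptCount value is illustrative; Tomcat's default differs by version):

```python
def queued_during_gc(rate_per_sec, gc_seconds, accept_count):
    """Requests waiting after a stop-the-world GC pause.

    The OS backlog holds at most accept_count pending connections;
    arrivals beyond that are ignored rather than queued.
    """
    arrived = int(rate_per_sec * gc_seconds)
    return min(arrived, accept_count)

# 2 requests/sec arriving during a 6-second full GC, backlog of 100
print(queued_during_gc(2, 6, 100))   # 12
# a higher rate saturates the backlog, and the excess is dropped
print(queued_during_gc(50, 6, 100))  # 100
```

This is why all the queued requests appear in the log with the same timestamp: they are only accepted, and stamped, once the pause ends.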


 Sridhar


wojtekpia wrote:

During full garbage collection, Solr doesn't acknowledge incoming requests.
Any requests that were received during the GC are timestamped the moment GC
finishes (at least that's what my logs show). Is there a limit to how many
requests can queue up during a full GC? This doesn't seem like a Solr
setting, but rather a container/OS setting (I'm using Tomcat on Linux).

Thanks.

Wojtek
  




Re: exceeded limit of maxWarmingSearchers

2009-02-04 Thread Otis Gospodnetic
Jon,

If you can, don't commit on every update and that should help or fully solve 
your problem.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jon Drukman 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, February 4, 2009 1:09:00 PM
> Subject: Re: exceeded limit of maxWarmingSearchers
> 
> Otis Gospodnetic wrote:
> > That should be fine (but apparently isn't), as long as you don't have some 
> very slow machine or your caches are large and configured to copy a 
> lot 
> of data on commit.
> 
> 
> this is becoming more and more problematic.  we have periods where we get 10 
> of 
> these exceptions in a 4 second period.  how do i diagnose what the cause is, 
> or 
> alternatively work around it?
> 
> when you say "copy" are you talking about copyFields or something else?
> 
> we commit on every update, but each update is very small... just a few 
> hundred 
> bytes on average.



Re: Differences in output of spell checkers

2009-02-04 Thread Grant Ingersoll


On Feb 4, 2009, at 11:02 AM, Marcus Stratmann wrote:


Hello,

I'm trying to learn how to use the spell checkers of solr (1.3). I  
found out that FileBasedSpellChecker and IndexBasedSpellChecker  
produce different outputs.


IndexBasedSpellChecker says




1
0
4
0

85
game


false



whereas FileBasedSpellChecker returns




1
0
4

game





The differences are the different elements used to mark up the  
suggestions, the missing frequencies, and the missing  
"correctlySpelled" entry in FileBasedSpellChecker. Is that a bug or a  
feature? Or are there simply no universal rules for the format of  
the output? The differences make parsing more difficult if you use  
both IndexBasedSpellChecker and FileBasedSpellChecker.


Are you sending in the same query to both?  Frequency and word only  
get printed when extendedResults == true.  correctlySpelled only gets  
printed when there is Index frequency information.  For the  
FileBasedSpellChecker, there is no Frequency information, so it isn't  
returned.


The logic for constructing this is all handled in the  
SpellCheckComponent.toNamedList() method and is completely separated  
from the individual SpellChecker implementations.


HTH,
Grant


--
Grant Ingersoll
http://www.lucidimagination.com/

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ













Re: Custom Sorting Algorithm

2009-02-04 Thread Otis Gospodnetic
Hi,

You can use one of the existing function queries (if they fit your need) or 
write a custom function query to reorder the results of a query.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: wojtekpia 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, February 4, 2009 2:28:56 PM
> Subject: Custom Sorting Algorithm
> 
> 
> Is there an easy way to choose/create an alternate sorting algorithm? I'm
> frequently dealing with large result sets (a few million results) and I
> might be able to benefit from domain knowledge in my sort.
> -- 
> View this message in context: 
> http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21837721.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Queued Requests during GC

2009-02-04 Thread Otis Gospodnetic
Wojtek,

I'm not familiar with the details of Tomcat configuration, but this definitely 
sounds like a container issue, closely related to the JVM.

Doing a thread dump for the Java process (the JVM your Tomcat runs in) while 
the GC is running will show you which threads are blocked, and in turn that 
should point you in the right direction as far as Tomcat settings are concerned.  
Sorry for not being able to give you a more specific answer.

Is this happening with the latest JVM from Sun?

I'd be curious if you could reproduce this in Jetty.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: wojtekpia 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, February 4, 2009 2:37:46 PM
> Subject: Queued Requests during GC
> 
> 
> During full garbage collection, Solr doesn't acknowledge incoming requests.
> Any requests that were received during the GC are timestamped the moment GC
> finishes (at least that's what my logs show). Is there a limit to how many
> requests can queue up during a full GC? This doesn't seem like a Solr
> setting, but rather a container/OS setting (I'm using Tomcat on Linux).
> 
> Thanks.
> 
> Wojtek
> -- 
> View this message in context: 
> http://www.nabble.com/Queued-Requests-during-GC-tp21837898p21837898.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia

That's not quite what I meant. I'm not looking for a custom comparator, I'm
looking for a custom sorting algorithm. Is there a way to use quick sort or
merge sort or... rather than the current algorithm? Also, what is the
current algorithm?


Otis Gospodnetic wrote:
> 
> 
> You can use one of the existing function queries (if they fit your need) or
> write a custom function query to reorder the results of a query.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21838804.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Total count of facets

2009-02-04 Thread Erik Hatcher
What about using the luke request handler to get the distinct values  
count?  Although it is pretty seriously heavy on a big index, so  
probably not quite workable in your case.


Erik

On Feb 4, 2009, at 12:54 PM, Yonik Seeley wrote:

On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda   
wrote:
Unfortunately, after some tests listing all the distinct surnames  
or other
fields is too slow and too memory consuming with our current  
infrastructure.
Could someone confirm that if I wanted to add this functionality  
(just count

the total of different facets) what I should do is to subclass the
SimpleFacets class and create an extended FacetComponent that  
returns the

size of the term counts list instead of the list itself?


This wouldn't be too hard to do... and I think it's been requested in
the past at least a few times:
http://www.lucidimagination.com/search/document/7ab1d7fff1fb556e/numfound_for_facet_results

The slightly harder part is changing the response format in a backward
compatible way.

-Yonik




Re: Custom Sorting Algorithm

2009-02-04 Thread Mark Miller
It would not be simple to use a new algorithm. The current 
implementation takes place at the Lucene level and uses a priority 
queue. When you ask for the top n results, a priority queue of size n is 
filled with all of the matching documents. The ordering in the priority 
queue is the sort. The no-Sort case orders by relevance score; the 
Sort case orders by field, relevance, or doc id.


- Mark

wojtekpia wrote:

That's not quite what I meant. I'm not looking for a custom comparator, I'm
looking for a custom sorting algorithm. Is there a way to use quick sort or
merge sort or... rather than the current algorithm? Also, what is the
current algorithm?


Otis Gospodnetic wrote:
  

You can use one of the existing function queries (if they fit your need) or
write a custom function query to reorder the results of a query.





  




Re: Total count of facets

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 3:47 PM, Erik Hatcher  wrote:
> What about using the luke request handler to get the distinct values count?

That wouldn't restrict results by the base query and filters.

-Yonik


Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia

Ok, so maybe a better question is: should I bother trying to change the
"sorting" algorithm? I'm concerned that with large data sets, sorting
becomes a severe bottleneck (this is an assumption, I haven't profiled
anything to verify). Does it become a severe bottleneck? Do you know if
alternate sort algorithms have been tried during Lucene development? 



markrmiller wrote:
> 
> It would not be simple to use a new algorithm. The current 
> implementation takes place at the Lucene level and uses a priority 
> queue. When you ask for the top n results, a priority queue of size n is 
> filled with all of the matching documents. The ordering in the priority 
> queue is the sort. The no-Sort case orders by relevance score; the 
> Sort case orders by field, relevance, or doc id.
> 

-- 
View this message in context: 
http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21840299.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Queued Requests during GC

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 3:12 PM, Otis Gospodnetic
 wrote:
> I'd be curious if you could reproduce this in Jetty

All application threads are blocked... it's going to be the same in
Jetty or Tomcat or any other container that's pure Java.  There is an
OS level listening queue that has a certain depth (configurable in
both tomcat and jetty and passed down to the OS when listen() for the
socket is called).  If too many connections are initiated without
being accepted, they will start being rejected.

See UNIX man pages for listen() and connect() for more details.

For Tomcat, the config param you want is "acceptCount"
http://tomcat.apache.org/tomcat-6.0-doc/config/http.html

Increasing this will ensure that connections don't get rejected while
a long GC is going on.

-Yonik


Re: Custom Sorting Algorithm

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 4:45 PM, wojtekpia  wrote:
> Ok, so maybe a better question is: should I bother trying to change the
> "sorting" algorithm? I'm concerned that with large data sets, sorting
> becomes a severe bottleneck (this is an assumption, I haven't profiled
> anything to verify).

No... Lucene/Solr never sorts the complete result set.
If you ask for the top 10 results, a priority queue (heap) of the
current top 10 results is maintained... far more efficient and
scalable than sorting all the hits at the end.
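The bounded-heap selection Yonik describes can be sketched as follows. This is a toy illustration over raw scores, not Lucene's actual HitQueue: a size-n min-heap is maintained while streaming over all hits, giving O(total * log n) work instead of sorting everything.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class TopN {
    // Return the top n largest scores from a stream of hits without
    // ever sorting the full result set: a size-n min-heap keeps the
    // current best n, evicting the smallest when a better hit arrives.
    static float[] topN(float[] scores, int n) {
        PriorityQueue<Float> heap = new PriorityQueue<>(n); // min-heap
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict current smallest of the top n
                heap.add(s);
            }
        }
        // Drain the heap smallest-first, filling the array from the back
        // so the result comes out in descending score order.
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) out[i] = heap.poll();
        return out;
    }

    public static void main(String[] args) {
        float[] top = topN(new float[]{0.2f, 0.9f, 0.1f, 0.7f, 0.5f}, 3);
        System.out.println(Arrays.toString(top)); // prints [0.9, 0.7, 0.5]
    }
}
```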

-Yonik


Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
This is when a load balancer helps. The requests sent around the
time that the GC starts will be stuck on that server, but later
ones can be sent to other servers.

We use a "least connections" load balancing strategy. Each connection
represents a request in progress, so this is the same as equalizing
the queue of requests for each server.

Also, only use as much heap as you really need. A larger heap
means longer GCs.

wunder

On 2/4/09 1:59 PM, "Yonik Seeley"  wrote:

> On Wed, Feb 4, 2009 at 3:12 PM, Otis Gospodnetic
>  wrote:
>> I'd be curious if you could reproduce this in Jetty
> 
> All application threads are blocked... it's going to be the same in
> Jetty or Tomcat or any other container that's pure Java.  There is an
> OS level listening queue that has a certain depth (configurable in
> both tomcat and jetty and passed down to the OS when listen() for the
> socket is called).  If too many connections are initiated without
> being accepted, they will start being rejected.
> 
> See UNIX man pages for listen() and connect() for more details.
> 
> For Tomcat, the config param you want is "acceptCount"
> http://tomcat.apache.org/tomcat-6.0-doc/config/http.html
> 
> Increasing this will ensure that connections don't get rejected while
> a long GC is going on.
> 
> -Yonik



Re: Queued Requests during GC

2009-02-04 Thread Mark Miller

Walter Underwood wrote:

Also, only use as much heap as you really need. A larger heap
means longer GCs.
  
Right. Ideally you want to figure out how to get the long pauses down. 
There is a lot of fiddling that you can do to improve gc times.


On a multiprocessor machine you can parallelize collection of both the 
new and tenured spaces for a nice boost. You can resize spaces within 
the heap as well. There is also a low pause incremental collector you 
can try. A lot of this type of tuning takes trial and error and 
experience though. A really helpful tool is visualgc, which lets you 
watch garbage collection for your app in realtime. You can also use 
jconsole and other tools like that, but visualgc actually renders a view 
of the heap and it's easier to watch and get a feel for how garbage 
collection is working. If it's hard to get a GUI up, all of those tools 
work remotely as well.


You can find a lot of good info on things to try here:

http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
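As a concrete starting point, the options discussed above map to Sun JVM flags along these lines (example values only; the right settings depend on heap size and workload):

```
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC        # parallel young-gen collection + low-pause CMS
-Xms2g -Xmx2g -XX:NewSize=512m                  # size the heap and young generation explicitly
-verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log  # log GC activity for offline analysis
```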

If there are spots in Lucene/Solr that are producing so much garbage 
that we can't keep up, perhaps work can be done to address this upon 
pinpointing the issues.


- Mark


Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
On 2/4/09 2:48 PM, "Mark Miller"  wrote:

> If there are spots in Lucene/Solr that are producing so much garbage
> that we can't keep up, perhaps work can be done to address this upon
> pinpointing the issues.
> 
> - Mark

I have not had the time to pin it down, but I suspect that items
evicted from the query result cache contain a lot of objects.
Are the keys a full parse tree? That could be big.

wunder



Re: Queued Requests during GC

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 5:52 PM, Walter Underwood  wrote:
> I have not had the time to pin it down, but I suspect that items
> evicted from the query result cache contain a lot of objects.
> Are the keys a full parse tree? That could be big.

Yes, keys are full Query objects.
It would be non-trivial to switch to String given all of the things
that can affect how a Query object is built.

-Yonik


Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
Aha! I bet that the full Query object became a lot more complicated
between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
after the upgrade.

Items evicted from cache are tenured, so they contribute to the full GC.
With an HTTP cache in front, there is hardly anything left to be
cached, so there are lots of evictions. We get a query result cache
hit rate around 0.12.

wunder

On 2/4/09 3:01 PM, "Yonik Seeley"  wrote:

> On Wed, Feb 4, 2009 at 5:52 PM, Walter Underwood 
> wrote:
>> I have not had the time to pin it down, but I suspect that items
>> evicted from the query result cache contain a lot of objects.
>> Are the keys a full parse tree? That could be big.
> 
> Yes, keys are full Query objects.
> It would be non-trivial to switch to String given all of the things
> that can affect how a Query object is built.
> 
> -Yonik



Re: Queued Requests during GC

2009-02-04 Thread Mark Miller

Walter Underwood wrote:

Aha! I bet that the full Query object became a lot more complicated
between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
after the upgrade.

Items evicted from cache are tenured, so they contribute to the full GC.
With an HTTP cache in front, there is hardly anything left to be
cached, so there are lots of evictions. We get a query result cache
hit rate around 0.12.

wunder
  
At 10%, have you considered just not using the cache? Is that worth all 
the extra work? Or are you not paying as much as you're losing in GC/cache 
time?


- Mark



Re: Highlighting Oddities

2009-02-04 Thread ashokc

I have seen some of these oddities that Chris is referring to. In my case,
terms that are NOT in the query get highlighted. For example, searching for
'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms
either. Do these filter factories add some extra intelligence to the index
in that if you search for 'Samsung' even 'LG' is considered a highlightable
term?

I believe this was not the case when I was working with an earlier
development version (from Nov or early Dec). Right now I am using
solr-2008-12-29.war.

- ashok



ryguasu wrote:
> 
> I'm testing out the default (gap) fragmenter with some simple,
> single-word queries on a patched 1.3.0 release populated with some
> real-world data. (I think the primary quirk in my setup is that I'm
> using ShingleFilterFactory to put word bigrams (aka shingles) into my
> index. I was worried that this might mess up highlighting, but
> highlighting is *mostly* working.) There are some oddities here, and
> I'm wondering if people have any suggestions for debugging my setup
> and/or trying to make a good, reproducible test case.
> 
> 1. The main weird thing is that, the vast majority of the time, the
> highlighted term is the last term in the fragment. For example, if I
> search for "cat", then almost all my fragments look like this:
> 
> fragment 1: "to the *cat*"
> fragment 2: "with the *cat*"
> fragment 3: "it's what the *cat*"
> fragment 4: "Once upon a time the *cat*"
> 
> (My actual fragments are longer. The key to note is that all of these
> examples end in "cat".)
> 
> Sometimes "cat" will appear at somewhere other than the last position,
> but this is rare. My expectation, in contrast, is that "cat" would
> tend to be more or less evenly distributed throughout fragment
> positions.
> 
> Note: I tried to reproduce this on 1.3.0 with my patches applied but
> using the example dataset/schema from the Solr source tree rather than
> my own dataset/schema. With the example dataset this didn't seem to be
> an issue.
> 
> I've experienced three other highlighting issues, which may or may not
> be related:
> 
> 2. Sometimes, if a term appears multiple times in a fragment, not just
> the term but all the words in between the two appearances will get
> highlighted too. For example, I searched for "fear", and got this as
> one of the snippets:
> 
> SETTLEMENT AGREEMENT This Agreement ("the Agreement") is entered
> into this 18th day of August, 2008, by
> and between Cape Fear Bank Corporation, a North Carolina
> corporation (the "Company"), and Cape Fear
> 
> In contrast, I would have expected
> 
> SETTLEMENT AGREEMENT This Agreement ("the Agreement") is entered
> into this 18th day of August, 2008, by
> and between Cape Fear Bank Corporation, a North Carolina
> corporation (the "Company"), and Cape Fear
> 
> 3. My install seems to have a curiously liberal interpretation of
> hl.fragsize. Now if I put hl.fragsize=0, then things are as expected,
> i.e. it highlights the whole field. And it also seems more or less
> true (as it should) that as I increase hl.fragsize, the fragments get
> longer. However, I was surprised to see that when I put hl.fragsize=1
> or hl.fragsize=5, I can get fragments as long as this one:
> 
> addition, we believe the wireless feature for our controller will
> facilitate exceptional customer services and
> response time." About GpsLatitude GpsLatitude, a Montreal-based
> company, is a provider of security
> solutions and tracking for mobile assets. It is also a developer
> of advanced " Videlocalisation" , a cost-effective,
> integrated mobile digital video
> 
> That seems shockingly long for something of size "five".
> 
> 4. Very rarely I'll get a fragment that doesn't actually contain any
> of the search terms. For example, maybe I'll search for "cat", and
> I'll get back "three ounces of milk" as a snippet. I need to explore
> this more, though the last time this happened when I opened the
> document and found that when I located "three ounces of milk" in the
> document text, the word "cat" did appear nearby; so maybe the document
> did contain "three ounces of milk for the cat".
> 
> Obviously I'm not describing my setup in much detail. Let me know what
> you think would be helpful to know more about.
> 
> Thanks,
> Chris
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Highlighting-Oddities-tp20351015p21841992.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
On 2/4/09 3:17 PM, "Mark Miller"  wrote:

> Walter Underwood wrote:
>> Aha! I bet that the full Query object became a lot more complicated
>> between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
>> after the upgrade.
>> 
>> Items evicted from cache are tenured, so they contribute to the full GC.
>> With an HTTP cache in front, there is hardly anything left to be
>> cached, so there are lots of evictions. We get a query result cache
>> hit rate around 0.12.
>> 
>> wunder
>>   
> At 10%, have you considered just not using the cache? Is that worth all
> the extra work? Or are you not paying as much as you're losing in GC/cache
> time?

I was going to verify the source of the tenured garbage before starting
another round of trial-and-error tuning. Now that I have a good hunch,
I might spend some time on that after the Oscars (our peak day for the
year at Netflix).

Another approach is to get fancy with the load balancing and always
send the same query back to the same server. That increases the
effective cache size by the number of servers, but it forces a
simplistic round-robin load balancing and you have to be careful
with down servers to avoid blowing all the caches simultaneously.

At Infoseek, we learned that blowing all the caches when one server
goes down is a very bad idea.

wunder




Re: Queued Requests during GC

2009-02-04 Thread Chris Hostetter

: >> Aha! I bet that the full Query object became a lot more complicated
: >> between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
: >> after the upgrade.

I don't think the Query class implementations themselves changed in 
any way that would have made them larger -- but if you switched from the 
standard parser to dismax parser, or started using lots of boost 
queries, or started using prefix or wildcard queries, then yes: the Query 
objects used would have gotten bigger.

: Another approach is to get fancy with the load balancing and always
: send the same query back to the same server. That increases the
: effective cache size by the number of servers, but it forces a
: simplistic round-robin load balancing and you have to be careful
: with down servers to avoid blowing all the caches simultaneously.

at a certain point, if you have enough machines, a two-tiered LB situation 
starts to be worth consideration.  tier#1 can use hashing on the 
querystring to pick which tier#2 cluster to send the query to.  each 
tier#2 cluster can be fronted by a load balancer that picks the server to 
use based on whatever "workload" metric you want.  a small percentage of 
machines in any given cluster (or in every cluster) can be down w/o 
worrying about screwing up the caches or adversely affecting traffic -- you 
just can't let an entire cluster be down at once.
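The tier-1 routing step above can be sketched like this (a hypothetical helper for illustration, not actual Solr or load-balancer code):

```java
public class QueryRouter {
    // Deterministically map a query string to one of numClusters tier-2
    // clusters, so identical queries always land on the same cluster
    // and therefore reuse that cluster's caches.
    static int clusterFor(String queryString, int numClusters) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(queryString.hashCode(), numClusters);
    }

    public static void main(String[] args) {
        String q = "q=diabetes&rows=10";
        // Same query maps to the same cluster on every call.
        System.out.println(clusterFor(q, 4) == clusterFor(q, 4)); // prints true
    }
}
```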



-Hoss



Latest on DataImportHandler and Tika?

2009-02-04 Thread Chris Harris
Back in November, Shalin and Grant were discussing integrating
DataImportHandler and Tika. Shalin's estimation about the best way to
do this was as follows:

**

I think the best way would be a TikaEntityProcessor which knows how to
handle documents. I guess a typical use-case would be
FileListEntityProcessor->TikaEntityProcessor as parent-child entities.

Also see SOLR-833 which adds a FieldReaderDataSource using which you can
pass any field's content to an entity for processing. So you can have a
[SqlEntityProcessor, JdbcDataSource] producing a blob and a
[FieldReaderDataSource, TikaEntityProcessor] consuming it.

(http://www.nabble.com/DataImportHandler-and-Blobs-td20464891.html)

**

Has there been any work on something like this? Alternatively, has
anyone else put together an alternative way to get DataImportHandler
to extract body text from PDFs, Word files, etc.?

Thanks,
Chris


Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
On 2/4/09 3:44 PM, "Chris Hostetter"  wrote:

> I don't thinkg the Query class implementations themselves changed in
> anyway that would have made them larger -- but if you switched from the
> standard parser to dismax parser, or started using lots of boost
> queries, or started using prefix or wildcard queries, then yes: the Query
> objects used would have gotten bigger.

Could have been caused by fuzzy search, since we did that around
the same time. Lucene changed from 1.9 to 2.4, so I thought there
might have been some changes there.

wunder




Maximum Term Frequency and Minimum Document Length

2009-02-04 Thread Jonah Schwartz
We want to configure solr so that fields are indexed with a maximum term
frequency and a minimum document length. If a term appears more than N times
in a field it will be considered to have appeared only N times. If a
document length is under M terms, it will be considered to be exactly M terms.
We have done this in the past in raw Lucene by writing a Similarity class
like this:

public class LimitingSimilarity extends DefaultSimilarity {
    private final int minNumTerms = 10;        // M: minimum document length (example value)
    private final float maxTermFrequency = 5f; // N: maximum term frequency (example value)

    public float lengthNorm(String fieldName, int numTerms) {
        return super.lengthNorm(fieldName, Math.max(minNumTerms, numTerms));
    }

    public float tf(float freq) {
        return super.tf(Math.min(maxTermFrequency, freq));
    }
}
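If a class like the one above is on Solr's classpath, it can be wired in globally via schema.xml rather than raw Lucene code (the package name here is a placeholder):

```xml
<!-- schema.xml: register a custom Similarity for the whole index -->
<similarity class="com.example.LimitingSimilarity"/>
```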


Is there a better way to do this within Solr configuration files?

Thanks,
Jonah


Re: Spell checking not returning "full" terms

2009-02-04 Thread Rupert Fiasco
Awesome! After reading up on the links you sent me I got it all working. Thanks!

FYI - I did previously come across one of the links you sent over:

http://wiki.apache.org/solr/SpellCheckerRequestHandler

But what threw me off is that when I started reading about that
yesterday, in the first paragraph it says that this component is
deprecated and to use SpellCheckComponent - so at that point I stopped
reading and went over to the component page. If I had kept reading I
would have encountered all of the gritty details that I in fact needed
to get it to work. The wiki entry makes the page seem old, deprecated,
and no longer relevant, but it certainly is still relevant.

-Rupert

On Wed, Feb 4, 2009 at 11:57 AM, Grant Ingersoll  wrote:
> I'm guessing the field you are checking against is being stemmed.  The field
> you spell check against should have minimal analysis done to it, i.e.
> tokenization and probably downcasing.  See
> http://wiki.apache.org/solr/SpellCheckComponent and
> http://wiki.apache.org/solr/SpellCheckerRequestHandler for tips on how to
> handle analysis for spelling.
>
> On Feb 4, 2009, at 2:33 PM, Rupert Fiasco wrote:
>
>> We are using Solr 1.3 and trying to get spell checking functionality.
>>
>> FYI, our index contains a lot of medical terms (which might or might
>> not make a difference as they are not English-y words, if that makes
>> any sense?)
>>
>> If I specify a spellcheck query of "spellcheck.q=diabtes"
>>
>> I get suggestions of:
>>
>> diabet
>> diabetogen
>> dilat
>> diamet
>> diatom
>> diastol
>> diactin
>> dialect
>>
>> If I re-mis-spell Diabetes to "q=diabets" then I get no suggestions.
>>
>> So first off two things:
>>
>> 1) Why would leaving out one "e" over the other affect the spelling
>> suggestions so substantially?
>> 2) In the former list of suggestions, notice the first suggestion is
>> "diabet", which isnt all that helpful, it should return something like
>> "diabetes" or maybe even "diabetic".
>>
>> Note that if I do a normal search against "diabetes" then I get a ton
>> of results, in other words, our index is filled with terms of
>> "diabetes".
>>
>> My relevant solrconfig is:
>>
>>
>>   text
>>
>>   
>> default
>> text_t
>> ./spellchecker1
>> 0.1
>>
>>   
>>   
>> jarowinkler
>> text_t
>> <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
>> ./spellchecker2
>> 0.1
>>
>>   
>>
>> and I have
>>
>> spellcheck.count = 8
>>
>> Notice that I severely bumped down the "accuracy" setting to get more
>> results. Bumping it higher yields fewer results (I'm not sure what the
>> setting really means, so I don't know in which direction I want to change
>> that value - I'm guessing that a lower value allows for more
>> misspellings, i.e. it's more permissive).
>>
>> Our "text" and "text_t" fields are defined in schema.xml as:
>>
>> > multiValued="true"/>
>> and
>> > stored="true" multiValued="true" />
>>
>> Any help would be appreciated.
>>
>> Thanks
>> -Rupert
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>
>
>
>
>


Re: Highlighting Oddities

2009-02-04 Thread ashokc

This problem went away when I updated to the latest nightly release
(2009-02-04).

- ashok

ashokc wrote:
> 
> I have seen some of these oddities that Chris is referring to. In my case,
> terms that are NOT in the query get highlighted. For example searching for
> 'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms
> either. Do these filter factories add some extra intelligence to the index
> in that if you search for 'Samsung' even 'LG' is considered a
> highlightable term?
> 
> I believe this was not the case when I was working with an earlier
> development version (from Nov or early Dec). Right now I am using
> solr-2008-12-29.war.
> 
> - ashok
> 
> 
> 
> ryguasu wrote:
>> 
>> I'm testing out the default (gap) fragmenter with some simple,
>> single-word queries on a patched 1.3.0 release populated with some
>> real-world data. (I think the primary quirk in my setup is that I'm
>> using ShingleFilterFactory to put word bigrams (aka shingles) into my
>> index. I was worried that this might mess up highlighting, but
>> highlighting is *mostly* working.) There are some oddities here, and
>> I'm wondering if people have any suggestions for debugging my setup
>> and/or trying to make a good, reproducible test case.
>> 
>> 1. The main weird thing is that, the vast majority of the time, the
>> highlighted term is the last term in the fragment. For example, if I
>> search for "cat", then almost all my fragments look like this:
>> 
>> fragment 1: "to the *cat*"
>> fragment 2: "with the *cat*"
>> fragment 3: "it's what the *cat*"
>> fragment 4: "Once upon a time the *cat*"
>> 
>> (My actual fragments are longer. The key to note is that all of these
>> examples end in "cat".)
>> 
>> Sometimes "cat" will appear at somewhere other than the last position,
>> but this is rare. My expectation, in contrast, is that "cat" would
>> tend to be more or less evenly distributed throughout fragment
>> positions.
>> 
>> Note: I tried to reproduce this on 1.3.0 with my patches applied but
>> using the example dataset/schema from the Solr source tree rather than
>> my own dataset/schema. With the example dataset this didn't seem to be
>> an issue.
>> 
>> I've experienced three other highlighting issues, which may or may not
>> be related:
>> 
>> 2. Sometimes, if a term appears multiple times in a fragment, not just
>> the term but all the words in between the two appearances will get
>> highlighted too. For example, I searched for "fear", and got this as
>> one of the snippets:
>> 
>> SETTLEMENT AGREEMENT This Agreement ("the Agreement") is entered
>> into this 18th day of August, 2008, by
>> and between Cape Fear Bank Corporation, a North Carolina
>> corporation (the "Company"), and Cape Fear
>> 
>> In contrast, I would have expected
>> 
>> SETTLEMENT AGREEMENT This Agreement ("the Agreement") is entered
>> into this 18th day of August, 2008, by
>> and between Cape Fear Bank Corporation, a North Carolina
>> corporation (the "Company"), and Cape Fear
>> 
>> 3. My install seems to have a curiously liberal interpretation of
>> hl.fragsize. Now if I put hl.fragsize=0, then things are as expected,
>> i.e. it highlights the whole field. And it also seems more or less
>> true (as it should) that as I increase hl.fragsize, the fragments get
>> longer. However, I was surprised to see that when I put hl.fragsize=1
>> or hl.fragsize=5, I can get fragments as long as this one:
>> 
>> addition, we believe the wireless feature for our controller will
>> facilitate exceptional customer services and
>> response time." About GpsLatitude GpsLatitude, a Montreal-based
>> company, is a provider of security
>> solutions and tracking for mobile assets. It is also a developer
>> of advanced " Videlocalisation" , a cost-effective,
>> integrated mobile digital video
>> 
>> That seems shockingly long for something of size "five".
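A possible factor in #3, offered as a hedged guess rather than a diagnosis: Lucene's stock fragmenter can only break fragments at token boundaries, and it starts a new fragment only after the running offset crosses a multiple of the fragment size, so hl.fragsize behaves as a soft threshold rather than a hard cap. A toy model of that style of logic (not Lucene's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT Lucene's actual code) of a boundary-only fragmenter:
// a new fragment starts only when a token's end offset has crossed
// fragSize * fragmentCount, so fragments can run well past fragSize.
public class FragmentSketch {
    static List<String> fragment(String[] tokens, int fragSize) {
        List<String> frags = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int offset = 0;      // running character offset into the text
        int fragCount = 1;   // how many fragments started so far
        for (String tok : tokens) {
            int end = offset + tok.length();
            // only break on a token boundary, and only once the
            // threshold has already been exceeded
            if (current.length() > 0 && end >= fragSize * fragCount) {
                frags.add(current.toString().trim());
                current.setLength(0);
                fragCount++;
            }
            current.append(tok).append(' ');
            offset = end + 1; // +1 for the space between tokens
        }
        if (current.length() > 0) frags.add(current.toString().trim());
        return frags;
    }

    public static void main(String[] args) {
        String[] tokens = {"Videlocalisation", "a", "cost-effective",
                           "integrated", "mobile", "digital", "video"};
        // even with fragSize=5, no fragment can be shorter than one token
        for (String f : fragment(tokens, 5)) System.out.println(f);
    }
}
```

In this toy model a fragment is never shorter than one token, so a 16-character token blows straight through fragSize=5; real fragmenters have analogous slack, which is consistent with hl.fragsize=1 or 5 producing long snippets.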
>> 
>> 4. Very rarely I'll get a fragment that doesn't actually contain any
>> of the search terms. For example, maybe I'll search for "cat", and
>> I'll get back "three ounces of milk" as a snippet. I need to explore
>> this more, but the last time it happened I opened the document,
>> located "three ounces of milk" in the text, and found that the word
>> "cat" did appear nearby; so maybe the document did contain "three
>> ounces of milk for the cat".
>> 
>> Obviously I'm not describing my setup in much detail. Let me know what
>> you think would be helpful to know more about.
>> 
>> Thanks,
>> Chris
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Highlighting-Oddities-tp20351015p21843092.html
Sent from the Solr - User mailing list archive at Nabble.com.



Query on Level of Access to lucene in Solr

2009-02-04 Thread Nick
Hello there,

 I'm a Solr newbie, but I've used Lucene for some complex
IR projects before.
 Can someone please help me understand the extent to which Solr allows
access to Lucene?
To elaborate: I'm considering Solr for all its wonderful properties,
like scaling, distributed search, ease of updates, etc. I have a corpus
of data that I'd like Lucene to index.
Further, I'm working on some graph research, where I'd like to
disjunctively query keyword terms and use the independent result sets
as entry points into my graph of documents.
I have my own data structures (in Java) that handle efficient graph
walks, etc., and eventually apply a whole bunch of math to re-rank
results/result trees.
In a more traditional setting, I can imagine using Lucene as an
external jar dependency, hooking it up with the rest of my code in
Java, and shipping it off into Tomcat.

Is this doable with Solr? Please help with comments on the specific
mechanics of hooking up custom Java application logic with Lucene
before integrating with the rest of the Tomcat ecosystem.

Thank you very much.
Nick.
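For what it's worth, Solr does expose the underlying Lucene machinery to plugins: a custom request handler or SearchComponent receives the SolrIndexSearcher, and its getReader() method returns the raw Lucene IndexReader. One hedged sketch of wiring such a plugin in via solrconfig.xml (the com.example.GraphQueryComponent class name is hypothetical; you would supply your own implementation on Solr's classpath):

```xml
<!-- solrconfig.xml: register a custom search component and a handler
     that invokes it. The class name below is hypothetical; drop your
     own jar into ${solr.home}/lib and reference it here. -->
<searchComponent name="graph" class="com.example.GraphQueryComponent" />

<requestHandler name="/graph"
                class="org.apache.solr.handler.component.SearchHandler">
  <arr name="last-components">
    <str>graph</str>
  </arr>
</requestHandler>
```

Inside the component you can run your disjunctive queries against the IndexReader and then hand the resulting doc sets to your graph code before writing out the response.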


instanceDir value is incorrect in multicore environment

2009-02-04 Thread Mark Ferguson
Hello,

I have a problem with setting the instanceDir property for the cores in
solr.xml. When I set the value to be relative, it sets it as relative to the
location from which I started the application, instead of relative to the
solr.home property.

I am using Tomcat and I am creating a context for each instance of solr that
I am running in the conf/Catalina/localhost directory, as per the
instructions. For example, my solr1.xml file looks like this:

<Context ...>
   <Environment name="solr/home" type="java.lang.String"
value="/srv/solr/solr1" override="true" />
</Context>

My solr.xml file in /srv/solr/solr1 looks something like this:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="..." instanceDir="..." />
    <core name="..." instanceDir="..." />
    ...
  </cores>
</solr>

Now, from whatever location I start the app, it considers that my root
directory. For example if I run the command from this prompt:

m...@linux-1hpr:/tmp> ~/bin/tomcat/bin/catalina.sh start

It then creates a 'data' directory in /tmp with subdirectories p20, p0 etc.

Any ideas what I'm doing wrong? Thanks a lot.

Mark


Re: instanceDir value is incorrect in multicore environment

2009-02-04 Thread Mark Ferguson
I looked at the core status page and it looks like the problem isn't
actually the instanceDir property, but rather dataDir. It's not being
appended to instanceDir so its path is relative to cwd.

I'm using a patched version of Solr with some of my own custom changes
relating to dataDir, so this is probably just something I screwed up;
feel free to ignore this email.

Mark
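For anyone else who hits the same symptom on a stock build, one hedged workaround is to pin dataDir to an absolute path in each core's solrconfig.xml, so there is no question what a relative path resolves against (the fallback path below is illustrative):

```xml
<!-- solrconfig.xml: an absolute dataDir sidesteps any ambiguity about
     whether a relative path is resolved against solr.home, the core's
     instanceDir, or the cwd. The fallback path here is illustrative. -->
<dataDir>${solr.data.dir:/srv/solr/solr1/data}</dataDir>
```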


On Wed, Feb 4, 2009 at 6:25 PM, Mark Ferguson wrote:

> Hello,
>
> I have a problem with setting the instanceDir property for the cores in
> solr.xml. When I set the value to be relative, it sets it as relative to the
> location from which I started the application, instead of relative to the
> solr.home property.
>
> I am using Tomcat and I am creating a context for each instance of solr
> that I am running in the conf/Catalina/localhost directory, as per the
> instructions. For example, my solr1.xml file looks like this:
>
> <Context ...>
>    <Environment name="solr/home" type="java.lang.String"
> value="/srv/solr/solr1" override="true" />
> </Context>
>
> My solr.xml file in /srv/solr/solr1 looks something like this:
>
> <solr persistent="true">
>   <cores adminPath="/admin/cores">
>     <core name="..." instanceDir="..." />
>     <core name="..." instanceDir="..." />
>     ...
>   </cores>
> </solr>
>
> Now, from whatever location I start the app, it considers that my root
> directory. For example if I run the command from this prompt:
>
> m...@linux-1hpr:/tmp> ~/bin/tomcat/bin/catalina.sh start
>
> It then creates a 'data' directory in /tmp with subdirectories p20, p0 etc.
>
> Any ideas what I'm doing wrong? Thanks a lot.
>
> Mark
>


Re: Latest on DataImportHandler and Tika?

2009-02-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
We have not taken up anything yet. The idea is to create another
contrib module that will contain DIH extensions with external
dependencies, as in SOLR-934.
TikaEntityProcessor is something we wish to do, but our limited
bandwidth has been the problem.
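To make the proposal concrete, the parent-child nesting Shalin described would presumably look something like the data-config.xml below. TikaEntityProcessor does not exist yet, so everything here is a hypothetical sketch of the intended shape (including the BinFileDataSource pairing), not working configuration:

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin" />
  <document>
    <!-- outer entity walks the filesystem... -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/docs" fileName=".*\.(pdf|doc)"
            recursive="true" rootEntity="false">
      <!-- ...inner (hypothetical) entity would hand each file to Tika -->
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}">
        <field column="text" name="body" />
      </entity>
    </entity>
  </document>
</dataConfig>
```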

On Thu, Feb 5, 2009 at 5:15 AM, Chris Harris  wrote:
> Back in November, Shalin and Grant were discussing integrating
> DataImportHandler and Tika. Shalin's estimation about the best way to
> do this was as follows:
>
> **
>
> I think the best way would be a TikaEntityProcessor which knows how to
> handle documents. I guess a typical use-case would be
> FileListEntityProcessor->TikaEntityProcessor as parent-child entities.
>
> Also see SOLR-833 which adds a FieldReaderDataSource using which you can
> pass any field's content to an entity for processing. So you can have a
> [SqlEntityProcessor, JdbcDataSource] producing a blob and a
> [FieldReaderDataSource, TikaEntityProcessor] consuming it.
>
> (http://www.nabble.com/DataImportHandler-and-Blobs-td20464891.html)
>
> **
>
> Has there been any work on something like this? Alternatively, has
> anyone else put together a way to get DataImportHandler to extract
> body text from PDFs, Word files, etc.?
>
> Thanks,
> Chris
>



-- 
--Noble Paul


Severe errors in solr configuration

2009-02-04 Thread David Trainor
Hello,

I am running Ubuntu 8.10, with Tomcat 6.0.18 installed via the package
manager, and I am trying to get Solr 1.3.0 up and running, with no success.
I believe I am having the same problem described here:

http://www.nabble.com/Severe-errors-in-solr-configuration-td21829562.html

When I attempt to access solr/admin on the web server, I am greeted with the
following exception:

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
-
java.security.AccessControlException: access denied (java.io.FilePermission
/var/lib/tomcat6/solr/solr.xml read) at
java.security.AccessControlContext.checkPermission(AccessControlContext.java:342)
at java.security.AccessController.checkPermission(AccessController.java:553)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) at
java.lang.SecurityManager.checkRead(SecurityManager.java:888) at
java.io.File.exists(File.java:748) at
[...snip...]

There is no solr.xml in /var/lib/tomcat6/solr/ (which is the example solr
home directory provided).  However, I did set the solr home with JNDI, under
/var/lib/tomcat6/conf/Catalina/localhost/solr.xml, which reads:

<Context ...>
   <Environment name="solr/home" type="java.lang.String"
value="/var/lib/tomcat6/solr" override="true" />
</Context>

I am kind of at my wit's end (and to make matters worse, I am new to Tomcat
as well as Solr).  Can anybody supply any hints to get this baby up and
running?  If I have omitted any vital information, please just let me know.
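(The AccessControlException above looks like Tomcat's security manager at work: Ubuntu's tomcat6 package runs with the security manager enabled, so the webapp has no permission to read files under /var/lib/tomcat6/solr. One possible fix, sketched below, is to grant that permission in a policy file under /etc/tomcat6/policy.d/; the exact file name and codeBase are assumptions you would adapt to your install.)

```java
// hypothetical /etc/tomcat6/policy.d/05solr.policy -- grants the Solr
// webapp read access to the solr home; adjust the codeBase to wherever
// the solr webapp is actually deployed
grant codeBase "file:/var/lib/tomcat6/webapps/solr/-" {
    permission java.io.FilePermission "/var/lib/tomcat6/solr/-", "read";
};
```

Restart Tomcat after adding the file so the merged policy is reloaded.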

Best regards,

Dave.