Re: OutOfMemoryErrors

2010-08-17 Thread Grijesh.singh

You can set it up in the Tomcat startup script.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/OutOfMemoryErrors-tp1181731p1182582.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OutOfMemoryErrors

2010-08-17 Thread Grijesh.singh

Put that line in your startup script, or you can set it as an environment variable:
export CATALINA_OPTS="-Xms256m -Xmx1024m"
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/OutOfMemoryErrors-tp1181731p1182708.html
Sent from the Solr - User mailing list archive at Nabble.com.
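
A minimal sketch of the suggestion above as it might appear in a bash startup script (the file name setenv.sh and the quoting are assumptions; Tomcat picks up CATALINA_OPTS from the environment when it starts):

```shell
# Hypothetical fragment for a Tomcat startup script (e.g. setenv.sh).
# The quotes matter: without them the shell treats -Xmx1024m as a
# separate word and the export fails.
export CATALINA_OPTS="-Xms256m -Xmx1024m"

# Sanity check: print what Tomcat will see
echo "$CATALINA_OPTS"
```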


Creating new Solr cores using relative paths

2010-08-17 Thread Jay Hill
I'm having trouble getting the core CREATE command to work with relative
paths in the solr.xml configuration.

I'm working with a layout like this:
/opt/solr [this is solr.solr.home: $SOLR_HOME]
/opt/solr/solr.xml
/opt/solr/core0/ [this is the template core]
/opt/solr/core0/conf/schema.xml [etc.]

/opt/tomcat/bin [where tomcat is started from: $TOMCAT_HOME/bin]

My very basic solr.xml:
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0/"/>
  </cores>
</solr>

The CREATE core command works fine with absolute paths, but I have a
requirement to use relative paths. I want to be able to create a new core
like this:

http://localhost:8080/solr/admin/cores
?action=CREATE
&name=core1
&instanceDir=core1
&config=core0/conf/solrconfig.xml
&schema=core0/conf/schema.xml
(core1 is the name for the new core to be created, and I want to use the
config and schema from core0 to create the new core).

but the error is always due to the servlet container thinking
$TOMCAT_HOME/bin is the current working directory:
Caused by: java.lang.RuntimeException: Can't find resource
'core0/conf/solrconfig.xml' in classpath or '/opt/solr/core1/conf/',
cwd=/opt/tomcat/bin
Does anyone know how to make this happen?

Thanks,
-Jay
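
For reference, the multi-line request above corresponds to a single URL like the one assembled below (host, port, and core names taken from the message; this only shows the request shape and does not resolve the relative-path issue):

```shell
# Assemble the CREATE request into one URL, ready for curl.
base="http://localhost:8080/solr/admin/cores"
params="action=CREATE&name=core1&instanceDir=core1"
params="$params&config=core0/conf/solrconfig.xml&schema=core0/conf/schema.xml"
url="$base?$params"
echo "$url"
# then e.g.: curl "$url"
```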


Re: maxMergeDocs and performance tuning

2010-08-17 Thread Andrew Clegg

Okay, thanks Marc. I don't really have any complaints about performance
(yet!) but I'm still wondering how the mechanics work, e.g. when you have a
number of segments equal to mergeFactor, and each contains maxMergeDocs
documents.

The docs are a bit fuzzy on this...
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/maxMergeDocs-and-performance-tuning-tp1162695p1183064.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OutOfMemoryErrors

2010-08-17 Thread Chamnap Chhorn
Should I add this line with double quotes or not? Because if I don't, it
doesn't work at all in my /etc/init.d/tomcat6.

export CATALINA_OPTS=-Xms256m -Xmx1024m;

On Tue, Aug 17, 2010 at 1:36 PM, Grijesh.singh pintu.grij...@gmail.comwrote:


 put that line in your startup script or u can set as env var
 export CATALINA_OPTS=-Xms256m -Xmx1024m;
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/OutOfMemoryErrors-tp1181731p1182708.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Re: OutOfMemoryErrors

2010-08-17 Thread Chamnap Chhorn
Is there a way to verify that I have added it correctly?

On Tue, Aug 17, 2010 at 2:41 PM, Chamnap Chhorn chamnapchh...@gmail.comwrote:

 Should I add this line with double quotes or not? Because if I don't, it
 doesn't work at all in my /etc/init.d/tomcat6.


 export CATALINA_OPTS=-Xms256m -Xmx1024m;

 On Tue, Aug 17, 2010 at 1:36 PM, Grijesh.singh pintu.grij...@gmail.comwrote:


 put that line in your startup script or u can set as env var
 export CATALINA_OPTS=-Xms256m -Xmx1024m;
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/OutOfMemoryErrors-tp1181731p1182708.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Re: OutOfMemoryErrors

2010-08-17 Thread Grijesh.singh

You can add it like this; it will work, I am using it:

JAVA_OPTS="$JAVA_OPTS -Xms1024m -Xmx4096m"
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/OutOfMemoryErrors-tp1181731p1183229.html
Sent from the Solr - User mailing list archive at Nabble.com.


Search document design problem

2010-08-17 Thread Wenca

Hi all,

I would like to use Solr to replace our site search based on MySQL but I 
am not sure how to map entities into the search index. The model is 
described by the attached UML class diagram.


I have a Hotel that resides in some City in some Country. The hotel has 
various Rooms. For each Room in a Hotel there are some Packages that can 
be purchased by the client.


The entity returned from the search will be mainly the Hotel. E.g.:
- all hotels in USA
- all hotels in New York
- all hotels with name containing Hilton
- all hotels in Egypt with packages with all inclusive boarding
  and price lower than 400 and startDate between 2010-08-20
  and 2010-08-30

Our application also uses faceting a lot. e.g:
- # of hotels per country/city
- # of hotels based on room size
(# of beds - 1 bed - 100 hotels, 2 beds - 200 hotels, ...)
- # of hotels based on all inclusive package prices
(0-100 EUR, 100-200 EUR, ...)

But there are also use cases when a search should return a Room or 
Package directly.


I'd like to use the Data Import Handler to index directly from our database. 
But which approach should I use to map the entities into the search index? It 
seems to me that there are at least two ways.


1) One index based on Hotel with multivalued fields for Rooms and 
multivalued fields for Packages. In DIH:

<document>
  <entity name="hotel" ...>
    <field name="id" .../>
    <entity name="room" ...>
      <field name="room_id" .../>
      <entity name="package" ...>
        <field .../>
      </entity>
    </entity>
  </entity>
</document>

But I am not sure whether this will work because of the multivalued fields. The 
queries may span across all the entities - I want only hotels that have a 
room with 2 beds where the room has a package with all-inclusive boarding 
and a price lower than 400.


2) Denormalize the data, so that there is only one index based on 
Packages containing (duplicated) all the data from Room and Hotel, and 
then use Field Collapsing on Hotel ID for search results and for faceting.
This would also enable direct search for Packages or Rooms, but I am not 
sure about Field Collapsing, which is still a kind of beta feature, 
nor about the potential performance costs.


Can anybody give me some advice or share their experiences?

Thanks a lot
Wenca


Re: Search document design problem

2010-08-17 Thread Wenca
Oops, it seems that the mailing list does not support attachments. 
Here's a link to the diagram image:


http://dl.dropbox.com/u/10214557/model.png

Wenca

On 17.8.2010 11:30, Wenca wrote:

[snip]


stream.url problem

2010-08-17 Thread satya swaroop
Hi all,
I am indexing documents that are on my own system to Solr. Now I need
to index files that are on a remote system, so I enabled remote streaming
(enableRemoteStreaming="true") in solrconfig.xml, but when I use stream.url
I get a "connection refused" error. The details of the error are below.

When I send the request from my browser as:

http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdf&literal.id=schb2

I get the error:

HTTP Status 500 - Connection refused

java.net.ConnectException: Connection refused
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1016)
  at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:88)
  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:161)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
  [snip]
Caused by: java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.Socket.connect(Socket.java:525)
  at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
  at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
  at org.apache.solr.common.util.ContentStreamBase$URLStream.<init>(ContentStreamBase.java:81)
  at org.apache.solr.servlet.SolrRequestParsers.buildRequestFrom(SolrRequestParsers.java:136)
  [snip]


If anybody knows, please help me with this.

regards,
satya


Re: stream.url problem

2010-08-17 Thread Tim Terlegård
 hi all,
       i am indexing the documents to solr that are in my system. now i need
 to index the files that are in remote system, i enabled the remote streaming
 to true in solrconfig.xml and when i use the stream.url it shows the error
 as connection refused and the detail of the error is:::

 when i sent the request in my browser as::

 http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdfliteral.id=schb2

You probably use the wrong port. Try 8983 instead.

/Tim


Re: stream.url problem

2010-08-17 Thread rajini maski
If the connector port number in your localhost is same as in other system
then this error is probable..You can change port number in server.xml of
your system or other system and make them different...If it is different
only then one other probablity is remote access enabled or not...

Rajani Maski


2010/8/17 Tim Terlegård tim.terleg...@gmail.com

  hi all,
i am indexing the documents to solr that are in my system. now i
 need
  to index the files that are in remote system, i enabled the remote
 streaming
  to true in solrconfig.xml and when i use the stream.url it shows the
 error
  as connection refused and the detail of the error is:::
 
  when i sent the request in my browser as::
 
 
 http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdfliteral.id=schb2

 You probably use the wrong port. Try 8983 instead.

 /Tim



Re: OutOfMemoryErrors

2010-08-17 Thread Peter Karich

 Is there a way to verify that I have added it correctly?

On Linux you can run
ps -elf | grep Boot
and check whether the java command has the parameters added.

@all: why and when do you get those OOMs? while querying? which queries
in detail?

Regards,
Peter.
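
A small sketch of the check Peter describes, shown here against a sample command line (in practice you would pipe the output of ps -elf instead; the sample string is an assumption):

```shell
# Extract JVM heap flags from a process command line. The sample string
# stands in for real `ps -elf` output of a running Tomcat/Jetty JVM.
cmdline="java -Xms256m -Xmx1024m -jar start.jar"
echo "$cmdline" | grep -o '\-Xm[sx][0-9]*[mg]'
```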


Re: OutOfMemoryErrors

2010-08-17 Thread rajini maski
I am getting it while indexing data to Solr, not while querying...
Though I have plenty of memory (up to 40 GB) and my indexing data is just
5-6 GB, that particular error is still occasionally observed (SEVERE ERROR: JAVA
HEAP SPACE, OUT OF MEMORY ERROR).
I can see a lock file generated in the data/index path just after this
error.



On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de wrote:


  Is there a way to verify that I have added correctlly?
 

 on linux you can do
 ps -elf | grep Boot
 and see if the java command has the parameters added.

 @all: why and when do you get those OOMs? while querying? which queries
 in detail?

 Regards,
 Peter.



Re: Search document design problem

2010-08-17 Thread Peter Karich
Hi Wenca,

I am not sure whether my information here is really helpful for you -
sorry if not ;-)

 I want only hotels that have room with 2 beds and the room has a
package with all inclusive boarding and price lower than 400.

You should tell us what you want to search and filter. Do you want only
available beds/rooms of a hotel, or all of them?
The requirements seem a bit tricky, but a combination of dynamic
fields and the collapse feature could do it (with only one query).

In your case I would start indexing the hotels like:
name: hilton
country: USA
city: New York
beds_i (multivalued): 2 | 1 | 1 | ...
rooms_i: 123
...

I am not sure how I would handle the booking/prices. Maybe you will have
to add an additional dynamic
field free_beds_periodX_i or price_periodX_i which reports the free beds
or prices for a specific period?
(where one period could be a week or even a day ...)

For the other searches I would create another index, although it is
possible to put all the data in one index
and e.g. add a 'type' field to each document. With that field you can
then append a filter query to each query:
q=xy&fq=type:(hotel OR room)
I would prefer this trick over the collapse feature (if you really want
to set up only one index) at the beginning,
and see if this could work for you. (The collapse feature is not as
mature as the rest of Solr, but in some situations it works nicely.)

Hope this helps a bit to get started. (Regarding the Data Import
Handler I cannot help, sorry.)

Regards,
Peter.
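
The type-filter-plus-faceting idea sketched above might look like the following as a query URL (a sketch only; the field names type/country/city come from the discussion, while the host, port, and q=*:* are assumptions):

```shell
# Restrict results to hotel documents and facet on country and city.
base="http://localhost:8080/solr/select"
params="q=*:*&fq=type:hotel&facet=true&facet.field=country&facet.field=city"
echo "$base?$params"
```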


 [snip]



Re: stream.url problem

2010-08-17 Thread Travis Low
Connection refused (in any context) almost always means that nothing is
listening on the TCP port that you are trying to connect to. So either the
process you are connecting to isn't running, or you are trying to connect to
the wrong port.

On Tue, Aug 17, 2010 at 6:18 AM, satya swaroop sswaro...@gmail.com wrote:

 hi all,
   i am indexing the documents to solr that are in my system. now i need
 to index the files that are in remote system, i enabled the remote
 streaming
 to true in solrconfig.xml and when i use the stream.url it shows the error
 as connection refused and the detail of the error is:::

 when i sent the request in my browser as::


 http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdfliteral.id=schb2

 i get the error as

 HTTP Status 500 - Connection refused java.net.ConnectException: Connection
 refused at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown
 Source) at

 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at
 [snip]


 if any body know
 please help me with this

 regards,
 satya



synonyms in EmbeddedSolrServer

2010-08-17 Thread Tim Terlegård
Synonyms don't seem to work in EmbeddedSolrServer (Solr 1.4.0) when
mixing in multi-word synonyms. It works fine when I run Solr
standalone. Has anyone else experienced this?

I have this in synonyms.txt:
word => some, other stuff

I index "some" and then search for "word". With standalone Solr I
get a hit, but not when using the exact same configuration files with
EmbeddedSolrServer. If there is a multi-word synonym on the same line
it doesn't work. If I remove "other stuff" from synonyms.txt, I get a
hit with EmbeddedSolrServer as well.

/Tim
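
For context, the synonym filter is typically wired into the analyzer chain in schema.xml roughly like this (a sketch of a common Solr 1.4 field type, not the poster's actual schema):

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true": each matching token is expanded to its synonyms
         at index time, including multi-word ones -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since EmbeddedSolrServer resolves synonyms.txt relative to the configured Solr home, one thing worth checking is whether the embedded instance is actually loading the same synonyms.txt as the standalone one.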


Re: Search document design problem

2010-08-17 Thread Wenca

Hi Peter,

in fact I mainly want to search Hotels by any combination of its fields 
and its rooms and packages. Users can setup any combination in a dynamic 
form that changes after every change of the query.


But maybe the right term for this use case is "filter", not "search". 
The form enables filtering of hotels by a predefined set of properties, and 
it also allows refining with some text search - so probably only in that 
case should I use the term "search".


To illustrate it I will describe a simple search (or filter?) form example:

Message: now available 1000 hotels

Select country: combo with countries
Select place: combo with places
Select start date: combo with available dates
Select boarding: combo with available boarding types
Select price: combo with available prices (0-100, 100-200)
Select room size: combo with available bed counts

At first the form is populated with all available values. But when the user 
selects a country, all other fields are repopulated with values 
available only for hotels in the selected country, and the label above 
the form changes the number of available hotels according to the search 
criteria. Afterwards the user selects a price level, and again the number of 
available hotels changes and the other combo boxes are repopulated with 
values available according to the selected country and price level.


If I am not wrong, this is the place for Faceting, isn't it?

When the user is done with the search form, he can display the matching 
hotels and afterwards continue with displaying matching rooms and 
packages in a particular hotel.


In fact I am always looking for a concrete Package (offer, booking) that 
matches all the given criteria, but in the search output there is always 
only the hotel containing the given Package, and likewise for faceting counts: 
no matter whether 1 or 50 packages match the query, the hotel is counted 
only once, just as it is listed only once in the results.


The faceting is the thing I am really interested in, because the described 
functionality implemented on top of an RDBMS results in multiple SQL 
queries joining multiple tables each time a user changes the search 
criteria, so the load on the DB servers is quite high.


I would like to use Solr for the search form because after some tests 
Lucene seems to be really fast, and I believe it would improve the 
response time as well as the throughput of our search.


I am new to Solr, so excuse me if I don't use the right terminology yet, 
but I hope that my description of the use case is quite clear now. ;-)


Thanks
Wenca

On 17.8.2010 13:46, Peter Karich wrote:

[snip]




Re: OutOfMemoryErrors

2010-08-17 Thread Peter Karich
Which method do you use to index? If you are using SolrJ, you can use the
streaming update server; it is a better option for the Solr server, because
the server does not need to hold everything in memory.

(If you are using the post.jar file, there was a bug which caused OOMs,
but I don't remember the details ...)

Then, when the Solr server crashes, it cannot remove the lock file. If
you are sure there is only one indexing process, you can specify:

<indexDefaults>
   <lockType>native</lockType> <!-- or any other possible too ?? -->
   <unlockOnStartup>true</unlockOnStartup>
</indexDefaults>

see also:

 - http://www.mail-archive.com/solr-user@lucene.apache.org/msg08049.html
 - Out Of Memory Errors on
http://wiki.apache.org/solr/SolrPerformanceFactors

Regards,
Peter.

 [snip]



Re: indexing???

2010-08-17 Thread Ken Krugler


On Aug 16, 2010, at 10:38pm, satya swaroop wrote:


hi all,
  the error i got is Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@8210fc when i indexed a file  
similar

to the one in
  https://issues.apache.org/jira/browse/PDFBOX-709/samplerequestform.pdf


1. This URL doesn't work for me.

2. Please include the full stack trace from the RuntimeException.

3. What version of Tika are you using?

Thanks,

-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: Search document design problem

2010-08-17 Thread Peter Karich
Hi Wenca,

 But maybe the right term for this use case is filter not a search ...
 in this case I should use the term search.

No problem. A search query to Solr can also contain filters, so
'search' is fine for all situations, I think ;-)


 Afterwards the user selects a price level and again number of
available hotels changes

Every hotel can have a price range, right? Or distinct values?
For a price range you can use a price_max and a price_min field for every
hotel.
For distinct values (if there are not too many) you could use a multivalued
price field.


 If I am not wrong, this is the place for Faceting, isn't it?

Yes, you can use facets to show only a subset of your combo values (i.e.
show facets with count > 0).
But for every change to the combo boxes you will need to fire a query
and update the hotel numbers.
Facets then show how many hotels would be available if one applied
the specific filter query.
You could even enhance the combo fields with the available hotel
numbers ... for a better user experience.

So, simply try to get this basic search done (which shouldn't take too
much time) and ask if you run into problems.
If you are convinced by Solr's relevance and performance (which I bet
you will be), you can think about the other searches.


 In fact I am always looking for a concrete Package (offer, booking)
that matches all the given criteria,
 but on the search output there is always only the hotel containing the
given Package as well for
 faceting counts, no matter whether there is 1 or 50 packages matching
the query, the hotel is
 counted only once as well as it is listed only once in the results.

This sounds to me like you should use only one hotel index, where you
add a multivalued field 'package' directly to every hotel.

Regards,
Peter.


 Hi Peter,

 in fact I mainly want to search Hotels by any combination of its
 fields and its rooms and packages. Users can setup any combination in
 a dynamic form that changes after every change of the query.

 But maybe the right term for this use case is filter not a search.
 The form enables fitering of hotels by predefined set of properties.
 And also it allows to specify it with some text search - so probably
 only in this case I should use the term search.

 To illustrate it I will describe a simple search (or filter?) form
 example:

 Message: now available 1000 hotels

 Select country: combo with countries
 Select place: combo with places
 Select start date: combo with available dates
 Select boarding: combo with available boarding types
 Select price: combo with available prices (0-100, 100-200)
 Select room size: combo with available bed counts

 At first the form is populated by all available values. But when user
 selects a country, all other fields are repopulated with values
 available only for hotels in the selected country and the label above
 the form changes the number of available hotels acording to the search
 criteria. Afterwards the user selects a price level and again number
 of available hotels changes and the other combo boxes are repopulated
 with available values according to selected country and price level.

 If I am not wrong, this is the place for Faceting, isn't it?

 When the user is done with the search form, he can display the
 matching hotels and afterwards continue with displaying matching rooms
 and packages in a particular hotel.

 In fact I am always looking for a concrete Package (offer, booking)
 that matches all the given criteria, but on the search output there is
 always only the hotel containing the given Package as well for
 faceting counts, no matter whether there is 1 or 50 packages matching
 the query, the hotel is counted only once as well as it is listed only
 once in the results.

 The Faceting is the thing I am really interested in becase the
 described functionality implemented on top of RDBMS results in
 multiple SQL queries joining multiple tables each time a user changes
 the search criteria, so the load of DB servers is quite high.

 I would like to use Solr for the search form because after some test
 Lucene seems to be really very fast and I believe it would improve
 the response time as well as the throughput of our search.

 I am new to Solr so excuse me if I don't use the right terminology
 yet, but I hope that my description of the use case is quite clear
 now. ;-)

 Thanks
 Wenca

 Dne 17.8.2010 13:46, Peter Karich napsal(a):
 Hi Wenca,

 I am not sure wether my information here is really helpful for you,
 sorry if not ;-)

 I want only hotels that have room with 2 beds and the room has a
 package with all inclusive boarding and price lower than 400.

 you should tell us what you want to search and filter? Do you want only
 available or all beds/rooms of a hotel?
 The requirements seems to be a bit tricky but a combination of dynamic
 fields and the collapse feature could do it (with only one query).

 In your case I would start indexing the hotels like:
 name: hilton
 country: USA
 city: New York
 
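The drill-down form described above maps directly onto Solr faceting: each change in the form becomes one request whose filter queries narrow the hotel set while facet counts repopulate the combo boxes. A sketch of one such request (field names are illustrative, following the example form):

```
/select?q=*:*
  &fq=country:USA
  &fq=price:[100 TO 200]
  &facet=true
  &facet.field=place
  &facet.field=boarding
  &facet.field=bed_count
  &rows=0
```

numFound in the response drives the "now available N hotels" label, and each facet field's value counts repopulate the corresponding combo.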

autocomplete: case-insensitive and middle word

2010-08-17 Thread Paul
I have a couple questions about implementing an autocomplete function
in solr. Here's my scenario:

I have a name field that usually contains two or three names. For
instance, let's suppose it contains:

John Alfred Smith
Alfred Johnson
John Quincy Adams
Fred Jones

I'd like to have the autocomplete be case insensitive and match any of
the names, preferably just at the beginning.

In other words, if the user types alf, I want

John Alfred Smith
Alfred Johnson

if the user types fre, I want

Fred Jones

but not:
John Alfred Smith
Alfred Johnson

I can get the matches using the text_lu analyzer, but the hints that
are returned are lower case, and only one name.

If I use the string analyzer, I get the entire name like I want it,
but the user must match the case, that is, must type Alf, and it
only matches the first name, not the middle name.

How can I get the matches of the text_lu analyzer, but get the hints
like the string analyzer?

Thanks,
Paul


Re: autocomplete: case-insensitive and middle word

2010-08-17 Thread Avlesh Singh
This thread might help -
http://www.lucidimagination.com/search/document/9edc01a90a195336/enhancing_auto_complete

Cheers
Avlesh
@avlesh http://twitter.com/avlesh | http://webklipper.com

On Tue, Aug 17, 2010 at 8:30 PM, Paul p...@nines.org wrote:

 I have a couple questions about implementing an autocomplete function
 in solr. Here's my scenario:

 I have a name field that usually contains two or three names. For
 instance, let's suppose it contains:

 John Alfred Smith
 Alfred Johnson
 John Quincy Adams
 Fred Jones

 I'd like to have the autocomplete be case insensitive and match any of
 the names, preferably just at the beginning.

 In other words, if the user types alf, I want

 John Alfred Smith
 Alfred Johnson

 if the user types fre, I want

 Fred Jones

 but not:
 John Alfred Smith
 Alfred Johnson

 I can get the matches using the text_lu analyzer, but the hints that
 are returned are lower case, and only one name.

 If I use the string analyzer, I get the entire name like I want it,
 but the user must match the case, that is, must type Alf, and it
 only matches the first name, not the middle name.

 How can I get the matches of the text_lu analyzer, but get the hints
 like the string analyzer?

 Thanks,
 Paul
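One common pattern for this (a sketch, not necessarily what the linked thread does): index the names into a separate field whose index-time analyzer lower-cases and edge-n-grams each token, while the query-time analyzer only lower-cases, and return the stored original for display. Field and type names below are illustrative:

```
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_ac" type="autocomplete" indexed="true" stored="true"/>
```

A query for name_ac:alf then matches "John Alfred Smith" and "Alfred Johnson" (whitespace tokenization makes middle names match too), and the stored value supplies the hint in its original case.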



Re: Solr-HOW TO HANDLE THE LOCK FILE CREATION WHILE INDEXING AND OPERATION TIMED OUT WEB EXCEPTION ERROR

2010-08-17 Thread Erick Erickson
It would help a lot if you included the stack trace of the exception,
perhaps
it'll be in your SOLR logs.

Also, what is your environment? Are you using any kind of networked
drive for your index? Windows? What version of SOLR are you using?

Anything else you think would be useful.

Best
Erick

On Tue, Aug 17, 2010 at 12:10 AM, rajini maski rajinima...@gmail.comwrote:

 Hello Everyone,

  Please help me knowing the logic behind this lock file generation
 while indexing data in solr!

   The trouble I am facing is as follows:

 The data that I indexed is nearly in the millions. At the initial level of
 indexing I find no errors until it crosses up to 10 lakh (1,000,000)
 documents. But once it crosses this limit it throws a web exception error,
 "operation timed out", and simultaneously a LOCK file is generated in the
 //data/index folder. I found in one thread (
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg06782.html
 ) that it can be fixed by making some changes in Solr's config XML and also
 by increasing the Java memory space in Tomcat. I did that, but the issue is
 still not solved and I couldn't find any root cause for this error.

 Please, whoever knows the logic behind these two issues, i.e.:
  1) The web exception error "operation timed out"
  2) Why lock files are created and how they actually work


 Awaiting replies

 Regards,
 Rajani Maski



Function query to boost scores by a constant if all terms are present

2010-08-17 Thread Bill Dueber
Let me describe what I'm trying to accomplish, first, since what I think is
the solution is almost always wrong. :-)

I'm doing dismax queries with mm set such that not all terms need to match,
e.g. only 2 of 3 query terms need to match.

Most of the time, items that match all three terms will float to the top by
normal ranking, but sometimes there are only two terms that are like a rash
across the record, and they end up with a higher score than some items that
match all three query terms.

I'd like to boost items with all the query terms to the top *without
changing their order*.

My first thought was to use a simple boost query allfields:(a AND b AND c),
but the order of the set of records that contain all three terms changes
when I do that. What I *think* I need to do is basically to say, Hey, all
the items with all three terms get an extra 40,000 points, but change
nothing else.

I keep thinking I can get what I need with a subquery and map, but keep
failing.

Any advice would be very, very welcome.

 -Bill-



-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: Solr date NOW - format?

2010-08-17 Thread Shawn Heisey

 On 4/9/2010 7:35 PM, Lance Norskog wrote:

Function queries are notoriously slow. Another way to boost by year is
with range queries:
[NOW-6MONTHS TO NOW]^5.0 ,
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Notice that you get to have a non-linear curve when you select the
ranges by hand.


Lance, I have worked out my major issue and now have my post date in 
Solr as a tdate field named pd.  I cannot however figure out how to 
actually send a query with a date boost like you've mentioned above.  
I'd like to embed it right into the dismax handler definition, but it 
would be good to also know how to send it in a query myself.  Can you help?


Are the boosts indicated above a multiplier, or an addition?

Thanks,
Shawn



Re: OutOfMemoryErrors

2010-08-17 Thread Erick Erickson
You shouldn't be getting this error at all unless you're doing something
out of the ordinary. So, it'd help if you told us:

What parameters you have set for merging
What parameters you have set for the JVM
What kind of documents are you indexing?

The memory you have is irrelevant if you only allocate a small
portion of it for the running process...

Best
Erick

On Tue, Aug 17, 2010 at 7:35 AM, rajini maski rajinima...@gmail.com wrote:

 I am getting it while indexing data to Solr, not while querying.
 Though I have enough memory space, up to 40GB, and my indexing data is just
 5-6 GB, that particular error is still occasionally observed (SEVERE ERROR:
 JAVA HEAP SPACE, OUT OF MEMORY ERROR).
 I could see one lock file generated in the data/index path just after this
 error.



 On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de wrote:

 
   Is there a way to verify that I have added correctlly?
  
 
  on linux you can do
  ps -elf | grep Boot
  and see if the java command has the parameters added.
 
  @all: why and when do you get those OOMs? while querying? which queries
  in detail?
 
  Regards,
  Peter.
 



Re: OutOfMemoryErrors

2010-08-17 Thread rajini maski
<mergeFactor>100</mergeFactor>
JVM Initial memory pool - 256MB
    Maximum memory pool - 1024MB

<add>
  <doc>
    <field>long:ID</field>
    <field>str:Body</field>
    <!-- ... 12 fields ... -->
  </doc>
</add>

I have a Solr instance in a Solr folder (D:/Solr); free space on the disc is
24.3GB. How will I get to know what portion of memory Solr is using?



On Tue, Aug 17, 2010 at 10:11 PM, Erick Erickson erickerick...@gmail.comwrote:

 You shouldn't be getting this error at all unless you're doing something
 out of the ordinary. So, it'd help if you told us:

 What parameters you have set for merging
 What parameters you have set for the JVM
 What kind of documents are you indexing?

 The memory you have is irrelevant if you only allocate a small
 portion of it for the running process...

 Best
 Erick

 On Tue, Aug 17, 2010 at 7:35 AM, rajini maski rajinima...@gmail.com
 wrote:

  I am getting it while indexing data to solr not while querying...
  Though I have enough memory space upto 40GB and I my indexing data is
 just
  5-6 GB yet that particular error is seldom observed... (SEVERE ERROR :
 JAVA
  HEAP SPACE , OUT OF MEMORY ERROR )
  I could see one lock file generated in the data/index path just after
 this
  error.
 
 
 
  On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de wrote:
 
  
Is there a way to verify that I have added correctlly?
   
  
   on linux you can do
   ps -elf | grep Boot
   and see if the java command has the parameters added.
  
   @all: why and when do you get those OOMs? while querying? which queries
   in detail?
  
   Regards,
   Peter.
  
 



Re: Solr-HOW TO HANDLE THE LOCK FILE CREATION WHILE INDEXING AND OPERATION TIMED OUT WEB EXCEPTION ERROR

2010-08-17 Thread rajini maski
 Yes, it is a networked kind of drive, and on Windows. Solr version is
1.4.0, Tomcat 6.

The exception is a System.Net.WebException, "Operation has timed out"
(HttpRequest.GetResponse failed).
For the web exception error, do I need to change the ramBufferSize parameter
and the merge factor parameters in the config XML?

And for the lock file, is there any setting I need to make? Why and how does
it get generated? If you know, please explain it; I am not able to
understand it.
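For reference, the locking knobs being asked about live in the <mainIndex> (and <indexDefaults>) section of solrconfig.xml — a sketch with illustrative values, not a recommendation:

```
<mainIndex>
  <!-- single | native | simple | none -->
  <lockType>native</lockType>
  <!-- clear a stale write lock left by a crashed process at startup -->
  <unlockOnStartup>false</unlockOnStartup>
  <writeLockTimeout>1000</writeLockTimeout>
</mainIndex>
```

The lock file itself is how Lucene ensures only one IndexWriter writes to an index directory at a time; one left behind after a crash or timeout has to be cleared (or unlockOnStartup used) before indexing can resume.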

Thanks a lot for reply...
Regards,
Rajani Maski

On Tue, Aug 17, 2010 at 9:41 PM, Erick Erickson erickerick...@gmail.comwrote:

 It would help a lot if you included the stack trace of the exception,
 perhaps
 it'll be in your SOLR logs.

 Also, what is your environment? Are you using any kind of networked
 drive for your index? Windows? What version of SOLR are you using?

 Anything else you think would be useful.

 Best
 Erick

 On Tue, Aug 17, 2010 at 12:10 AM, rajini maski rajinima...@gmail.com
 wrote:

  Hello Everyone,
 
   Please help me knowing the logic behind this lock file generation
  while indexing data in solr!
 
The trouble I am facing is as follows:
 
  The data that I indexed is nearly in millions. At the initial level of
  indexing I find no errors unless it cross up-to 10lacs documents...But
 once
  it crosses this limit its throwing the web exception error as operation
  time
  out! And simultaneously a kind of LOCK file is generated in //data/index
  folder. I found in one thread ( this
  thread
  http://www.mail-archive.com/solr-user@lucene.apache.org/msg06782.html
   )that
  it can be fixed by making some changes in Config xml of solr and also by
  increasing java memory space in Tomcat.And I did that...Still the issue
 is
  not solved and i couldn't find any route cause for this error..
 
  Please , whoever know logic behind these two issues i.e,
   1) The web exception error as *operation timed out *
   2) The logic behind* why lock files are created and how they actually
 work
  like!!*
 
 
  Awaiting replies
 
  Regards,
  Rajani Maski
 



Re: OutOfMemoryErrors

2010-08-17 Thread Erick Erickson
There are more merge paramaters, what values do you have for these:

<mergeFactor>10</mergeFactor>
<maxBufferedDocs>1000</maxBufferedDocs>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>1</maxFieldLength>

See: http://wiki.apache.org/solr/SolrConfigXml

Hope that formatting comes through the various mail programs OK

Also, what else happens while you're indexing? Do you search
while indexing? How often do you commit your changes?
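Commit frequency can also be taken out of the client's hands with autocommit in solrconfig.xml — a sketch (thresholds illustrative):

```
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>   <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```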



On Tue, Aug 17, 2010 at 1:18 PM, rajini maski rajinima...@gmail.com wrote:

 mergefactor100 /mergefactor
 JVM Initial memory pool -256MB
   Maximum memory pool -1024MB

 add
 doc
 fieldlong:ID/field
 fieldstr:Body/field
 
 12 fields
 /filed
 /doc
 /add
 I have a solr instance in solr folder (D:/Solr) free space in disc is
 24.3GB
 .. How will I get to know what portion of memory is solr using ?



 On Tue, Aug 17, 2010 at 10:11 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  You shouldn't be getting this error at all unless you're doing something
  out of the ordinary. So, it'd help if you told us:
 
  What parameters you have set for merging
  What parameters you have set for the JVM
  What kind of documents are you indexing?
 
  The memory you have is irrelevant if you only allocate a small
  portion of it for the running process...
 
  Best
  Erick
 
  On Tue, Aug 17, 2010 at 7:35 AM, rajini maski rajinima...@gmail.com
  wrote:
 
   I am getting it while indexing data to solr not while querying...
   Though I have enough memory space upto 40GB and I my indexing data is
  just
   5-6 GB yet that particular error is seldom observed... (SEVERE ERROR :
  JAVA
   HEAP SPACE , OUT OF MEMORY ERROR )
   I could see one lock file generated in the data/index path just after
  this
   error.
  
  
  
   On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de
 wrote:
  
   
 Is there a way to verify that I have added correctlly?

   
on linux you can do
ps -elf | grep Boot
and see if the java command has the parameters added.
   
@all: why and when do you get those OOMs? while querying? which
 queries
in detail?
   
Regards,
Peter.
   
  
 



Re: OutOfMemoryErrors

2010-08-17 Thread rajini maski
Yeah, sorry, I forgot to mention the others...

<mergeFactor>100</mergeFactor>
<maxBufferedDocs>1000</maxBufferedDocs>
<maxMergeDocs>10</maxMergeDocs>
<maxFieldLength>1</maxFieldLength>

Above are the values.

Is this because of the values here? Initially I had the mergeFactor parameter
at 10 and maxMergeDocs at 1... With the same error I changed them to the above
values, yet I got that error after the index was about 2 lakh (200,000) docs...

On Tue, Aug 17, 2010 at 11:04 PM, Erick Erickson erickerick...@gmail.comwrote:

 There are more merge paramaters, what values do you have for these:

 mergeFactor10/mergeFactor
 maxBufferedDocs1000/maxBufferedDocs
 maxMergeDocs2147483647/maxMergeDocs
 maxFieldLength1/maxFieldLength

 See: http://wiki.apache.org/solr/SolrConfigXml

 Hope that formatting comes through the various mail programs OK

 Also, what else happens while you're indexing? Do you search
 while indexing? How often do you commit your changes?



 On Tue, Aug 17, 2010 at 1:18 PM, rajini maski rajinima...@gmail.com
 wrote:

  mergefactor100 /mergefactor
  JVM Initial memory pool -256MB
Maximum memory pool -1024MB
 
  add
  doc
  fieldlong:ID/field
  fieldstr:Body/field
  
  12 fields
  /filed
  /doc
  /add
  I have a solr instance in solr folder (D:/Solr) free space in disc is
  24.3GB
  .. How will I get to know what portion of memory is solr using ?
 
 
 
  On Tue, Aug 17, 2010 at 10:11 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
   You shouldn't be getting this error at all unless you're doing
 something
   out of the ordinary. So, it'd help if you told us:
  
   What parameters you have set for merging
   What parameters you have set for the JVM
   What kind of documents are you indexing?
  
   The memory you have is irrelevant if you only allocate a small
   portion of it for the running process...
  
   Best
   Erick
  
   On Tue, Aug 17, 2010 at 7:35 AM, rajini maski rajinima...@gmail.com
   wrote:
  
I am getting it while indexing data to solr not while querying...
Though I have enough memory space upto 40GB and I my indexing data is
   just
5-6 GB yet that particular error is seldom observed... (SEVERE ERROR
 :
   JAVA
HEAP SPACE , OUT OF MEMORY ERROR )
I could see one lock file generated in the data/index path just after
   this
error.
   
   
   
On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de
  wrote:
   

  Is there a way to verify that I have added correctlly?
 

 on linux you can do
 ps -elf | grep Boot
 and see if the java command has the parameters added.

 @all: why and when do you get those OOMs? while querying? which
  queries
 in detail?

 Regards,
 Peter.

   
  
 



RE: Solr synonyms format query time vs index time

2010-08-17 Thread Steven A Rowe
Hi Michael,

I think the problem you're seeing is that no document contains reebox, and 
you've used the explicit mapping syntax (source=>dest) instead of the 
equivalent-terms syntax (term,term,term).

I'm guessing that if you convert your synonym file from:

reebox => Reebok

to:

reebox, Reebok

and leave expand=true, and then reindex, everything will work: your indexed 
documents containing Reebok will be made to include reebox, so queries for 
reebox will produce hits on those documents.

Steve

 -Original Message-
 From: mtdowling [mailto:mtdowl...@gmail.com]
 Sent: Tuesday, August 17, 2010 2:24 PM
 To: solr-user@lucene.apache.org
 Subject: Solr synonyms format query time vs index time
 
 
 My company recently started using Solr for site search and autocomplete.
 It's working great, but we're running into a problem with synonyms.  We
 are
 generating a synonyms.txt file from a database table and using that
 synonyms.txt file at index time on a text type field.  Here's an excerpt
 from the synonyms file:
 
 reebox => Reebok
 shinguards => Shin Guards
 shirt => T-Shirt,Shirt
 shmak => Shmack
 shocks => shox
 skateboard => Skate
 skateboarding => Skate
 skater => Skate
 skates => Skate
 skating => Skate
 skirt => Dresses
 
 When we do a search for reebox, we want the term to be mapped to Reebok
 through explicit mapping, but for some reason this isn't happening.  We do
 have multi-word synonyms, and from what I've read on the mailing list,
 those
 only work at index time, so we are only using the synonym filter factory
 at
 index time:
 
 <fieldType name="search" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
             generateNumberParts="0" catenateWords="1" catenateNumbers="1"
             catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"
             protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
             generateNumberParts="0" catenateWords="1" catenateNumbers="1"
             catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"
             protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>
 
 Here's more relevant schema.xml configs:
 
 <field name="mashup" type="search" indexed="true" stored="false"
        multiValued="true"/>
 <copyField source="keywords" dest="mashup"/>
 <copyField source="category" dest="mashup"/>
 <copyField source="name" dest="mashup"/>
 <copyField source="brand" dest="mashup"/>
 <copyField source="description_overview" dest="mashup"/>
 <copyField source="sku" dest="mashup"/>
 <!-- other copy fields... -->
 
 The output of the query analyzer shows the following:
 
 Query Analyzer
 org.apache.solr.analysis.WhitespaceTokenizerFactory {}
 term position 1
 term text reebox
 term type word
 source start,end  0,6
 payload
 org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
 ignoreCase=true}
 term position 1
 term text reebox
 term type word
 source start,end  0,6
 payload
 org.apache.solr.analysis.WordDelimiterFilterFactory
 {generateNumberParts=0,
 catenateWords=1, generateWordParts=0, catenateAll=0, catenateNumbers=1}
 term position 1
 term text reebox
 term type word
 source start,end  0,6
 payload
 org.apache.solr.analysis.LowerCaseFilterFactory {}
 term position 1
 term text reebox
 term type word
 source start,end  0,6
 payload
 org.apache.solr.analysis.SnowballPorterFilterFactory
 {protected=protwords.txt, language=English}
 term position 1
 term text reebox
 term type word
 source start,end  0,6
 payload
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
 term position 1
 term text reebox
 term type word
 source start,end  0,6
 payload
 
 So reebox is never being converted to Reebok.  I thought that if I had
 index time synonyms with expansion configured that I wouldn't need query
 time synonyms.  Maybe my dynamic synonyms generation isn't formatted
 correctly for my desired result?
 
 If I use the same synonyms.txt file and use the index analyzer, reebox is
 mapped to Reebok and then indexed correctly:
 
 Index Analyzer
 

Re: OutOfMemoryErrors

2010-08-17 Thread Jay Hill
A merge factor of 100 is very high and out of the norm. Try starting with a
value of 10. I've never seen a running system with a value anywhere near
this high.

Also, what is your setting for ramBufferSizeMB?

-Jay

On Tue, Aug 17, 2010 at 10:46 AM, rajini maski rajinima...@gmail.comwrote:

 yeah sorry I forgot to mention others...

 mergeFactor100/mergeFactor
 maxBufferedDocs1000/maxBufferedDocs
 maxMergeDocs10/maxMergeDocs
 maxFieldLength1/maxFieldLength

 above are the values

 Is this because of values here...initially I had mergeFactor parameter -10
 and maxMergedocs-1With the same error i changed them to above
 values..Yet I got that error after index was about 2lacs docs...

 On Tue, Aug 17, 2010 at 11:04 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  There are more merge paramaters, what values do you have for these:
 
  mergeFactor10/mergeFactor
  maxBufferedDocs1000/maxBufferedDocs
  maxMergeDocs2147483647/maxMergeDocs
  maxFieldLength1/maxFieldLength
 
  See: http://wiki.apache.org/solr/SolrConfigXml
 
  Hope that formatting comes through the various mail programs OK
 
  Also, what else happens while you're indexing? Do you search
  while indexing? How often do you commit your changes?
 
 
 
  On Tue, Aug 17, 2010 at 1:18 PM, rajini maski rajinima...@gmail.com
  wrote:
 
   mergefactor100 /mergefactor
   JVM Initial memory pool -256MB
 Maximum memory pool -1024MB
  
   add
   doc
   fieldlong:ID/field
   fieldstr:Body/field
   
   12 fields
   /filed
   /doc
   /add
   I have a solr instance in solr folder (D:/Solr) free space in disc is
   24.3GB
   .. How will I get to know what portion of memory is solr using ?
  
  
  
   On Tue, Aug 17, 2010 at 10:11 PM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
  
You shouldn't be getting this error at all unless you're doing
  something
out of the ordinary. So, it'd help if you told us:
   
What parameters you have set for merging
What parameters you have set for the JVM
What kind of documents are you indexing?
   
The memory you have is irrelevant if you only allocate a small
portion of it for the running process...
   
Best
Erick
   
On Tue, Aug 17, 2010 at 7:35 AM, rajini maski rajinima...@gmail.com
 
wrote:
   
 I am getting it while indexing data to solr not while querying...
 Though I have enough memory space upto 40GB and I my indexing data
 is
just
 5-6 GB yet that particular error is seldom observed... (SEVERE
 ERROR
  :
JAVA
 HEAP SPACE , OUT OF MEMORY ERROR )
 I could see one lock file generated in the data/index path just
 after
this
 error.



 On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de
   wrote:

 
   Is there a way to verify that I have added correctlly?
  
 
  on linux you can do
  ps -elf | grep Boot
  and see if the java command has the parameters added.
 
  @all: why and when do you get those OOMs? while querying? which
   queries
  in detail?
 
  Regards,
  Peter.
 

   
  
 


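For orientation, Jay's suggestion corresponds to solrconfig.xml settings like the following (the values shown are the usual shipped defaults, not taken from Rajani's setup):

```
<indexDefaults>
  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>
```

ramBufferSizeMB bounds how much RAM the index writer buffers before flushing a segment, which is usually a more direct lever on indexing-time heap than mergeFactor.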

Sort by date, filter by score?

2010-08-17 Thread Shawn Heisey
 I have had a request from our development team.  I did some searching 
and could not find an answer.


They want to sort by a date field but filter out all results below a 
minimum relevancy score.  Is this possible?  I suspect that our only 
option will be to do the search sorted by relevancy and then sort them 
ourselves.


Thanks,
Shawn



Re: Sort by date, filter by score?

2010-08-17 Thread Ahmet Arslan
 They want to sort by a date field but filter out all
 results below a minimum relevancy score.  Is this possible? 

Earlier Yonik proposed a solution to a similar need.
http://search-lucene.com/m/4AHNF17wIJW1
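The trick in that thread is a frange filter over the query() function, which keeps the sort on the date field while dropping documents below a score cutoff — a sketch (field name and threshold illustrative):

```
/select?q=foo bar
  &sort=post_date desc
  &fq={!frange l=0.25}query($q)
```

Note the cutoff is a raw Lucene score, not a normalized 0-1 relevancy, so a useful threshold has to be found empirically for the query mix.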





sort order of missing items

2010-08-17 Thread Brad Dewar

When items are sorted, are all the docs with the sort field missing considered 
tied in terms of their sort order, or are they indeterminate, or do they 
have some arbitrary order imposed on them (e.g. _docid_)?

For example, would b be considered as part of the sort in the following 
query, or would all the missing 'a' fields be in some kind of order already, 
thus making the sort algorithm never check the 'b' field?

/select/?q=-a:[* TO *]&sort=a asc,b asc

And would sortMissingLast / sortMissingFirst affect the answer to that question?

I've been seeing weird behaviour in my index with queries (a little) like this 
one, but I haven't pinpointed the problem yet.

Brad
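For reference, sortMissingFirst/sortMissingLast are attributes on the field type in schema.xml, and when set they pin documents lacking the field to one end of the results regardless of asc/desc — a sketch:

```
<fieldType name="string" class="solr.StrField"
           sortMissingLast="true" omitNorms="true"/>
```

When neither attribute is set, where the missing documents land depends on the field type's internal representation; documents that tie on the missing field should then fall through to the secondary sort.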




Re: Solr date NOW - format?

2010-08-17 Thread Lance Norskog
I think 'bq=' is what you want. In dismax the main query string is
assumed to go against a bunch of fields, but the bq query is in the standard
(Lucene++) format, so its query string can handle the ^number syntax.

http://www.lucidimagination.com/search/document/CDRG_ch07_7.4.2.9

On Tue, Aug 17, 2010 at 9:40 AM, Shawn Heisey s...@elyograg.org wrote:
  On 4/9/2010 7:35 PM, Lance Norskog wrote:

 Function queries are notoriously slow. Another way to boost by year is
 with range queries:
 [NOW-6MONTHS TO NOW]^5.0 ,
 [NOW-1YEARS TO NOW-6MONTHS]^3.0
 [NOW-2YEARS TO NOW-1YEARS]^2.0
 [* TO NOW-2YEARS]^1.0

 Notice that you get to have a non-linear curve when you select the
 ranges by hand.

 Lance, I have worked out my major issue and now have my post date in Solr as
 a tdate field named pd.  I cannot however figure out how to actually send
 a query with a date boost like you've mentioned above.  I'd like to embed it
 right into the dismax handler definition, but it would be good to also know
 how to send it in a query myself.  Can you help?

 Are the boosts indicated above a multiplier, or an addition?

 Thanks,
 Shawn





-- 
Lance Norskog
goks...@gmail.com
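Embedded in a dismax handler, that suggestion might look like the following sketch (the field name pd comes from Shawn's message; the qf fields and boost values are illustrative). Note that bq clauses are additive: their scores are added to the main query's score, not multiplied into it:

```
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2.0 body</str>
    <str name="bq">pd:[NOW-6MONTHS TO NOW]^5.0
                   pd:[NOW-1YEARS TO NOW-6MONTHS]^3.0
                   pd:[NOW-2YEARS TO NOW-1YEARS]^2.0</str>
  </lst>
</requestHandler>
```

The same bq parameter can be sent directly on the query string for ad-hoc testing before baking it into the handler defaults.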


Re: Solr synonyms format query time vs index time

2010-08-17 Thread Lance Norskog
solr/admin/analysis.jsp lets you see how this works. Use the index boxes.

Lance

On Tue, Aug 17, 2010 at 11:56 AM, Steven A Rowe sar...@syr.edu wrote:
 Hi Michael,

 I think the problem you're seeing is that no document contains reebox, and 
 you've used the explicit syntax (source=dest) instead of the equivalent 
 syntax (term,term,term).

 I'm guessing that if you convert your synonym file from:

        reebox = Reebok

 to:

        reebox, Reebok

 and leave expand=true, and then reindex, everything will work: your indexed 
 documents containing Reebok will be made to include reebox, so queries 
 for reebox will produce hits on those documents.

 Steve

 -Original Message-
 From: mtdowling [mailto:mtdowl...@gmail.com]
 Sent: Tuesday, August 17, 2010 2:24 PM
 To: solr-user@lucene.apache.org
 Subject: Solr synonyms format query time vs index time


 My company recently started using Solr for site search and autocomplete.
 It's working great, but we're running into a problem with synonyms.  We
 are
 generating a synonyms.txt file from a database table and using that
 synonyms.txt file at index time on a text type field.  Here's an excerpt
 from the synonyms file:

 reebox = Reebok
 shinguards = Shin Guards
 shirt = T-Shirt,Shirt
 shmak = Shmack
 shocks = shox
 skateboard = Skate
 skateboarding = Skate
 skater = Skate
 skates = Skate
 skating = Skate
 skirt = Dresses

 When we do a search for reebox, we want the term to be mapped to Reebok
 through explicit mapping, but for some reason this isn't happening.  We do
 have multi-word synonyms, and from what I've read on the mailing list,
 those
 only work at index time, so we are only using the synonym filter factory
 at
 index time:

 fieldType name=search class=solr.TextField
 positionIncrementGap=100
             analyzer type=index
                 tokenizer class=solr.WhitespaceTokenizerFactory/
                 filter class=solr.SynonymFilterFactory
 synonyms=synonyms.txt ignoreCase=true expand=true/
                 filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt/
                 filter class=solr.WordDelimiterFilterFactory
 generateWordParts=0 generateNumberParts=0 catenateWords=1
 catenateNumbers=1 catenateAll=0/
                 filter class=solr.LowerCaseFilterFactory/
                 filter class=solr.SnowballPorterFilterFactory
 language=English protected=protwords.txt/
                 filter class=solr.RemoveDuplicatesTokenFilterFactory/
             /analyzer
             analyzer type=query
                 tokenizer class=solr.WhitespaceTokenizerFactory/
                 filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt/
                 filter class=solr.WordDelimiterFilterFactory
 generateWordParts=0 generateNumberParts=0 catenateWords=1
 catenateNumbers=1 catenateAll=0/
                 filter class=solr.LowerCaseFilterFactory/
                 filter class=solr.SnowballPorterFilterFactory
 language=English protected=protwords.txt/
                 filter class=solr.RemoveDuplicatesTokenFilterFactory/
             /analyzer
         /fieldType

 Here's more relevant schema.xml configs:

 field name=mashup type=search indexed=true stored=false
 multiValued=true/
 copyField source=keywords dest=mashup/
 copyField source=category dest=mashup/
 copyField source=name dest=mashup/
 copyField source=brand dest=mashup/
 copyField source=description_overview dest=mashup/
 copyField source=sku dest=mashup/
 !-- other copy fields... --

 The output of the query analyzer shows the following:

 Query Analyzer
 org.apache.solr.analysis.WhitespaceTokenizerFactory {}
 term position         1
 term text     reebox
 term type     word
 source start,end      0,6
 payload
 org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
 ignoreCase=true}
 term position         1
 term text     reebox
 term type     word
 source start,end      0,6
 payload
 org.apache.solr.analysis.WordDelimiterFilterFactory
 {generateNumberParts=0,
 catenateWords=1, generateWordParts=0, catenateAll=0, catenateNumbers=1}
 term position         1
 term text     reebox
 term type     word
 source start,end      0,6
 payload
 org.apache.solr.analysis.LowerCaseFilterFactory {}
 term position         1
 term text     reebox
 term type     word
 source start,end      0,6
 payload
 org.apache.solr.analysis.SnowballPorterFilterFactory
 {protected=protwords.txt, language=English}
 term position         1
 term text     reebox
 term type     word
 source start,end      0,6
 payload
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
 term position         1
 term text     reebox
 term type     word
 source start,end      0,6
 payload

 So reebox is never being converted to Reebok.  I thought that if I had
 index-time synonyms with expansion configured, I wouldn't need query-time
 synonyms.  Maybe my dynamic synonym generation isn't formatted
 correctly for my desired result?

 If I use the same synonyms.txt 
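
For reference, a minimal sketch of the two entry styles SynonymFilterFactory accepts in synonyms.txt; the terms here are only illustrative, and whether this fires at index time depends on the filter being present in the field's index analyzer chain:

```text
# Equivalent terms: with expand="true", each variant is indexed as all of them
reebox, rebok, reebok

# Explicit mapping: left-hand variants are replaced by the right-hand term
reebox, rebok => reebok
```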

Re: Function query to boost scores by a constant if all terms are present

2010-08-17 Thread Ahmet Arslan
 Most of the time, items that match all three terms will
 float to the top by
 normal ranking, but sometimes there are only two terms that
 are like a rash
 across the record, and they end up with a higher score than
 some items that
 match all three query terms.
 
 I'd like to boost items with all the query terms to the top
 *without
 changing their order*.
 
 My first thought was to use a simple boost query
 allfields:(a AND b AND c),
 but the order of the set of records that contain all three
 terms changes
 when I do that. What I *think* I need to do is basically to
 say, Hey, all
 the items with all three terms get an extra 40,000 points,
 but change
 nothing else.

This is a hard task, and I am not sure it is possible. You would need to change 
the similarity algorithm for that. The final score is composed of many factors: 
coord, norm, tf-idf... 

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

Maybe you can try to customize coord(q,d). But there can always be cases like 
the one you describe. For example, a very long document containing all three 
terms will be penalized for its length. A very short document with only two of 
the query terms can pop up above it.

It is easy to rank items with all three terms so that they come first 
(omitNorms=true and omitTermFreqAndPositions=true should almost do it), but the 
"change nothing else" part is not.

The easiest option may be to issue an additional query with a pure AND operator 
and display those results in a special way.
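
As an illustration of the "extra 40,000 points" idea: Lucene's coord() is a multiplicative factor, so it cannot add a flat constant, which is part of why this is hard inside the engine. The toy sketch below (plain Java, not the Similarity API; the method names and bonus value are assumptions) shows the desired effect as a post-search re-score: add a flat bonus when every query term matched, leaving the order within each group untouched.

```java
public class AllTermsBoost {
    /**
     * Re-score a hit after the search: overlap is the number of query terms
     * found in the document, maxOverlap the total number of query terms.
     * The bonus is additive, so the relative order inside the "all terms"
     * group and inside the "partial match" group is preserved.
     */
    static float boostedScore(float rawScore, int overlap, int maxOverlap) {
        float bonus = (overlap == maxOverlap) ? 40000f : 0f;
        return rawScore + bonus;
    }

    public static void main(String[] args) {
        System.out.println(boostedScore(11.2f, 3, 3)); // full match floats above all partials
        System.out.println(boostedScore(12.9f, 2, 3)); // partial match keeps its raw score
    }
}
```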


  


queryResultCache has no hits for date boost function

2010-08-17 Thread Peter Karich
Hi all,

my queryResultCache has no hits. But if I am removing one line from the
bf section in my dismax handler all is fine. Here is the line:
recip(ms(NOW,date),3.16e-11,1,1)

According to
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

this should be fine. So what is the problem with this and how could I
fix it?

Regards,
Peter.

PS: the problem raised in the thread 'Improve Query Time For Large Index'.
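
One plausible cause, offered as an assumption rather than a confirmed diagnosis: NOW resolves to the current time in milliseconds, so every request produces a different function string and therefore a different cache key. Solr's date math can round it, e.g.:

```
recip(ms(NOW/DAY,date),3.16e-11,1,1)
```

at the cost of the date boost only changing once per day.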


changable DIH datasource based on environment variables

2010-08-17 Thread Tommy Chheng
 I defined my DIH datasource in solrconfig.xml. Is there a way to 
define two sets of data sources and use one based on the current 
system's environment variable?(ex. APP_ENV=production or 
APP_ENV=development)


I run the DIH on my local machine and remote server. They use different 
mysql datasources for importing.
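
A sketch of one approach, assuming Solr's ${property:default} substitution in solrconfig.xml (the property names here are illustrative):

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="${dih.jdbc.url:jdbc:mysql://localhost/dev_db}"
            user="${dih.jdbc.user:dev}"
            password="${dih.jdbc.pass:dev}"/>
```

Each environment would then set the properties at JVM startup, e.g. -Ddih.jdbc.url=jdbc:mysql://dbhost/prod_db on the production server, and fall back to the defaults locally.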


--
@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com



Integrating Solr's SynonymFilter in lucene

2010-08-17 Thread Arun Rangarajan
I am trying to have multi-word synonyms work in lucene using Solr's *
SynonymFilter*.

I need to match synonyms at index time, since many of the synonym lists are
huge. Actually they are really not synonyms, but are words that belong to a
concept. For example, I would like to map {New York, Los Angeles, New
Orleans, Salt Lake City, ...}, a bunch of city names, to the concept called
"city". While searching, the user query for the concept "city" will be
translated to a keyword like, say, "CONCEPTcity", which is the synonym for
any city name.

Using lucene's SynonymAnalyzer, as explained in Lucene in Action (p. 131),
all I could match for "CONCEPTcity" is single-word city names like
"Chicago", "Seattle", "Boston", etc. It would not match multi-word city
names like "New York", "Los Angeles", etc.,

I tried using Solr's SynonymFilter in the tokenStream method of a custom
Analyzer (that extends org.apache.lucene.analysis.Analyzer - lucene ver.
2.9.3) using:

public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new SynonymFilter(
        new WhitespaceTokenizer(reader),
        synonymMap);
    return result;
}

where synonymMap is loaded with synonyms using

synonymMap.add(conceptTerms, listOfTokens, true, true);

where conceptTerms is of type ArrayList<String> holding all the terms in a
concept and listOfTokens is of type List<Token> containing only the
generic synonym identifier like CONCEPTcity.

When I print synonymMap using synonymMap.toString(), I get the output like

{New York={Chicago={Seattle={New
Orleans=[(CATEGORYcity,0,0,type=SYNONYM),ORIG],null}}}}

so it looks like all the synonyms are loaded. But if I search for
CATEGORYcity then it says no matches found. I am not sure whether I have
loaded the synonyms correctly in the synonymMap.

Any help will be deeply appreciated. Thanks!
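
To make the intended multi-word behaviour concrete, here is a self-contained toy (no Lucene involved; the concept list and marker are taken from the description above): greedily match the longest token phrase against the concept list and emit the concept marker. This is essentially what SynonymFilter's token-trie (the nested map printed above) does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConceptMapper {
    static final Set<String> CITIES = new HashSet<>(Arrays.asList(
            "new york", "los angeles", "new orleans", "salt lake city", "chicago"));
    static final int MAX_PHRASE = 3; // longest concept phrase, in tokens

    static List<String> map(String text) {
        String[] toks = text.toLowerCase().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < toks.length) {
            boolean matched = false;
            // try the longest phrase first, so "new york" wins over "new"
            for (int len = Math.min(MAX_PHRASE, toks.length - i); len >= 1 && !matched; len--) {
                String phrase = String.join(" ", Arrays.copyOfRange(toks, i, i + len));
                if (CITIES.contains(phrase)) {
                    out.add("CONCEPTcity");
                    i += len;
                    matched = true;
                }
            }
            if (!matched) out.add(toks[i++]);
        }
        return out;
    }

    public static void main(String[] args) {
        // "New York" (two tokens) and "Chicago" (one token) both map to the marker
        System.out.println(map("flights from New York to Chicago"));
    }
}
```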


Re: Solr date NOW - format?

2010-08-17 Thread Shawn Heisey
 Would I do separate bq values for each of the ranges, or is there a 
way to include them all at once?  If it's the latter, I'll need a full 
example with a field name, because I'm clueless. :)


On 8/17/2010 2:29 PM, Lance Norskog wrote:

I think 'bq=' is what you want. In dismax the main query string is
assumed to go against a bunch of fields. This query is in the standard
(Lucene++) format. The query strings should handle the ^number syntax.

http://www.lucidimagination.com/search/document/CDRG_ch07_7.4.2.9

On Tue, Aug 17, 2010 at 9:40 AM, Shawn Heiseys...@elyograg.org  wrote:

  On 4/9/2010 7:35 PM, Lance Norskog wrote:

Function queries are notoriously slow. Another way to boost by year is
with range queries:
[NOW-6MONTHS TO NOW]^5.0 ,
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Notice that you get to have a non-linear curve when you select the
ranges by hand.

Lance, I have worked out my major issue and now have my post date in Solr as
a tdate field named pd.  I cannot however figure out how to actually send
a query with a date boost like you've mentioned above.  I'd like to embed it
right into the dismax handler definition, but it would be good to also know
how to send it in a query myself.  Can you help?

Are the boosts indicated above a multiplier, or an addition?




Re: OutOfMemoryErrors

2010-08-17 Thread rajini maski
Yeah, fine, I will do that. Before, the mergeFactor was 10 itself. After
finding this error I just set its value higher, wondering if that could be the
cause. I will change it back.

The ramBufferSize is 256MB. Do I need to change this value to something higher?


On Wed, Aug 18, 2010 at 12:27 AM, Jay Hill jayallenh...@gmail.com wrote:

 A merge factor of 100 is very high and out of the norm. Try starting with a
 value of 10. I've never seen a running system with a value anywhere near
 this high.

 Also, what is your setting for ramBufferSizeMB?

 -Jay

 On Tue, Aug 17, 2010 at 10:46 AM, rajini maski rajinima...@gmail.com
 wrote:

  yeah sorry I forgot to mention others...
 
  <mergeFactor>100</mergeFactor>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <maxMergeDocs>10</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
 
  above are the values
 
  Is this because of the values here? Initially I had the mergeFactor
  parameter at 10 and maxMergeDocs at 1... With the same error I changed them
  to the above values. Yet I got that error after the index was about 2 lakh docs...
 
  On Tue, Aug 17, 2010 at 11:04 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
   There are more merge paramaters, what values do you have for these:
  
    <mergeFactor>10</mergeFactor>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
  
   See: http://wiki.apache.org/solr/SolrConfigXml
  
   Hope that formatting comes through the various mail programs OK
  
   Also, what else happens while you're indexing? Do you search
   while indexing? How often do you commit your changes?
  
  
  
   On Tue, Aug 17, 2010 at 1:18 PM, rajini maski rajinima...@gmail.com
   wrote:
  
 <mergefactor>100</mergefactor>
 JVM Initial memory pool - 256MB
 Maximum memory pool - 1024MB

 <add>
   <doc>
     <field>long:ID</field>
     <field>str:Body</field>
     ... 12 fields ...
   </doc>
 </add>

 I have a Solr instance in the solr folder (D:/Solr); free space on disc is
 24.3GB. How will I get to know what portion of memory Solr is using?
   
   
   
On Tue, Aug 17, 2010 at 10:11 PM, Erick Erickson 
   erickerick...@gmail.com
wrote:
   
 You shouldn't be getting this error at all unless you're doing
   something
 out of the ordinary. So, it'd help if you told us:

 What parameters you have set for merging
 What parameters you have set for the JVM
 What kind of documents are you indexing?

 The memory you have is irrelevant if you only allocate a small
 portion of it for the running process...

 Best
 Erick

 On Tue, Aug 17, 2010 at 7:35 AM, rajini maski 
 rajinima...@gmail.com
  
 wrote:

  I am getting it while indexing data to solr not while querying...
  Though I have enough memory space upto 40GB and I my indexing
 data
  is
 just
  5-6 GB yet that particular error is seldom observed... (SEVERE
  ERROR
   :
 JAVA
  HEAP SPACE , OUT OF MEMORY ERROR )
  I could see one lock file generated in the data/index path just
  after
 this
  error.
 
 
 
  On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de
wrote:
 
  
Is there a way to verify that I have added it correctly?
   
  
   on linux you can do
   ps -elf | grep Boot
   and see if the java command has the parameters added.
  
   @all: why and when do you get those OOMs? while querying? which
queries
   in detail?
  
   Regards,
   Peter.
  
 

   
  
 



Re: OutOfMemoryErrors

2010-08-17 Thread Grijesh.singh

ramBufferSize is preferred to be 128MB; more than that does not seem to
improve performance.
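
As a sketch, that suggestion corresponds to one line in the <indexDefaults> (or <mainIndex>) section of solrconfig.xml:

```xml
<ramBufferSizeMB>128</ramBufferSizeMB>
```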
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/OutOfMemoryErrors-tp1181731p1199592.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr date NOW - format?

2010-08-17 Thread Shawn Heisey

 My first attempt was adding this to the dismax handler:

<str name="bq">pd:[NOW-1MONTH TO NOW]^5.0</str>
<str name="bq">pd:[NOW-3MONTHS TO NOW-1MONTH]^3.0</str>
<str name="bq">pd:[NOW-1YEAR TO NOW-3MONTHS]^2.0</str>
<str name="bq">pd:[* TO NOW-1YEAR]^1.0</str>

This results in scores that are quite a bit lower (9.5 max score instead 
of 11.7), but the order looks the same.  No real change other than a 
higher max score (10) if I leave only the first bq entry.


I wasn't able to figure out a way to put all the ranges in one bq, 
everything I tried got zero results.


What am I doing wrong?
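
A sketch worth trying (an assumption, not verified against this handler): the body of a bq is an ordinary Lucene query, so the ranges can be OR'd as space-separated clauses inside a single parameter:

```xml
<str name="bq">pd:[NOW-1MONTH TO NOW]^5.0 pd:[NOW-3MONTHS TO NOW-1MONTH]^3.0 pd:[NOW-1YEAR TO NOW-3MONTHS]^2.0 pd:[* TO NOW-1YEAR]^1.0</str>
```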


On 8/17/2010 8:36 PM, Shawn Heisey wrote:
 Would I do separate bq values for each of the ranges, or is there a 
way to include them all at once?  If it's the latter, I'll need a full 
example with a field name, because I'm clueless. :)


On 8/17/2010 2:29 PM, Lance Norskog wrote:

I think 'bq=' is what you want. In dismax the main query string is
assumed to go against a bunch of fields. This query is in the standard
(Lucene++) format. The query strings should handle the ^number syntax.

http://www.lucidimagination.com/search/document/CDRG_ch07_7.4.2.9

On Tue, Aug 17, 2010 at 9:40 AM, Shawn Heiseys...@elyograg.org  wrote:

  On 4/9/2010 7:35 PM, Lance Norskog wrote:

Function queries are notoriously slow. Another way to boost by year is
with range queries:
[NOW-6MONTHS TO NOW]^5.0 ,
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Notice that you get to have a non-linear curve when you select the
ranges by hand.
Lance, I have worked out my major issue and now have my post date in 
Solr as
a tdate field named pd.  I cannot however figure out how to 
actually send
a query with a date boost like you've mentioned above.  I'd like to 
embed it
right into the dismax handler definition, but it would be good to 
also know

how to send it in a query myself.  Can you help?

Are the boosts indicated above a multiplier, or an addition?






Re: indexing???

2010-08-17 Thread satya swaroop
Hi,

1) I use Tika 0.8.

2) The URL is https://issues.apache.org/jira/browse/PDFBOX-709 and the
file is samplerequestform.pdf.

3) The entire error is:
curl "http://localhost:8080/solr/update/extract?stream.file=/home/satya/my_workings/satya_ebooks/8-Linux/samplerequestform.pdf&literal.id=linuxc"




HTTP Status 500 - org.apache.tika.exception.TikaException:
Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@1d688e2

org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@1d688e2
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:214)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.pdf.pdfpar...@1d688e2
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:144)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
... 18 more
Caused by: java.lang.ClassCastException:
org.apache.pdfbox.pdmodel.font.PDFontDescriptorAFM cannot be cast to
org.apache.pdfbox.pdmodel.font.PDFontDescriptorDictionary
at
org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:167)
at
org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:117)
at
org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:140)
at
org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:76)
at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:225)
at
org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
at
org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
at
org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
at
org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
at
org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
... 21 more

Fun with Spatial (Haversine formula)

2010-08-17 Thread Lance Norskog
The Haversine formula in o.a.s.s.f.d.DistanceUtils.java gives these
results for a 0.1 degree difference in miles:

equator horizontal 0.1 deg: lat/lon 0.0/0.0 - 396.320504
equator vertical   0.1 deg: lat/lon 0.0/0.0 - 396.320504
NYC horizontal 0.1 deg: lat/lon -72.0/0.0   - 383.33093669272654
NYC vertical   0.1 deg: lat/lon -72.0/0.0   - 396.3204997747
arctic horizontal  0.1 deg: lat/lon 89.0/0.0- 202.13129169290906
arctic vertical0.1 deg: lat/lon 89.0/0.0- 396.3204997747
N. Pole horizontal 0.1 deg: lat/lon 89.8/0.0- 103.61036292825034
N. Pole vertical   0.1 deg: lat/lon 89.8/0.0- 396.320500338

That is, a horizontal shift of 0.1 at the equator, New York City's
latitude, 1 degree south of the North Pole and almost-almost-almost at
the North Pole, these are the distances in miles.
The latitude changes make perfect sense, but one would expect the
longitudes to shrink as well.

Here is the code, added to DistanceUtils.java. What am I doing wrong?


  public static void main(String[] args) {
      show("equator horizontal 0.1 deg", 0.0, 0.0, 0.0, 0.1);
      show("equator vertical   0.1 deg", 0.0, 0.0, 0.1, 0.0);
      show("NYC horizontal 0.1 deg", -72, 0.0, -72, 0.1);
      show("NYC vertical   0.1 deg", -72, 0, -72.1, 0.0);
      show("arctic horizontal  0.1 deg", 89.0, 0.0, 89.0, 0.1);
      show("arctic vertical    0.1 deg", 89.0, 0.0, 89.1, 0.0);
      show("N. Pole horizontal 0.1 deg", 89.8, 0.0, 89.8, 0.1);
      show("N. Pole vertical   0.1 deg", 89.8, 0.0, 89.9, 0.0);
  }

  private static void show(String label, double d, double e, double f,
          double g) {
      System.out.println(label + ": lat/lon " + d + "/" + e + " \t- " +
          haversine(d, e, f, g, 3963.205));
  }

(This is from the Solr trunk.)
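
For comparison, here is a guess at what is going on (an assumption, not a reading of the trunk code): if the library haversine expects radians, then 0.1 is read as 0.1 rad (about 5.73 degrees), and 0.1 * 3963.205 mi = 396.32 mi, exactly the figure above. A standalone haversine that converts its inputs with Math.toRadians gives the expected ~6.9 miles for 0.1 degree at the equator:

```java
public class HaversineCheck {
    static final double EARTH_RADIUS_MI = 3963.205;

    // Haversine great-circle distance; inputs in degrees, output in miles.
    static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2)
                   * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return EARTH_RADIUS_MI * 2 * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // 0.1 degree of longitude at the equator: roughly 6.9 miles
        System.out.println(haversine(0.0, 0.0, 0.0, 0.1));
        // 0.1 *radian* of arc at Earth radius 3963.205 mi: the 396.32 seen above
        System.out.println(0.1 * EARTH_RADIUS_MI);
        // near the pole, 0.1 degree of longitude shrinks almost to nothing
        System.out.println(haversine(89.8, 0.0, 89.8, 0.1));
    }
}
```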

-- 
Lance Norskog
goks...@gmail.com