Re: why is solr so much slower than lucene?

2010-10-22 Thread kafka0102

thanks a lot.
I got it.

On 2010年10月21日 22:36, Yonik Seeley wrote:

2010/10/21 kafka0102kafka0...@163.com:

I found the problem's cause. It's the DocSetCollector: my filter query result's
size is about 300, so DocSetCollector.getDocSet() returns an OpenBitSet, and
300 OpenBitSet.fastSet(doc) calls are too slow.


As I said in my other response to you, that's a perfect reason why you
want Solr to cache that for you (unless the filter will be different
each time).

-Yonik
http://www.lucidimagination.com
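
A minimal SolrJ sketch of sending such a filter as fq, so Solr caches the DocSet for reuse (the URL, field and terms here are invented for illustration):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FilterQueryExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("ipod");
        // Sent as fq rather than folded into q, this clause is evaluated once,
        // its DocSet lands in the filterCache, and later requests reuse it
        // instead of re-collecting the whole bitset.
        query.addFilterQuery("category:electronics");

        QueryResponse rsp = server.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}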





Re: Import From MYSQL database

2010-10-22 Thread do3do3

I am trying to index tables with English keywords in a MySQL database but it
fails. I also tried to import data from this database through Java, and that
succeeded.
I don't know how to use the dataimport folder in the contrib folder; maybe this
is the problem.
What I did was build the configuration files (schema.xml, solrconfig.xml,
db-data-config.xml) and put the
MySQL lib in the lib folder.




Solr Javascript+JSON not optimized for SEO

2010-10-22 Thread PeterKerk

Hi,

When I retrieve data via javascript+JSON method (instead of REST via URL),
the link which I click does not reflect what the user will end up seeing.

Example for showing the features belonging to a LED TV product:

JSON:
<a href="javascript:getFeatureFacets('LEDTV')">Get features for LED TV</a>

REST:
<a href="www.domain.com/TVs/LEDTV">Get features for LED TV</a>

As you can see, the href in the second example clearly displays what the user may
expect when clicking this link. That is VERY important for search engines.
So how can I still use javascript+JSON, but not lose the SEO value?

Regards,
Peter


Re: MoreLikeThis explanation?

2010-10-22 Thread Darren Govoni
Hi Koji,
   I tried to apply your patch to the 1.4.0 tagged branch, but it didn't
take completely.
What branch does it work for? 

Darren

On Thu, 2010-10-21 at 23:03 +0900, Koji Sekiguchi wrote:

 (10/10/21 20:33), dar...@ontrenet.com wrote:
  Hi,
 Does the latest Solr provide an explanation for results returned by MLT?
 
 No, but there is an open issue:
 
 https://issues.apache.org/jira/browse/SOLR-860
 
 Koji
 




Re: different results depending on result format

2010-10-22 Thread Savvas-Andreas Moysidis
strange..are you absolutely sure the two queries are directed to the same
Solr instance? I'm running the same query from the admin page (which
specifies the xml format) and I get the exact same results as solrj.

On 21 October 2010 22:25, Mike Sokolov soko...@ifactory.com wrote:

 quick follow-up: I also notice that the query from solrj gets version=1,
 whereas the admin webapp puts version=2.2 on the query string, although this
 param doesn't seem to change the xml results at all.  Does this indicate an
 older version of solrj perhaps?

 -Mike


 On 10/21/2010 04:47 PM, Mike Sokolov wrote:

 I'm experiencing something really weird: I get different results depending
 on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml.  I
 spent quite a while staring at query params to make sure everything else is
 the same, and they do seem to be.  At first I thought the problem related to
 the javabin format change that has been talked about recently, but I am
 using solr 1.4.0 and solrj 1.4.0.

 Notice in the two entries that the wt param is different and the hits
 result count is different.

 Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
 INFO: [bopp.ba] webapp=/solr path=/select/
 params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
 hits=261 status=0 QTime=1
 Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
 INFO: [bopp.ba] webapp=/solr path=/select
 params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
 hits=57 status=0 QTime=0


 The xml format results seem to be the correct ones. So one thought I had
 is that I could somehow fall back to using xml format in solrj, but I tried
 SolrQuery.set('wt','xml') and that didn't have the desired effect (I get
 'wt=javabin&wt=javabin' in the log - ie the param is repeated, but still
 javabin).
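
A side note for anyone trying the same thing: with SolrJ the wire format is driven by the response parser rather than by setting wt on the query. A minimal sketch, assuming SolrJ 1.4's CommonsHttpSolrServer:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlWireFormat {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Switch SolrJ from the default javabin format to XML; SolrJ then
        // sends wt=xml itself, so there is no need to set it on the SolrQuery.
        server.setParser(new XMLResponseParser());
    }
}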


 Am I crazy? Is this a known issue?

 Thanks for any suggestions




Re: mincount doesn't work with FacetQuery

2010-10-22 Thread Mark Allan
This is a response to a thread from several months ago ( http://lucene.472066.n3.nabble.com/mincount-doesn-t-work-with-FacetQuery-tp473162p473162.html 
 ) Sorry, I don't know where to get the thread number to request that  
specific thread from listserv and reply properly via email.


Anyway, I've recently come across the same problem while working with  
branch_3x of Solr and I'm wondering if anyone ever opened a JIRA for  
this feature request?  I can't find one but that doesn't mean it's not  
there, and I don't want to create a duplicate.


Cheers
Mark




facet Prefix (or term prefix)

2010-10-22 Thread Jason Brown

I am aware of the facet.prefix facility. I am using Solr to return a faceted
field's contents, and I use facet.prefix to restrict what Solr returns;
this is very useful for predictive search functionality (autocomplete).

My only issue is that the field I facet on is a string and could have 2 or 3
words in it, so this approach will only return strings that begin with what
the user is typing into my UI search box. It would be useful if I could get
facets back that match anywhere in the faceted field (not just at
the beginning), i.e. is there a facet.contains method?

If not, I'll just have to code this in my service layer after receiving all
facets from Solr (without the prefix).

Thanks for any help.






Re: facet Prefix (or term prefix)

2010-10-22 Thread Markus Jelsma
Hi,

There is no facet.contains facility, but there are alternatives. Instead of using
the faceting engine, you will need to create a field that uses an
NGramTokenizer.  Properly configured, you can query that field and
it will return what you would expect from a facet.contains feature.

Here's a post on the subject which you may find useful:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

Cheers,

On Friday 22 October 2010 13:20:56 Jason Brown wrote:
 I am aware of the facet.prefix facility. I am using SOLR to return a
 facetted fields contents - I use the facet.prefix to restrict what returns
 from SOLR - this is very useful for predictive search functionality
 (autocomplete).
 
 My only issue is that the field I facet on is a string and could have 2 or
 3 words in it, thus this process will only return strings that begin with
 what the user is typing into my UI search box. It would be useful if I
 could get facets back where I could match somewhere in the facetted field
 (not just at the begninning), i.e. is there a fact.contains method?
 
 If not I'll just have to code this in my service layer having received all
 facets from SOLR (without the prefix)
 
 Thanks for any help.
 
 
 
 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: how well does multicore scale?

2010-10-22 Thread Tharindu Mathew
On Fri, Oct 22, 2010 at 11:18 AM, Lance Norskog goks...@gmail.com wrote:
 There is an API now for dynamically loading, unloading, creating and
 deleting cores.
 Restarting a Solr with thousands of cores will take, I don't know, hours.

Is this in the trunk? Any docs available?
 On Thu, Oct 21, 2010 at 10:44 PM, Tharindu Mathew mcclou...@gmail.com wrote:
 Hi Mike,

 I've also considered using separate cores in a multi-tenant
 application, i.e. a separate core for each tenant/domain. But the cores
 do not suit that purpose.

 If you check the documentation, no real API support exists for this, so
 it can't be done dynamically through SolrJ. And all the use cases I found
 only had users configuring cores statically and then using them. That was
 maybe 2 or 3 cores. Please correct me if I'm wrong, Solr folks.

 So you're better off using a single index with a user id and applying a
 query filter with the user id when fetching data.

 On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
 No, it does not seem reasonable.  Why do you think you need a separate core
 for every user?
 mike anderson wrote:

 I'm exploring the possibility of using cores as a solution to bookmark
 folders in my solr application. This would mean I'll need tens of
 thousands
 of cores... does this seem reasonable? I have plenty of CPUs available for
 scaling, but I wonder about the memory overhead of adding cores (aside
 from
 needing to fit the new index in memory).

 Thoughts?

 -mike






 --
 Regards,

 Tharindu




 --
 Lance Norskog
 goks...@gmail.com




-- 
Regards,

Tharindu


Re: how well does multicore scale?

2010-10-22 Thread Mark Miller
On 10/22/10 1:44 AM, Tharindu Mathew wrote:
 Hi Mike,
 
 I've also considered using a separate cores in a multi tenant
 application, ie a separate core for each tenant/domain. But the cores
 do not suit that purpose.
 
 If you check out documentation no real API support exists for this so
 it can be done dynamically through SolrJ. And all use cases I found,
 only had users configuring it statically and then using it. That was
 maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.

You can dynamically manage cores with solrj. See
org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
for a place to start.

You probably want to turn solr.xml's persist option on so that your
cores survive restarts.
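
A minimal SolrJ sketch of those calls, for reference. The core name and instance directory are invented for illustration, and the instance dir must already contain a conf/ with schema.xml and solrconfig.xml:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CoreAdminExample {
    public static void main(String[] args) throws Exception {
        // Talk to the CoreAdminHandler at the container level, not to one core.
        CommonsHttpSolrServer adminServer =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Create a new core backed by an existing instance directory.
        CoreAdminRequest.createCore("tenant42", "/var/solr/tenant42", adminServer);

        // Static helpers also exist for unload, rename, swap and status.
        CoreAdminRequest.unloadCore("tenant42", adminServer);
    }
}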

 
 So your better off using a single index and with a user id and use a
 query filter with the user id when fetching data.

Many times this is probably the case - pros and cons to each, depending
on what you are up to.

- Mark
lucidimagination.com

 
 On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
 No, it does not seem reasonable.  Why do you think you need a seperate core
 for every user?
 mike anderson wrote:

 I'm exploring the possibility of using cores as a solution to bookmark
 folders in my solr application. This would mean I'll need tens of
 thousands
 of cores... does this seem reasonable? I have plenty of CPUs available for
 scaling, but I wonder about the memory overhead of adding cores (aside
 from
 needing to fit the new index in memory).

 Thoughts?

 -mike



 
 
 



solr performance

2010-10-22 Thread Markus.Rietzler
Last week we put our Solr into production. It was a very smooth start. Solr
really works great and without any problems so far.
It's a huge improvement over our old intranet search.

I wonder, however, whether we can increase the search performance of our Solr
installation, just to make the search experience even better.

I know that performance depends on many different things and parameters, so
a general answer is hard to give.

here are some figures:

- at the moment we have about 20,000 search queries a day
- median query time is about 400ms
- ca. 80% run under 500ms
- ca. 90% run under 1s
- ca. 10% over 1s, 3% over 2s
- there are even some queries which last way too long, over 6s and up to 18s;
  there are even simple one-word queries which take that long.

Maybe there is one special thing to mention: we have a kind of user filter
with each query. These parameters differ for each user group, so I think at
least one of the caches won't work very well, because even if the query (foobar)
is the same, fq and bq can (and will) differ from user to user.

fq=__intern:0+OR+__intern:344 together with a boost query 
bq=__lokal:0^6+OR+__lokal:344^2

our query looks like:

INFO: [core] webapp=/solr path=/select 
params={spellcheck=true&facet=on&facet.limit=500&initSearch=1&hl=on&version=1.2&bq=__lokal:0^6+OR+__lokal:344^2&fl=score,+id,+title,+visiblePath,+__doctype,+_erstelldatum,+_dienststelle,+_dokumententyp,+__source,+__intern,+objClass,+jurislinkUrl,+destinationUrl,+_aktenzeichen,+_stelle,+_zielgruppen,+_stichwort,+_kurzbeschreibung,+_autor,+_hauptthema,+_unterthema&facet.field=__source&facet.field=__dst&facet.field=__cyear&facet.field=_dokumententyp&facet.field=__mikronav&facet.field=_zielgruppen&facet.field=__doctype&spellcheck.count=2&qt=dismax&fq=__intern:0+OR+__intern:344&hl.fragsize=640&facet.mincount=1&spellcheck.extendedResults=true&json.nl=map&hl.fl=body,+_kurzbeschreibung,+_stichwort&wt=json&spellcheck.collate=true&hl.maxAnalyzedChars=9&rows=20&spellcheck.onlyMorePopular=false&start=0&facet.sort=index&q=foobar}
 hits=93 status=0 QTime=113

- we have indexed 115,000 documents; our index size is about 720 MB

Any hints on where to look? What does ramBufferSizeMB in mainIndex in
solrconfig.xml do? Does it make sense to increase this value? Should we
increase one of our caches?

- we're using Jetty and Java JDK 1.6.0_21; Java settings are -D64 -server -Xms892m
-Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:-HeapDumpOnOutOfMemoryError -XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled
- our machine has 4GB of memory and 4 CPUs, load is about 0.6%, the Java process
seems to use only one CPU, and no other services are running on this machine.

- from the beginning we have had a master/slave setup, but at the moment we are
only working with the master. Yesterday I included the slave in our search
application, so that half the queries were handled by the master and the other
half by the slave. The query times didn't change, so it is not a bottleneck
with our machine or I/O or memory.

- cache stats from admin panel

queryResultCache, LRU Cache(maxSize=65536, initialSize=65536)  
lookups : 1159
hits : 498
hitratio : 0.42  (=== seems a bit low compared to the other)
inserts : 697
evictions : 0
size : 661
warmupTime : 0
cumulative_lookups : 91470
cumulative_hits : 41370
cumulative_hitratio : 0.45
cumulative_inserts : 52835
cumulative_evictions : 0 

documentCache, LRU Cache(maxSize=32768, initialSize=32768)   
lookups : 53099 
hits : 45429 
hitratio : 0.85 
inserts : 7670 
evictions : 0 
size : 7670 
warmupTime : 0 
cumulative_lookups : 4254335 
cumulative_hits : 3760521 
cumulative_hitratio : 0.88 
cumulative_inserts : 493814 
cumulative_evictions : 0

fieldValueCache, Concurrent LRU Cache(maxSize=1, initialSize=10, 
minSize=9000, acceptableSize=9500, cleanupThread=false) 
lookups : 3312 
hits : 3306 
hitratio : 0.99 
inserts : 3 
evictions : 0 
size : 3 
warmupTime : 0 
cumulative_lookups : 261969 
cumulative_hits : 261351 
cumulative_hitratio : 0.99 
cumulative_inserts : 306 
cumulative_evictions : 0 
item__zielgruppen : 
{field=_zielgruppen,memSize=491861,tindexSize=46,time=10,phase1=10,nTerms=46,bigTerms=13,termInstances=53913,uses=1187}
 
item___mikronav : 
{field=__mikronav,memSize=464524,tindexSize=82,time=5,phase1=5,nTerms=39,bigTerms=4,termInstances=18817,uses=1187}
 
item___dst : 
{field=__dst,memSize=464640,tindexSize=66,time=10,phase1=9,nTerms=160,bigTerms=5,termInstances=86516,uses=1187}
 
(these are a few of our facet fields)

filterCache Concurrent LRU Cache(maxSize=16384, initialSize=16384, 
minSize=14745, acceptableSize=15564, cleanupThread=false) 
lookups : 26851 
hits : 26434 
hitratio : 0.98 
inserts : 417 
evictions : 0 
size : 417 
warmupTime : 0 
cumulative_lookups : 1985851 
cumulative_hits : 1959304 
cumulative_hitratio : 0.98 
cumulative_inserts : 26547 
cumulative_evictions : 0



Markus Rietzler
rietzler_software/
Rechenzentrum der 

Re: different results depending on result format

2010-10-22 Thread Mike Sokolov
Yes - I really only have the one solr instance.  And I have plenty of 
other cases where I am getting good results back via solrj.  It's really 
a mystery.  Unfortunately I have to catch up on other stuff I have been 
neglecting, but I'll follow up when I'm able to get a solution...


-Mike


On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote:

strange..are you absolutely sure the two queries are directed to the same
Solr instance? I'm running the same query from the admin page (which
specifies the xml format) and I get the exact same results as solrj.

On 21 October 2010 22:25, Mike Sokolovsoko...@ifactory.com  wrote:

   

quick follow-up: I also notice that the query from solrj gets version=1,
whereas the admin webapp puts version=2.2 on the query string, although this
param doesn't seem to change the xml results at all.  Does this indicate an
older version of solrj perhaps?

-Mike


On 10/21/2010 04:47 PM, Mike Sokolov wrote:

 

I'm experiencing something really weird: I get different results depending
on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml.  I
spent quite a while staring at query params to make sure everything else is
the same, and they do seem to be.  At first I thought the problem related to
the javabin format change that has been talked about recently, but I am
using solr 1.4.0 and solrj 1.4.0.

Notice in the two entries that the wt param is different and the hits
result count is different.

Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/
params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
hits=261 status=0 QTime=1
Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select
params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
hits=57 status=0 QTime=0


The xml format results seem to be the correct ones. So one thought I had
is that I could somehow fall back to using xml format in solrj, but I tried
SolrQuery.set('wt','xml') and that didn't have the desired effect (I get
'wt=javabin&wt=javabin' in the log - ie the param is repeated, but still
javabin).


Am I crazy? Is this a known issue?

Thanks for any suggestions


   
   


Re: Solr sorting problem

2010-10-22 Thread Moazzam Khan
The field type of the first name and last name is text. Could that be
why it's not sorting properly? I just changed it to string and started
a full-import. Hopefully that will work.

Thanks,
Moazzam

On Thu, Oct 21, 2010 at 7:42 PM, Jayendra Patil
jayendra.patil@gmail.com wrote:
 need additional information .
 Sorting is easy in Solr just by passing the sort parameter

 However, when it comes to text sorting it depends on how you analyse
 and tokenize your fields
 Sorting does not work on fields with multiple tokens.
 http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F

 On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan moazz...@gmail.com wrote:

 Hey guys,

 I have a list of people indexed in Solr. I am trying to sort by their
 first names but I keep getting results that are not alphabetically
 sorted (I see the names starting with W before the names starting with
 A). I have a feeling that the results are first being sorted by
 relevancy then sorted by first name.

 Is there a way I can get the results to be sorted alphabetically?

 Thanks,
 Moazzam




Re: Solr sorting problem

2010-10-22 Thread Moazzam Khan
For anyone who faced the same problem, changing the field to string
from text worked!

-Moazzam
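
For reference, a minimal SolrJ sketch of the working setup. The sort field is assumed to be an untokenized string field, and its name here is invented for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SortByFirstName {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        // Sorting behaves alphabetically only when the field yields a single
        // term per document, e.g. a string (untokenized) field.
        query.addSortField("first_name_s", ORDER.asc);

        server.query(query);
    }
}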

On Fri, Oct 22, 2010 at 8:50 AM, Moazzam Khan moazz...@gmail.com wrote:
 The field type of the first name and last name is text. Could that be
 why it's not sorting properly? I just changed it to string and started
 a full-import. Hopefully that will work.

 Thanks,
 Moazzam

 On Thu, Oct 21, 2010 at 7:42 PM, Jayendra Patil
 jayendra.patil@gmail.com wrote:
 need additional information .
 Sorting is easy in Solr just by passing the sort parameter

 However, when it comes to text sorting it depends on how you analyse
 and tokenize your fields
 Sorting does not work on fields with multiple tokens.
 http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F

 On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan moazz...@gmail.com wrote:

 Hey guys,

 I have a list of people indexed in Solr. I am trying to sort by their
 first names but I keep getting results that are not alphabetically
 sorted (I see the names starting with W before the names starting with
 A). I have a feeling that the results are first being sorted by
 relevancy then sorted by first name.

 Is there a way I can get the results to be sorted alphabetically?

 Thanks,
 Moazzam





Re: Strange file name after installing solr

2010-10-22 Thread Grant Ingersoll

On Oct 21, 2010, at 11:52 PM, Bac Hoang wrote:

 Hello folks,
 
 I'm very new user to solr. Please help
 
 What I have in hand: 1) apache-solr-1.4.1; 2) Geronimo
 
 After installing solr.war using the Geronimo administration GUI, I got a
 strangely named file under
 opt/dev/ofwi-geronimo2.1.6/repository/default/solr/1287558884961/solr-1287558884961.war.
 Is this alright or is anything abnormal? My Geronimo says that solr is in running
 status, but when I start it, I get an error: java.lang.RuntimeException: Can't
 find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/opt/dev...
 

Looks like you don't have your Solr Home set.  I would try starting with the 
Solr tutorial or one of the Solr books and get a basic understanding of how it 
works and then go towards deploying in Geronimo.

 Thanks indeed for your time
 
 With regards,
 Bac Hoang
 
 

--
Grant Ingersoll
http://www.lucidimagination.com



Re: different results depending on result format

2010-10-22 Thread Mike Sokolov
OK I solved the problem.  It turns out that I was connecting to the 
server using its FQDN (rosen.ifactory.com).  When, instead, I connect to 
it using the name rosen (which maps to the same IP using the default 
domain name configured in my resolver, ifactory.com), I get results back.


I am looking into the virtual hosts config in tomcat; it seems as if 
there must indeed be another solr instance running; in fact I'm now 
concerned there might be two solr instances running against the same 
data folder. yargh.


-Mike


On 10/22/2010 09:05 AM, Mike Sokolov wrote:
Yes - I really only have the one solr instance.  And I have plenty of 
other cases where I am getting good results back via solrj.  It's 
really a mystery.  Unfortunately I have to catch up on other stuff I 
have been neglecting, but I'll follow up when I'm able to get a 
solution...


-Mike


On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote:
strange..are you absolutely sure the two queries are directed to the 
same

Solr instance? I'm running the same query from the admin page (which
specifies the xml format) and I get the exact same results as solrj.

On 21 October 2010 22:25, Mike Sokolovsoko...@ifactory.com  wrote:

quick follow-up: I also notice that the query from solrj gets 
version=1,
whereas the admin webapp puts version=2.2 on the query string, 
although this
param doesn't seem to change the xml results at all.  Does this 
indicate an

older version of solrj perhaps?

-Mike


On 10/21/2010 04:47 PM, Mike Sokolov wrote:

I'm experiencing something really weird: I get different results 
depending
on whether I specify wt=javabin, and retrieve using SolrJ, or 
wt=xml.  I
spent quite a while staring at query params to make sure everything 
else is
the same, and they do seem to be.  At first I thought the problem 
related to
the javabin format change that has been talked about recently, but 
I am

using solr 1.4.0 and solrj 1.4.0.

Notice in the two entries that the wt param is different and the hits
result count is different.

Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/
params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}


hits=261 status=0 QTime=1
Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select
params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}


hits=57 status=0 QTime=0


The xml format results seem to be the correct ones. So one thought 
I had
is that I could somehow fall back to using xml format in solrj, but 
I tried
SolrQuery.set('wt','xml') and that didn't have the desired effect 
(I get
'wt=javabin&wt=javabin' in the log - ie the param is repeated, but 
still

javabin).


Am I crazy? Is this a known issue?

Thanks for any suggestions




Re: Import From MYSQL database

2010-10-22 Thread virtas

In the main directory of Jetty there should be a directory called 'logs'.

Log names are usually coded like this:
2010_07_31.request.log

Change the date and try searching your system.


Failing to successfully import international characters via DIH

2010-10-22 Thread virtas

Hi, 

I wanted to share a problem I have with importing text in different
languages. All international text looks wrong in Luke and in AJAX Solr.


What I see for Chinese and Japanese characters is this:
æ˜ ç”»ã‚„éŸ³æ¥½ãŒæ¥½ã—ã„ï¼AIのサイモンのファンです。アダãƒ
やマットが好きです。LeeDeWyze優勝!I

Although it should be:
映画や音楽が楽しい!AIのサイモンのファンです。アダムやマットが好きです。

My setup is Ubuntu server 10.04, Tomcat6, Solr 1.4 and mysql. 

Things I have configured, but with no luck:
 1. /etc/tomcat6/server.xml contains this:
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="2"
               URIEncoding="UTF-8"
               redirectPort="8443" />
 2. /etc/mysql/my.cnf contains:
    [mysqld]
    default-character-set = utf8
    character-set-server = utf8
 3. /etc/solr/conf/data-config.xml contains:
    <dataConfig>
      <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost:3306/spuvocom_spuvo?characterEncoding=UTF-8"
                  encoding="UTF-8" />
      <document>
 4. my MySQL table collation is utf8_bin

What would you recommend changing or checking?

Thanks in advance 



Re: Failing to successfully import international characters via DIH

2010-10-22 Thread Pradeep Singh

 What would you recommend changing or checking?


Tomcat Connector URIEncoding. I have done this several times on Tomcat;
I might be at a loss on other servers though.

- Pradeep


Re: Failing to successfully import international characters via DIH

2010-10-22 Thread Pradeep Singh
Holy cow, you already have this in place. I apologize. This looked exactly
the kind of problem I have solved this way.

On Fri, Oct 22, 2010 at 8:38 AM, Pradeep Singh pksing...@gmail.com wrote:



 What would you recommend changing or checking?


 Tomcat *Connector* URIEncoding. I have done this several times on tomcat,
 might be at a loss on other servers though.

 - Pradeep



Using different schemas when syncing with PostgreSQL and DIH

2010-10-22 Thread Juan Manuel Alvarez
Hello everyone!

I am using Solr synced with a PostgreSQL database using DIH and I am
facing an issue.

The thing is that I use one Solr server and different Postgre schemas
in the same database, with the same tables inside each one, so the
following queries:

SELECT * FROM schema1.Objects;
and
SELECT * FROM schema2.Objects;

are both valid. The schemas are completely dynamic, so I can't do
anything manually each time I add a new schema.

In the DIH id field I am using a combination of the schema name and PK
of the Objects table, to avoid duplicates.

My question is:
Every time I do an import operation (delta or full) with DIH, I only
need to sync the index with one schema only, so... is there a way to
pass a custom parameter with the schema name to DIH so I can build the
query with the corresponding schema name?

Thank you very much!
Juan M.


How to index long words with StandardTokenizerFactory?

2010-10-22 Thread Sergey Bartunov
I'm trying to force Solr to index words whose length is more than 255
characters (this constant is DEFAULT_MAX_TOKEN_LENGTH in Lucene's
StandardAnalyzer.java), using StandardTokenizerFactory as a 'filter' tag
in the schema configuration XML. Specifying the maxTokenLength attribute
doesn't work.

I tried a dirty hack: I downloaded the lucene-core-2.9.3 source,
changed DEFAULT_MAX_TOKEN_LENGTH to 100, built it into a jar
and replaced the original lucene-core jar in Solr's /lib. But it seems
that this had no effect.


Re: Solr Javascript+JSON not optimized for SEO

2010-10-22 Thread Dennis Gearon
How can we see what each will do?

Dennis Gearon

--- On Fri, 10/22/10, PeterKerk vettepa...@hotmail.com wrote:

 From: PeterKerk vettepa...@hotmail.com
 Subject: Solr Javascript+JSON not optimized for SEO
 To: solr-user@lucene.apache.org
 Date: Friday, October 22, 2010, 2:59 AM
 
 Hi,
 
 When I retrieve data via javascript+JSON method (instead of
 REST via URL),
 the link which I click does not reflect what the user will
 end up seeing.
 
 Example for showing the features belonging to a LED TV
 product:
 JSON
 
  <a href="javascript:getFeatureFacets('LEDTV')">Get features for LED TV</a>
 
 REST
  <a href="www.domain.com/TVs/LEDTV">Get features for LED TV</a>
 
 As you see the href in the second example clearly displays
 what the user may
 expect when clicking this link. That is VERY important for
 search engines.
 So how can I still use javascript+JSON, but not loose the
 SEO value?
 
 Regards,
 Peter
 


Re: Failing to successfully import international characters via DIH

2010-10-22 Thread Dennis Gearon
Sounds like one of three things:
1/ Everything is set to UTF-*, but the content has another encoding.
2/ Something 'Microsoft-ish' is adding a BOM (byte order mark) that is being 
incorrectly interpreted.
3/ The byte order is wrong somewhere along the way and not being translated 
correctly across machine/media boundaries.


You need to look at what your source is providing, directly first, before it 
gets into the database. Then do the following.

I would open up an editor that you KNOW outputs utf-8:

1/ Compose a web page, view it with fonts set to UTF8, that will tell you that 
it is really creating UTF-8 files. (Obviously use some character over 0xFF)

2/ Build an SQL query with it that inserts one record, or many, using those 
characters. Try commandline, server side language, and any  DBase management 
program also. Make the records distinct relative to where they are being 
inserted from.

3/ Select these records and view on a web page set to UTF-8 and see if they 
come out of the database OK.

4/ Import into Solr, and view again in a browser set to UTF-8
Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.
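
Following up on step 3 above, a hedged JDBC sketch for dumping exactly what comes back out of MySQL, so you can see on which hop the text turns into mojibake. The connection URL, credentials and table/column names are assumptions, not taken from the original post:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Utf8RoundTripCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/spuvocom_spuvo?characterEncoding=UTF-8",
                "user", "password");
        Statement st = conn.createStatement();
        ResultSet rs = st.executeQuery("SELECT text_column FROM some_table LIMIT 1");
        if (rs.next()) {
            String value = rs.getString(1);
            for (int i = 0; i < value.length(); i++) {
                // Correctly decoded CJK text shows code points well above U+00FF;
                // mojibake shows up as runs of Latin-1-range characters instead.
                System.out.printf("U+%04X ", (int) value.charAt(i));
            }
        }
        rs.close();
        st.close();
        conn.close();
    }
}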


--- On Fri, 10/22/10, virtas pkaro...@gmail.com wrote:

 From: virtas pkaro...@gmail.com
 Subject: Failing to successfully import international characters via DIH
 To: solr-user@lucene.apache.org
 Date: Friday, October 22, 2010, 8:20 AM
 
 Hi, 
 
 wanted to share problem i have got with importing text from
 different
 languages. All international text looks wrong on luke and
 on AJAX solr. 
 
 
 What I see for chinese and japanese characters is this:
 æ˜
 画や音楽ãŒæ¥½ã—ã„ï¼AIã®ã‚µã‚¤ãƒ¢ãƒ³ã®ãƒ•ã‚¡ãƒ³ã§ã™ã€‚アダãƒ
 やマットãŒå¥½ãã§ã™ã€‚LeeDeWyze優å‹ï¼I
 
 Although it should be:
 映画や音楽が楽しい!AIのサイモンのファンです。アダムやマットが好きです。
 
 My setup is Ubuntu server 10.04, Tomcat6, Solr 1.4 and
 mysql. 
 
 Things i have configured but with no luck:
  1. /etc/tomcat6/server.xml contains this
 Connector port=8080 protocol=HTTP/1.1 
            
    connectionTimeout=2 
            
    URIEncoding=UTF-8
            
    redirectPort=8443 /
  2. /etc/mysql/my.cnf contains:
  [mysqld]
    
  default-character-set = utf8
   character-set-server = utf8
   
  3. /etc/solr/conf/data-config.xml 
  dataConfig
   dataSource type=JdbcDataSource 
              
 driver=com.mysql.jdbc.Driver
              
 url=jdbc:mysql://localhost:3306/spuvocom_spuvo?characterEncoding=UTF-8
 
 
            
    encoding = UTF-8 /
   document
  4. my mysql table collation is utf8_bin   
 
 
 What would you recommend changing or checking?
 
 Thanks in advance 
 



RE: How to index long words with StandardTokenizerFactory?

2010-10-22 Thread Steven A Rowe
Hi Sergey,

I've opened an issue to add a maxTokenLength param to the 
StandardTokenizerFactory configuration:

https://issues.apache.org/jira/browse/SOLR-2188

I'll work on it this weekend.

Are you using Solr 1.4.1?  I ask because of your mention of Lucene 2.9.3.  I'm 
not sure there will ever be a Solr 1.4.2 release.  I plan on targeting Solr 3.1 
and 4.0 for the SOLR-2188 fix.

I'm not sure why you didn't get the results you wanted with your Lucene hack - 
is it possible you have other Lucene jars in your Solr classpath?

Steve
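
For context while SOLR-2188 is pending: at the Lucene level the limit is already a per-instance setting on the tokenizer, which is what a maxTokenLength attribute on the factory would need to pass through. A hedged sketch against the Lucene 2.9.x API:

import java.io.StringReader;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class LongTokenDemo {
    public static void main(String[] args) throws Exception {
        // A 300-character token, longer than the default limit of 255.
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            word.append('x');
        }

        StandardTokenizer tokenizer = new StandardTokenizer(
                Version.LUCENE_29, new StringReader(word.toString()));
        // Raise the per-instance limit; the default is 255 characters.
        tokenizer.setMaxTokenLength(1024 * 1024);

        TermAttribute term = tokenizer.addAttribute(TermAttribute.class);
        while (tokenizer.incrementToken()) {
            System.out.println("token length: " + term.term().length());
        }
    }
}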

 -Original Message-
 From: Sergey Bartunov [mailto:sbos@gmail.com]
 Sent: Friday, October 22, 2010 12:08 PM
 To: solr-user@lucene.apache.org
 Subject: How to index long words with StandardTokenizerFactory?
 
 I'm trying to force solr to index words which length is more than 255
 symbols (this constant is DEFAULT_MAX_TOKEN_LENGTH in lucene
 StandardAnalyzer.java) using StandardTokenizerFactory as 'filter' tag
 in schema configuration XML. Specifying the maxTokenLength attribute
 won't work.
 
 I'd tried to make the dirty hack: I downloaded lucene-core-2.9.3 src
 and changed the DEFAULT_MAX_TOKEN_LENGTH to 100, built it to jar
 and replaced original lucene-core jar in solr /lib. But seems like
 that it had bring no effect.


Confusion about entities and documents

2010-10-22 Thread Olson, Ron
Hi all-

I've been checking the online docs about this, but I haven't found a suitable 
explanation about how entities and sub-entities work within a document. I am 
loading records from a SQL database and everything seems to be getting 
flattened in a way I was not expecting.

For example, I have a document that defines, say, engine. The engine is made 
up of parts, which are manufactured by various companies. A hypothetical, 
abbreviated config would be:

<document name="engines">
    <entity name="engine" query="select id, name, desc from engines">
        <entity name="parts" query="select part_id, part_name from parts where engine_id = ${engine.id}">
            <entity name="parts_manu" query="select manu_name from manufacturer where id = ${parts.part_id}">
                ...
            </entity>
        </entity>
    </entity>
</document>

What I get when I search for, say, XYZ, is a document that has XYZ Corp as a 
manufacturer name, but the array of parts_manu appears to be a child of the 
document, not the parts array.

Is this the correct behavior, insofar as a document has a single level of 
elements, and that's it? If so, what might be a better strategy for being able 
to maintain the hierarchy of information within a document?

Thanks for any info,

Ron



Re: A bug in ComplexPhraseQuery ?

2010-10-22 Thread Ahmet Arslan
 In my opinion, ordering terms in a proximity search does not
 make sense!
 So the workaround for us is to generate the opposite
 search every time a
 proximity operator is used.
 Not very elegant!

If you want I can make it configurable. You can define your choice in 
solrconfig.xml like this:

<queryParser name="complexphrase"
             class="org.apache.solr.search.ComplexPhraseQParserPlugin">
  <bool name="inOrder">false</bool>
</queryParser>


  


Re: SolrJ addField with Reader

2010-10-22 Thread Bojan Vukojevic
Is there an example of how to use ContentStreamBase.FileStream from SolrJ
during indexing, to reduce the memory footprint? addField requires a
string. The only example I could find in the JUnit tests is below, and it
does not show indexing...

thx!

public void testFileStream() throws IOException {
    File file = new File(README);
    assertTrue(file.exists()); // make sure you are running from: solr\src\test\test-files

    ContentStreamBase stream = new ContentStreamBase.FileStream(file);
    assertEquals(file.length(), stream.getSize().intValue());
    assertTrue(IOUtils.contentEquals(new FileInputStream(file), stream.getStream()));
    assertTrue(IOUtils.contentEquals(new FileReader(file), stream.getReader()));
}
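
On the indexing question itself: the usual SolrJ route for streaming a file without building a String is a content-stream request rather than addField. A minimal sketch, assuming the ExtractingRequestHandler is mapped at /update/extract; paths and field names are illustrative:

import java.io.File;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class StreamFileToSolr {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        // The file is sent as a content stream, so the client never holds
        // the whole document in memory as one String.
        ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/data/report.pdf"));
        req.setParam("literal.id", "doc1");
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        server.request(req);
    }
}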


Re: How to index long words with StandardTokenizerFactory?

2010-10-22 Thread Sergey Bartunov
I'm using Solr 1.4.1. I have now succeeded in replacing the lucene-core jar,
but maxTokenLength seems to be used in a very strange way. Currently for
me it's set to 1024*1024, but I still couldn't index a field with a size of
just ~34kb. I understand that it's a little weird to index such big
data, but I just want to know why it doesn't work.

On 22 October 2010 20:36, Steven A Rowe sar...@syr.edu wrote:
 Hi Sergey,

 I've opened an issue to add a maxTokenLength param to the 
 StandardTokenizerFactory configuration:

        https://issues.apache.org/jira/browse/SOLR-2188

 I'll work on it this weekend.

 Are you using Solr 1.4.1?  I ask because of your mention of Lucene 2.9.3.  
 I'm not sure there will ever be a Solr 1.4.2 release.  I plan on targeting 
 Solr 3.1 and 4.0 for the SOLR-2188 fix.

 I'm not sure why you didn't get the results you wanted with your Lucene hack 
 - is it possible you have other Lucene jars in your Solr classpath?

 Steve

 -Original Message-
 From: Sergey Bartunov [mailto:sbos@gmail.com]
 Sent: Friday, October 22, 2010 12:08 PM
 To: solr-user@lucene.apache.org
 Subject: How to index long words with StandardTokenizerFactory?

 I'm trying to force solr to index words which length is more than 255
 symbols (this constant is DEFAULT_MAX_TOKEN_LENGTH in lucene
 StandardAnalyzer.java) using StandardTokenizerFactory as 'filter' tag
 in schema configuration XML. Specifying the maxTokenLength attribute
 won't work.

 I'd tried to make the dirty hack: I downloaded lucene-core-2.9.3 src
 and changed the DEFAULT_MAX_TOKEN_LENGTH to 100, built it to jar
 and replaced original lucene-core jar in solr /lib. But seems like
 that it had bring no effect.



Re: A bug in ComplexPhraseQuery ?

2010-10-22 Thread Ahmet Arslan
 <queryParser name="complexphrase"
              class="org.apache.solr.search.ComplexPhraseQParserPlugin">
     <bool name="inOrder">false</bool>
   </queryParser>
 

I added this change to SOLR-1604; can you test it and give us feedback?





Re: Using different schemas when syncing with PostgreSQL and DIH

2010-10-22 Thread Shawn Heisey

On 10/22/2010 10:06 AM, Juan Manuel Alvarez wrote:

My question is:
Every time I do an import operation (delta or full) with DIH, I only
need to sync the index with one schema only, so... is there a way to
pass a custom parameter with the schema name to DIH so I can build the
query with the corresponding schema name?


Yes, there is.  Below is the latest version of my dih config used with a 
MySQL database.  I've got almost everything in the SELECT statement 
specified by the input URL, which gets built using the following template:


http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dbHost=DBHOST&dbSchema=DBSCHEMA&dataTable=DATATABLE&sgTable=SGTABLE&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID

<dataConfig>
  <dataSource type="JdbcDataSource"
      driver="com.mysql.jdbc.Driver"
      encoding="UTF-8"
      url="jdbc:mysql://${dataimporter.request.dbServer}:3306/${dataimporter.request.dbSchema}?zeroDateTimeBehavior=convertToNull"
      batchSize="-1"
      user="removed"
      password="removed"/>
  <document>
    <entity name="dataTable" pk="did"
      query="
        SELECT d.*,FROM_UNIXTIME(d.post_date) AS pd,
          s.search_group_map AS sg
        FROM ${dataimporter.request.dataTable} d
          LEFT JOIN ${dataimporter.request.sgTable} s
          ON d.feature=s.featurecode
        WHERE did &gt; ${dataimporter.request.minDid}
          AND did &lt;= ${dataimporter.request.maxDid}
          AND (did % ${dataimporter.request.numShards})
            IN (${dataimporter.request.modVal})
        GROUP BY d.did"
      deltaImportQuery="
        SELECT d.*,FROM_UNIXTIME(d.post_date) AS pd,
          s.search_group_map AS sg
        FROM ${dataimporter.request.dataTable} d
          LEFT JOIN ${dataimporter.request.sgTable} s
          ON d.feature=s.featurecode
        WHERE did &gt; ${dataimporter.request.minDid}
          AND did &lt;= ${dataimporter.request.maxDid}
          AND (did % ${dataimporter.request.numShards})
            IN (${dataimporter.request.modVal})
        GROUP BY d.did"
      deltaQuery="SELECT MAX(d.did) FROM ${dataimporter.request.dataTable} d"
      >
      <!-- That lone angle bracket looks wrong, but it's not. -->
      <field column="search_group" splitBy=";" />
    </entity>
  </document>
</dataConfig>
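
For completeness, a sketch of kicking off an import with those request-scoped parameters from SolrJ instead of a raw URL. The handler path and parameter names follow the template above; everything else is illustrative:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class TriggerDataImport {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr/CORE");

        SolrQuery params = new SolrQuery();
        params.setQueryType("/dataimport");   // route the request to the DIH handler
        params.set("command", "full-import");
        // Anything passed here shows up in the config as ${dataimporter.request.*}
        params.set("dbSchema", "schema1");
        params.set("dataTable", "Objects");

        server.query(params);
    }
}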



Re: Using different schemas when syncing with PostgreSQL and DIH

2010-10-22 Thread Juan Manuel Alvarez
Thank you Shawn! That was exactly what I was looking for! =o)

On Fri, Oct 22, 2010 at 4:29 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/22/2010 10:06 AM, Juan Manuel Alvarez wrote:

 My question is:
 Every time I do an import operation (delta or full) with DIH, I only
 need to sync the index with one schema only, so... is there a way to
 pass a custom parameter with the schema name to DIH so I can build the
 query with the corresponding schema name?

 Yes, there is.  Below is the latest version of my dih config used with a
 MySQL database.  I've got almost everything in the SELECT statement
 specified by the input URL, which gets built using the following template:

 http://HOST:PORT/solr/CORE/dataimport?command=COMMANDdbHost=DBHOSTdbSchema=DBSCHEMAdataTable=DATATABLEsgTable=SGTABLEnumShards=NUMSHARDSmodVal=MODVALminDid=MINDIDmaxDid=MAXDID

 dataConfig
 dataSource type=JdbcDataSource
    driver=com.mysql.jdbc.Driver
    encoding=UTF-8

  url=jdbc:mysql://${dataimporter.request.dbServer}:3306/${dataimporter.request.dbSchema}?zeroDateTimeBehavior=convertToNull
    batchSize=-1
    user=removed
    password=removed/
 document
 entity name=dataTable pk=did
      query=
        SELECT d.*,FROM_UNIXTIME(d.post_date) AS pd,
          s.search_group_map AS sg
        FROM ${dataimporter.request.dataTable} d
          LEFT JOIN ${dataimporter.request.sgTable} s
          ON d.feature=s.featurecode
        WHERE did gt; ${dataimporter.request.minDid}
          AND did lt;= ${dataimporter.request.maxDid}
          AND (did % ${dataimporter.request.numShards})
            IN (${dataimporter.request.modVal})
        GROUP BY d.did
      deltaImportQuery=
        SELECT d.*,FROM_UNIXTIME(d.post_date) AS pd,
          s.search_group_map AS sg
        FROM ${dataimporter.request.dataTable} d
          LEFT JOIN ${dataimporter.request.sgTable} s
          ON d.feature=s.featurecode
        WHERE did gt; ${dataimporter.request.minDid}
          AND did lt;= ${dataimporter.request.maxDid}
          AND (did % ${dataimporter.request.numShards})
            IN (${dataimporter.request.modVal})
        GROUP BY d.did
      deltaQuery=SELECT MAX(d.did) FROM ${dataimporter.request.dataTable} d

 !-- That lone angle bracket looks wrong, but it's not. --
 field column=search_group splitBy=; */
 /entity
 /document
 /dataConfig




Re: how well does multicore scale?

2010-10-22 Thread mike anderson
Thanks for the advice, everyone. I'll take a look at the API mentioned and
do some benchmarking over the weekend.

-Mike


On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller markrmil...@gmail.com wrote:

 On 10/22/10 1:44 AM, Tharindu Mathew wrote:
  Hi Mike,
 
  I've also considered using a separate cores in a multi tenant
  application, ie a separate core for each tenant/domain. But the cores
  do not suit that purpose.
 
  If you check out documentation no real API support exists for this so
  it can be done dynamically through SolrJ. And all use cases I found,
  only had users configuring it statically and then using it. That was
  maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.

 You can dynamically manage cores with solrj. See
 org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
 for a place to start.

 You probably want to turn solr.xml's persist option on so that your
 cores survive restarts.

 
  So your better off using a single index and with a user id and use a
  query filter with the user id when fetching data.

 Many times this is probably the case - pro's and con's to each depending
 on what you are up to.

 - Mark
 lucidimagination.com

 
  On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu
 wrote:
  No, it does not seem reasonable.  Why do you think you need a seperate
 core
  for every user?
  mike anderson wrote:
 
  I'm exploring the possibility of using cores as a solution to bookmark
  folders in my solr application. This would mean I'll need tens of
  thousands
  of cores... does this seem reasonable? I have plenty of CPUs available
 for
  scaling, but I wonder about the memory overhead of adding cores (aside
  from
  needing to fit the new index in memory).
 
  Thoughts?
 
  -mike
 
 
 
 
 
 




Re: Date faceting +1MONTH problem

2010-10-22 Thread Yonik Seeley
On Fri, Sep 17, 2010 at 9:51 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
  the default query parser
 doesn't support range queries with mixed upper/lower bound inclusion.

This has just been added to trunk.
Things like [0 TO 100} now work.

-Yonik
http://www.lucidimagination.com
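
To tie this back to the original date-faceting problem: with mixed bounds (on trunk, where the syntax is supported) each month bucket can include its lower edge and exclude its upper edge, so a document stamped exactly on a month boundary is counted once. A hedged SolrJ sketch; the field name and dates are invented for illustration:

import org.apache.solr.client.solrj.SolrQuery;

public class MixedBoundsFilter {
    public static void main(String[] args) {
        SolrQuery query = new SolrQuery("*:*");
        // Inclusive lower bound, exclusive upper bound: a document stamped at
        // exactly 2010-11-01T00:00:00Z falls into the November bucket only.
        query.addFilterQuery(
                "timestamp:[2010-10-01T00:00:00Z TO 2010-11-01T00:00:00Z}");
    }
}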


Re: Confusion about entities and documents

2010-10-22 Thread harrysmith

What I get when I search for, say, XYZ, is a document that has XYZ Corp as
a manufacturer name, but the array of parts_manu appears to be a child of
the document, not the parts array. 

Is this the correct behavior, insofar as a document has a single level of
elements, and that's it? If so, what might be a better strategy for being
able to maintain the hierarchy of information within a document? 


Yes, this is the correct behavior. I still struggle with the same issue, and
there are no 'best practices' (that I have found, at least) for maintaining
relationships within a Solr doc. The argument is that Solr is not the correct
place for these representations and should only hold a flat version of
your document.

For a similar question see: 
http://lucene.472066.n3.nabble.com/Schema-Definition-Question-td1049966.html#a1105593

A few possible solutions are posted there, and i'm interested in how others
have tackled this issue.


RE: Confusion about entities and documents

2010-10-22 Thread Olson, Ron
Hmm, okay, I guess I wasn't taking the hierarchy-flattening aspect of Solr 
seriously enough. :)

Based on your reply from the other thread, I guess the best solution, as far as 
I can tell, is to maintain the multiple value lists and take advantage of the 
fact that the arrays will always be in the right order:

<arr name="manu_id">
    <int>1</int>
    <int>2</int>
</arr>
<arr name="manu_name">
    <str>ABC Corp</str> <!-- ID should be 1, right? -->
    <str>XYZ Inc</str>  <!-- Should be 2 -->
</arr>

So I guess the problem isn't really *sooo* bad...I just need to make sure that 
I have the appropriate names defined so I can link between two arrays in my 
client code. I suppose I could keep things straight by preserving the hierarchy 
within the name attribute.



-Original Message-
From: harrysmith [mailto:harrysmith...@gmail.com]
Sent: Friday, October 22, 2010 4:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Confusion about entities and documents


What I get when I search for, say, XYZ, is a document that has XYZ Corp as
a manufacturer name, but the array of parts_manu appears to be a child of
the document, not the parts array.

Is this the correct behavior, insofar as a document has a single level of
elements, and that's it? If so, what might be a better strategy for being
able to maintain the hierarchy of information within a document?


Yes, this is the correct behavior. I still struggle with the same issue, and
there is no 'best practices' (that I have found at least) of maintaining
relationships within a Solr doc. The argument is Solr is not the correct
place for these representations and should only represent a flat version of
your document.

For a similar question see:
http://lucene.472066.n3.nabble.com/Schema-Definition-Question-td1049966.html#a1105593

A few possible solutions are posted there, and i'm interested in how others
have tackled this issue.




Question about gaze

2010-10-22 Thread Lingley, Dean R
Does anyone have an idea about the error below?  I installed Gaze and it works OK,
but when trying to import after installing it I received the following error.
I commented out the line in solrconfig.xml and it imports fine again, but now
Gaze no longer works.

Any ideas?

Thanks,
Dean


./import-marc.sh /home/filemove/vufindextract.mrc
/var/solr/solr /var/solr
Now Importing /home/filemove/vufindextract.mrc ...
/usr/java/latest/bin/java -Xms2048m -Xmx4096m -XX:+UseParallelGC -XX:NewRatio=5 
-XX:-PrintGC -XX:-PrintGCDetails -XX:-PrintGCTimeStamps 
-Dsolrmarc.solr.war.path=/var/solr/solr/jetty/webapps/solr.war 
-Dsolr.core.name=biblio -Dsolrmarc.path=/var/solr/import 
-Dsolr.path=/var/solr/solr -Dsolr.solr.home=/var/solr/solr -jar 
/var/solr/import/SolrMarc.jar /var/solr/import/import.properties 
/home/filemove/vufindextract.mrc
INFO [main] (MarcImporter.java:769) - Starting SolrMarc indexing.
INFO [main] (Utils.java:189) - Opening file: /var/solr/import/import.properties
INFO [main] (MarcHandler.java:325) - Attempting to open data file: 
/home/filemove/vufindextract.mrc
INFO [main] (MarcImporter.java:618) -  Updating to Solr index at /var/solr/solr
INFO [main] (MarcImporter.java:634) -  Using Solr core biblio
INFO [main] (SolrCoreLoader.java:102) - Using the data directory of: 
/var/solr/solr/biblio
INFO [main] (SolrCoreLoader.java:104) - Using the multicore schema file at : 
/var/solr/solr/solr.xml
INFO [main] (SolrCoreLoader.java:105) - Using the biblio core
Oct 22, 2010 5:34:39 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: 
com/lucidimagination/gaze/shared/GazeStorage
at 
com.lucidimagination.gaze.plugin.StatMonitor.init(StatMonitor.java:120)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:398)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
at 
org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152)
at org.apache.solr.core.SolrCore.init(SolrCore.java:556)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
at org.apache.solr.core.CoreContainer.init(CoreContainer.java:181)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.solrmarc.solr.SolrCoreLoader.loadCore(SolrCoreLoader.java:110)
at org.solrmarc.marc.MarcImporter.getSolrProxy(MarcImporter.java:635)
at 
org.solrmarc.marc.MarcImporter.loadLocalProperties(MarcImporter.java:173)
at org.solrmarc.marc.MarcHandler.init(MarcHandler.java:112)
at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:775)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.simontuffs.onejar.Boot.run(Boot.java:334)
at com.simontuffs.onejar.Boot.main(Boot.java:170)
Caused by: java.lang.ClassNotFoundException: 
com.lucidimagination.gaze.shared.GazeStorage
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 30 more

Oct 22, 2010 5:34:39 PM org.apache.solr.core.SolrCore finalize
SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.core.solrc...@4a5f2db0 
(biblio) has a reference count of 1
Error: Problem creating updateHandler in SolrCoreProxy
ERROR [main] (MarcImporter.java:310) - Error indexing record: 8188 -- Error: 
Problem creating updateHandler in SolrCoreProxy
org.solrmarc.solr.SolrRuntimeException: Error: Problem creating updateHandler 
in SolrCoreProxy

Re: Date faceting +1MONTH problem

2010-10-22 Thread Shawn Heisey

On 10/22/2010 3:01 PM, Yonik Seeley wrote:

On Fri, Sep 17, 2010 at 9:51 PM, Chris Hostetter
hossman_luc...@fucit.org  wrote:

  the default query parser
doesn't support range queries with mixed upper/lower bound inclusion.

This has just been added to trunk.
Things like [0 TO 100} now work.


Awesome!  Is it easily ported back to branch_3x?

Shawn



Solr ExtractingRequestHandler with Compressed files

2010-10-22 Thread Joey Hanzel
Hi,

Has anyone had success using ExtractingRequestHandler and Tika with any of
the compressed file formats (zip, tar, gz, etc) ?

I am sending Solr the archived.tar file using curl:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true" \
  -H 'Content-type:application/octet-stream' --data-binary @/home/archived.tar
The result I get when I query the document is that the filenames inside the
archive are indexed in the body_texts field, but the content of those files is
not extracted or included. This is not the behavior I expected. Ref:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#article.tika.example.
When I send one of the actual documents inside the archive using the same curl
command, the extracted content is stored in the body_texts field. Am
I missing a step for the compressed files?

I have added all the extraction dependencies as indicated by mat in
http://outoftime.lighthouseapp.com/projects/20339/tickets/98-solr-cell and
am able to successfully extract data from MS Word, PDF, and HTML documents.

I'm using the following library versions:
  Solr 1.4.0, Solr Cell 1.4.1, with Tika Core 0.4

Given everything I have read, this version of Tika should support extracting
data from all files within a compressed file. Any help or suggestions would
be appreciated.
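
One way to work around this until Solr Cell handles archive members is to unpack
the archive on the client and post each member to /update/extract yourself. Below
is a rough, untested SolrJ sketch of that idea (a zip is used because java.util.zip
is in the JDK; a tar would need something like Apache Commons Compress). The URL,
paths and field names are assumptions carried over from the curl example above,
and addFile(File) is the SolrJ 1.4-era signature.

// Sketch: unpack a zip on the client and post each entry to the
// ExtractingRequestHandler as its own Solr document.
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ArchiveExtractPoster {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ZipFile zip = new ZipFile("/home/archived.zip"); // hypothetical path

        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            if (entry.isDirectory()) {
                continue;
            }

            // Copy the entry to a temp file so addFile() can stream it to Solr.
            File tmp = File.createTempFile("entry", ".bin");
            InputStream in = zip.getInputStream(entry);
            FileOutputStream out = new FileOutputStream(tmp);
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.close();
            in.close();

            // Same parameters as the curl example above, one document per entry.
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
            req.addFile(tmp);
            req.setParam("literal.id", entry.getName());
            req.setParam("fmap.content", "body_texts");
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            server.request(req);
            tmp.delete();
        }
        zip.close();
    }
}

Posting each member separately also gives every file its own Solr document and id,
which is usually closer to what you want than one document per archive anyway.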


Re: Date faceting +1MONTH problem

2010-10-22 Thread Yonik Seeley
On Fri, Oct 22, 2010 at 6:02 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/22/2010 3:01 PM, Yonik Seeley wrote:

 On Fri, Sep 17, 2010 at 9:51 PM, Chris Hostetter
 hossman_luc...@fucit.org  wrote:

  the default query parser
 doesn't support range queries with mixed upper/lower bound inclusion.

 This has just been added to trunk.
 Things like [0 TO 100} now work.

 Awesome!  Is it easily ported back to branch_3x?

Between the refactoring work on the QP, and the back compat concerns,
it's not trivial.

-Yonik
http://www.lucidimagination.com
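
Until the mixed-bound syntax is available on 3.x, two possible workarounds -
sketches based on features that already exist in 1.4/3.x, not anything confirmed
in this thread - are date math on the upper bound, e.g.

  timestamp:[NOW/MONTH TO NOW/MONTH+1MONTH-1MILLI]

which approximates [start TO end} down to millisecond precision, and the frange
query parser for numeric fields, e.g.

  fq={!frange l=0 u=100 incu=false}price

which matches 0 <= price < 100. The field names here are placeholders.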


RE: How to index long words with StandardTokenizerFactory?

2010-10-22 Thread Steven A Rowe
Hi Sergey,

What does your ~34kb field value look like?  Does StandardTokenizer think it's 
just one token?

What doesn't work?  What happens?

Steve

 -Original Message-
 From: Sergey Bartunov [mailto:sbos@gmail.com]
 Sent: Friday, October 22, 2010 3:18 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to index long words with StandardTokenizerFactory?
 
  I'm using Solr 1.4.1. I've now succeeded in replacing the lucene-core jar,
  but the max token length value seems to be applied in a very strange way.
  Currently it's set to 1024*1024 for me, but I still couldn't index a field
  that is only ~34 KB. I understand that it's a little weird to index such
  big data, but I just want to know why it doesn't work.
 
 On 22 October 2010 20:36, Steven A Rowe sar...@syr.edu wrote:
  Hi Sergey,
 
  I've opened an issue to add a maxTokenLength param to the
 StandardTokenizerFactory configuration:
 
         https://issues.apache.org/jira/browse/SOLR-2188
 
  I'll work on it this weekend.
 
  Are you using Solr 1.4.1?  I ask because of your mention of Lucene
 2.9.3.  I'm not sure there will ever be a Solr 1.4.2 release.  I plan on
 targeting Solr 3.1 and 4.0 for the SOLR-2188 fix.
 
  I'm not sure why you didn't get the results you wanted with your Lucene
 hack - is it possible you have other Lucene jars in your Solr classpath?
 
  Steve
 
  -Original Message-
  From: Sergey Bartunov [mailto:sbos@gmail.com]
  Sent: Friday, October 22, 2010 12:08 PM
  To: solr-user@lucene.apache.org
  Subject: How to index long words with StandardTokenizerFactory?
 
  I'm trying to force Solr to index words whose length is more than 255
  characters (this constant is DEFAULT_MAX_TOKEN_LENGTH in Lucene's
  StandardAnalyzer.java), using StandardTokenizerFactory as the 'filter' tag
  in the schema configuration XML. Specifying the maxTokenLength attribute
  doesn't work.

  I tried a dirty hack: I downloaded the lucene-core-2.9.3 source, changed
  DEFAULT_MAX_TOKEN_LENGTH to 100, built it into a jar and replaced the
  original lucene-core jar in Solr's /lib. But it seems to have had no effect.
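
Assuming the SOLR-2188 patch exposes the parameter under the name mentioned
above, the analyzer configuration in schema.xml would presumably look something
like

  <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1000000"/>

The attribute name and value here are assumptions until the patch is actually
committed; today there is no schema-level knob, hence the jar hack discussed in
this thread.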
 


Re: xpath processing

2010-10-22 Thread pghorpade

Quoting pghorp...@ucla.edu:
Can someone help me please?


I am trying to import MODS XML data into Solr using the XML/HTTP datasource.

This xpath does not work with the XPathEntityProcessor of the DataImportHandler:
xpath="/mods/name/namePart[@type = 'date']"

I actually have 143 records with the type attribute set to 'date' for the
namePart element.


Thank you
Parinita






Re: how well does multicore scale?

2010-10-22 Thread Lance Norskog
http://wiki.apache.org/solr/CoreAdmin

Since Solr 1.3

On Fri, Oct 22, 2010 at 1:40 PM, mike anderson saidthero...@gmail.com wrote:
 Thanks for the advice, everyone. I'll take a look at the API mentioned and
 do some benchmarking over the weekend.

 -Mike


 On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller markrmil...@gmail.com wrote:

 On 10/22/10 1:44 AM, Tharindu Mathew wrote:
  Hi Mike,
 
  I've also considered using separate cores in a multi-tenant
  application, i.e. a separate core for each tenant/domain. But the cores
  do not suit that purpose.
 
  If you check the documentation, there is no real API support for
  managing cores dynamically through SolrJ. All the use cases I found
  only had users configuring cores statically and then using them; that
  was maybe 2 or 3 cores. Please correct me if I'm wrong, Solr folks.

 You can dynamically manage cores with solrj. See
 org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
 for a place to start.

 You probably want to turn solr.xml's persist option on so that your
 cores survive restarts.

 
  So you're better off using a single index with a user id field and
  applying a filter query on the user id when fetching data.

 Many times this is probably the case - pros and cons to each, depending
 on what you are up to.

 - Mark
 lucidimagination.com

 
  On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu
 wrote:
  No, it does not seem reasonable. Why do you think you need a separate
  core for every user?
  mike anderson wrote:
 
  I'm exploring the possibility of using cores as a solution to bookmark
  folders in my solr application. This would mean I'll need tens of
  thousands of cores... does this seem reasonable? I have plenty of CPUs
  available for scaling, but I wonder about the memory overhead of adding
  cores (aside from needing to fit the new index in memory).
 
  Thoughts?
 
  -mike
 
 
 
 
 
 






-- 
Lance Norskog
goks...@gmail.com
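
For anyone looking for a concrete starting point with the CoreAdminRequest
static methods mentioned above, a minimal SolrJ sketch might look like the
following. The core name, instanceDir and URL are placeholders, the instanceDir
must already contain a conf/ directory, and this is written against the
SolrJ 1.4-era API - treat it as a sketch rather than a recipe.

// Sketch of dynamic core management via the CoreAdmin API from SolrJ.
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class CoreManager {
    public static void main(String[] args) throws Exception {
        // Point at the Solr root (not at a particular core) for core admin calls.
        SolrServer adminServer = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Create a core for a new tenant; the instanceDir must already hold conf/.
        CoreAdminResponse create =
            CoreAdminRequest.createCore("tenant42", "/opt/solr/tenant42", adminServer);
        System.out.println("create status: " + create.getStatus());

        // Check the status of an existing core.
        CoreAdminResponse status = CoreAdminRequest.getStatus("tenant42", adminServer);
        System.out.println(status.getCoreStatus("tenant42"));

        // Unload a core that is no longer needed.
        CoreAdminRequest.unloadCore("tenant42", adminServer);
    }
}

Remember to enable persist in solr.xml (as noted above) so that cores created
this way survive a restart.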


Re: xpath processing

2010-10-22 Thread Ken Stanley
Parinita,

In its simplest form, what does your entity definition for DIH look like;
also, what does one record from your xml look like? We need more information
before we can really be of any help. :)

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, The Hitchhikers Guide to the Galaxy


On Fri, Oct 22, 2010 at 8:00 PM, pghorp...@ucla.edu wrote:

 Quoting pghorp...@ucla.edu:
 Can someone help me please?


 I am trying to import mods xml data in solr using  the xml/http datasource

 This does not work with XPathEntityProcessor of the data import handler
 xpath="/mods/name/namePart[@type = 'date']"

 I actually have 143 records with type attribute as 'date' for element
 namePart.

 Thank you
 Parinita






Re: xpath processing

2010-10-22 Thread pghorpade



<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="f" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
            baseDir="C:\data\sample_records\mods\starr">
      <entity name="x" dataSource="myfilereader"
              processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"
              stream="false" forEach="/mods"
              transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">

        <field column="id" template="${f.file}"/>
        <field column="collectionKey" template="starr"/>
        <field column="collectionName" template="starr"/>
        <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
        <field column="fileName" template="${f.file}"/>
        <field column="fileSize" template="${f.fileSize}"/>
        <field column="fileLastModified" template="${f.fileLastModified}"/>
        <field column="classification_keyword" xpath="/mods/classification"/>
        <field column="accessCondition_keyword" xpath="/mods/accessCondition"/>
        <field column="nameNamePart_s" xpath="/mods/name/namePart[@type = 'date']"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Quoting Ken Stanley doh...@gmail.com:


Parinita,

In its simplest form, what does your entity definition for DIH look like;
also, what does one record from your xml look like? We need more information
before we can really be of any help. :)

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, The Hitchhikers Guide to the Galaxy


On Fri, Oct 22, 2010 at 8:00 PM, pghorp...@ucla.edu wrote:


Quoting pghorp...@ucla.edu:
Can someone help me please?



I am trying to import mods xml data in solr using  the xml/http datasource

This does not work with XPathEntityProcessor of the data import handler
xpath="/mods/name/namePart[@type = 'date']"

I actually have 143 records with type attribute as 'date' for element
namePart.

Thank you
Parinita