Re: Schema Lint

2013-08-06 Thread Andy Lester

On Aug 6, 2013, at 9:55 AM, Steven Bower smb-apa...@alcyon.net wrote:

 Is there an easy way in code / command line to lint a solr config (or even
 just a solr schema)?

No, there's not.  I would love there to be one, especially for the DIH.

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Perl Solr help - doing *:* query

2013-07-09 Thread Andy Lester

On Jul 9, 2013, at 2:48 PM, Shawn Heisey s...@elyograg.org wrote:

 This is primarily to Andy Lester, who wrote the WebService::Solr module
 on CPAN, but I'll take a response from anyone who knows what I can do.
 
 If I use the following Perl code, I get an error.

What error do you get?  Never say "I get an error."  Always say "I get this
error: ...".

  If I try to build
 some other query besides *:* to request all documents, the script runs,
 but the query doesn't do what I asked it to do.

What DOES it do?


 http://apaste.info/3j3Q

For the sake of future readers, please put your code in the message.  This 
message will get archived, and future people reading the lists will not be able 
to read the code at some arbitrary paste site.

Shawn's code is:

use strict;
use WebService::Solr;
use WebService::Solr::Query;
use WebService::Solr::Response;

my $url = 'http://idx.REDACTED.com:8984/solr/ncmain';
my $solr = WebService::Solr->new($url);
my $query = WebService::Solr::Query->new('*:*');
my $response = $solr->search($query, {'rows' => '0'});
my $numFound = $response->content->{response}->{numFound};

print "nf: $numFound\n";
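
An aside for future readers: as I recall, search() also accepts a raw query
string instead of a WebService::Solr::Query object, which is worth trying when
a Query object doesn't do what you expect (a sketch, not tested against
Shawn's setup):

my $response = $solr->search( '*:*', { rows => 0 } );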


xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Solr Security

2013-06-24 Thread Andy Lester

On Jun 24, 2013, at 12:51 AM, Aaron Greenspan aar...@thinkcomputer.com wrote:

  all of them are terrible,

 it looks like you can edit some XML files (if you can find them) 

 The wiki itself is full of semi-useless information, which is pretty 
 infuriating since it's supposed to be the best source.

 Statements like standard Java web security can be added by tuning the 
 container and the Solr web application configuration itself via web.xml are 
 not helpful to me.

  this giant mess,

 It's just common sense.

 Netscape Enterprise Server prompted you to do that a decade and a half ago

  But either way, that's a pretty ridiculous solution.

 I don't know of any other server product that disregards security so 
 willingly.


Why are you wasting your time with such an inferior project?  Perhaps 
ElasticSearch is more to your liking.

xoxo,
Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Andy Lester

On May 29, 2013, at 9:46 AM, Dyer, James james.d...@ingramcontent.com wrote:

 Just an insanity check: I see I had misspelled "maxCollations" as
 "maxCollation" in my prior response.  When you tested with this set the same
 as maxCollationTries, did you correct my spelling?

Yes, definitely.

Thanks for the ticket.  I am looking at the effects of setting
spellcheck.onlyMorePopular to true, which reduces the number of collations it
seems to do, but doesn't affect the underlying question of "is the spellchecker
doing FQs properly?"

Thanks,
Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Why do FQs make my spelling suggestions so slow?

2013-05-28 Thread Andy Lester
I'm working on using spellcheck for giving suggestions, and collations
are giving me good results, but they turn out to be very slow if
my original query has any FQs in it.  We can do 100 maxCollationTries
in no time at all, but if there are FQs in the query, things get
very slow.  As maxCollationTries and the count of FQs increase,
things get very slow very quickly.

           1     10     20     50    100   MaxCollationTries
0 FQs      8      9     10     11     10
1 FQ      11    160    599   1597   1668
2 FQs     20    346   1163   3360   3361
3 FQs     29    474   1852   5039   5095
4 FQs     36    589   2463   6797   6807

All times are QTimes of ms.

See that top row?  With no FQs, 50 MaxCollationTries comes back
instantly.  Add just one FQ, though, and things go bad, and they
get worse as I add more of the FQs.  Also note that things seem to
level off at 100 MaxCollationTries.

Here's a query that I've been using as a test:

df=title_tracings_t
fl=flrid,nodeid,title_tracings_t
q=bagdad+AND+diaries+AND+-parent_tracings:(bagdad+AND+diaries)
spellcheck.q=bagdad+AND+diaries
rows=4
wt=xml
sort=popular_score+desc,+grouping+asc,+copyrightyear+desc,+flrid+asc
spellcheck=true
spellcheck.dictionary=direct
spellcheck.onlyMorePopular=false
spellcheck.count=15
spellcheck.extendedResults=false
spellcheck.collate=true
spellcheck.maxCollations=10
spellcheck.maxCollationTries=50
spellcheck.collateExtendedResults=true
spellcheck.alternativeTermCount=5
spellcheck.maxResultsForSuggest=10
debugQuery=off
fq=((grouping:1+OR+grouping:2+OR+grouping:3)+OR+solrtype:N)
fq=((item_source:F+OR+item_source:B+OR+item_source:M)+OR+solrtype:N)
fq={!tag%3Dgrouping}((grouping:1+OR+grouping:2)+OR+solrtype:N)
fq={!tag%3Dlanguagecode}(languagecode:eng+OR+solrtype:N)

The only thing that changes between tests is the value of
spellcheck.maxCollationTries and how many FQs are at the end.

Am I doing something wrong?  Do the collation internals not handle
FQs correctly?  The lookup/hit counts on filterCache seem to be
increasing just fine.  It will do N lookups, N hits, so I'm not
thinking that caching is the problem.

We'd really like to be able to use the spellchecker, but the results with only
10-20 maxCollationTries aren't nearly as good as when we bump that up to 100,
and we can't afford the slow response time.  We also can't do without the FQs.

Thanks,
Andy


--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Why do FQs make my spelling suggestions so slow?

2013-05-28 Thread Andy Lester
Thanks for looking at this.

 What are the QTimes for the 0fq, 1fq, 2fq, 3fq, and 4fq cases with spellcheck
 entirely turned off?  Is it about (or a little more than) half the total when
 maxCollationTries=1?

With spellcheck off I get 8ms for the 4fq query.


  Also, with the varying # of fq's, how many collation tries does it take to 
 get 10 collations?

I don't know.  How can I tell?


 Possibly, a better way to test this is to set maxCollations =
 maxCollationTries.  The reason is that it quits trying once it finds
 maxCollations, so with 0fq's, lots of combinations can generate hits and
 it doesn't need to try very many to get to 10.  But with more fq's, fewer
 collations will pan out, so now it is trying more, up to 100, before (if
 ever) it gets to 10.

It does just fine doing 100 collations so long as there are no FQs.  It seems
to me that the FQs are taking an inordinate amount of extra time.  It does 100
collations in (roughly) the same amount of time as a single collation, so long
as there are no FQs.  Why are the FQs such a drag on the collation process?


 (I'm assuming you have all non-search components like faceting turned off).

Yes, definitely.


  So say with 2fq's it takes 10ms for the query to complete with spellcheck 
 off, and 20ms with maxCollation = maxCollationTries = 1, then it will take 
 about 110ms with maxCollation = maxCollationTries = 10.

I can do maxCollation = maxCollationTries = 100 and it comes back in 14ms, so 
long as I have FQs off.  Add a single FQ and it becomes 13499ms.

I can do maxCollation = maxCollationTries = 1000 and it comes back in 45ms, so 
long as I have FQs off.  Add a single FQ and it becomes 62038ms.


 But I think you're just setting maxCollationTries too high.  You're asking it
 to do too much work in trying tens of combinations.

The results I get back with 100 tries are about twice as many as I get with 10
tries.  That's a big difference to the user when it's trying to figure out
misspelled phrases.

Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 3:36 AM, Jack Krupansky j...@basetechnology.com wrote:

 RC4 of 4.3 is available now. The final release of 4.3 is likely to be within 
 days.


How can I see the Changelog of what will be in it?

Thanks,
xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 9:03 AM, Yago Riveiro yago.rive...@gmail.com wrote:

 The road map has this release note, but I think that most of it will be moved
 to 4.3.1 or 4.4
 
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12324128
  

So, is there a way I can see what is currently pending to go in 4.3?

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 9:11 AM, Yago Riveiro yago.rive...@gmail.com wrote:

 Attached is the change log of Solr 4.3 RC3.
 


And where would I find that?  I don't see anything at
http://lucene.apache.org/solr/downloads.html to download.  Do I need to check
out the Subversion repo?  Is there a page somewhere that describes how the
process is set up?

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Any estimation for solr 4.3?

2013-05-02 Thread Andy Lester

On May 2, 2013, at 9:20 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 Hopefully, this is not a secret, but the RCs are built and available
 for download and announced on the dev mailing list.


Thanks for the link.

I don't think it's a secret, but I sure don't see anything that says "This is
how the dev process works."

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Solr indexing

2013-04-18 Thread Andy Lester

On Apr 18, 2013, at 10:49 AM, hassancrowdc hassancrowdc...@gmail.com wrote:

 Solr is not showing the dates i have in database. any help? is solr following
 any specific timezone? On my database my date is 2013-04-18 11:29:33 but
 solr shows me 2013-04-18T15:29:33Z.   Any help


Solr knows nothing of timezones.  Solr expects everything to be in UTC.  That's
what's happening here: 11:29:33 in a UTC-4 zone (such as US Eastern Daylight
Time) is 15:29:33Z.  If you want time zone support, you'll have to convert
local time to UTC before importing, and then convert back from UTC to local
time when you read from Solr.
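
A minimal sketch in Perl, assuming the local zone is America/New_York and that
you have DateTime::Format::MySQL installed:

use DateTime::Format::MySQL;

my $dt = DateTime::Format::MySQL->parse_datetime( '2013-04-18 11:29:33' );
$dt->set_time_zone( 'America/New_York' );  # declare the zone it was recorded in
$dt->set_time_zone( 'UTC' );               # convert to UTC
print $dt->strftime( '%Y-%m-%dT%H:%M:%SZ' ), "\n";  # 2013-04-18T15:29:33Z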

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: [ANNOUNCE] Solr wiki editing change

2013-03-28 Thread Andy Lester

On Mar 24, 2013, at 10:18 PM, Steve Rowe sar...@gmail.com wrote:

 The wiki at http://wiki.apache.org/solr/ has come under attack by spammers 
 more frequently of late, so the PMC has decided to lock it down in an attempt 
 to reduce the work involved in tracking and removing spam.
 
 From now on, only people who appear on 
 http://wiki.apache.org/solr/ContributorsGroup will be able to 
 create/modify/delete wiki pages.
 
 Please request either on the solr-user@lucene.apache.org or on 
 d...@lucene.apache.org to have your wiki username added to the 
 ContributorsGroup page - this is a one-time step.


Please add my username, AndyLester, to the approved editors list.  Thanks.

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Importing datetime

2013-03-19 Thread Andy Lester

On Mar 19, 2013, at 12:04 PM, Spadez james_will...@hotmail.com wrote:

 This is the datetime format SOLR requires as I understand it:
 
 1995-12-31T23:59:59Z
 
 When I try to store this as a datetime field in MySQL it says it isn't
 valid. My question is, ideally I would want to keep a datetime in my
 database so I can sort by date rather than just making it a varchar, so I
 would store it like this:
 
 1995-12-31 23:59:59 
 
 Can I import a date in this format into SOLR from MySQL?

Yes.  Don't change the storage type of your column in MySQL.  Changing to 
VARCHAR would be sad.

What you'll need to do is use a date formatting function in your SELECT out of
the MySQL database to get the date into the format that Solr likes.

See 
https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date-format
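
For example (a sketch; the table and column names here are made up):

SELECT DATE_FORMAT(pub_date, '%Y-%m-%dT%H:%i:%sZ') AS pub_date
FROM books;

That emits strings like 1995-12-31T23:59:59Z, which Solr's date fields accept,
while the column stays a real DATETIME in MySQL for sorting.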
 

xoa


--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-12 Thread Andy Lester

On Mar 12, 2013, at 1:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 How are these sets of flrids created/defined?  (undertsanding the source 
 of the filter information may help inspire alternative suggestsions, ie: 
 XY Problem)


It sounds like you're looking for patterns that could potentially provide
groupings for these FLRIDs.  We've been down that road, too, but we don't see
how there could be one.  The arbitrariness comes from the fact that the lists
are maintained by users and can be changed at any time.

Each book in the database has an FLRID.  Each user can create lists of books.  
These lists can be modified at any time.  

That looks like this in Oracle:   USER 1->M LIST 1->M LISTDETAIL M->1 TITLE

The sizes we're talking about:  tens of thousands of users; hundreds of 
thousands of lists, with up to 100,000 items per list; tens of millions of 
listdetail.

We have a feature that lets the user do a keyword search on books within his 
list.  We can't update the Solr record to keep track of which lists it appears 
on because there may be, say, 20 people every second updating the contents of 
their lists, and those 20 people expect that their next search-within-a-list 
will have those new results.

Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: [Beginner] wants to contribute in open source project

2013-03-11 Thread Andy Lester

On Mar 11, 2013, at 11:14 AM, chandresh pancholi 
chandreshpancholi...@gmail.com wrote:

 I am beginner in this field. It would be great if you help me out. I love
 to code in java.
 can you guys share some link so that i can start contributing in
 solr/lucene project.


This article I wrote about getting started contributing to projects may give 
you some ideas.

http://blog.smartbear.com/software-quality/bid/167051/14-Ways-to-Contribute-to-Open-Source-without-Being-a-Programming-Genius-or-a-Rock-Star

I don't have tasks specifically for the Solr project (does Solr have such a 
list for newcomers to help on?) but I hope that you'll get some ideas.

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-08 Thread Andy Lester
We've got an 11,000,000-document index.  Most documents have a unique ID called
"flrid", plus a different ID called "solrid" that is Solr's PK.  For some
searches, we need to be able to limit the searches to a subset of documents
defined by a list of FLRID values.  The list of FLRID values can change between
every search and it will be rare enough to call it "never" that any two
searches will have the same set of FLRIDs to limit on.

What we're doing right now is, roughly:

q=title:dogs AND 
(flrid:(123 125 139  34823) OR 
 flrid:(34837 ... 59091) OR 
 ... OR 
 flrid:(101294813 ... 103049934))

Each of those parenthesized groups can be 1,000 FLRIDs strung together.  We
have to subgroup to get past Solr's limit on the number of terms that can be
ORed together.

The problem with this approach (besides that it's clunky) is that it seems to
perform O(N^2) or so.  With 1,000 FLRIDs, the search comes back in 50ms or so.
If we have 10,000 FLRIDs, it comes back in 400-500ms.  With 100,000 FLRIDs,
that jumps up to about 75000ms.  We want it to be on the order of 1000-2000ms
at most in all cases up to 100,000 FLRIDs.

How can we do this better?

Things we've tried or considered:

* Tried: Using dismax with minimum-match mm:0 to simulate an OR query.  No 
improvement.
* Tried: Putting the FLRIDs into the fq instead of the q.  No improvement.
* Considered: dumping all the FLRIDs for a given search into another core and
doing a join between it and the main core (sketched after this list), but if
we do five or ten searches per second, it seems like Solr would die from all
the commits.  The set of FLRIDs is unique between searches so there is no
reuse possible.
* Considered: Translating FLRIDs to SolrIDs and then limiting on SolrID
instead, so that Solr doesn't have to hit the documents in order to translate
FLRID->SolrID to do the matching.
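
For reference, the cross-core join we considered would look something like
this (the listcore core and listid field are hypothetical):

fq={!join fromIndex=listcore from=flrid to=flrid}listid:12345

where listcore would hold one tiny document per (listid, flrid) pair.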

What we're hoping for:

* An efficient way to pass a long set of IDs, or for Solr to be able to pull 
them from the app's Oracle database.
* Have Solr do big ORs as a set operation not as (what we assume is) a naive 
one-at-a-time matching.
* A way to create a match vector that gets passed to the query, because strings 
of fqs in the query seems to be a suboptimal way to do it.

I've searched SO and the web and found people asking about this type of 
situation a few times, but no answers that I see beyond what we're doing now.

* 
http://stackoverflow.com/questions/11938342/solr-search-within-subset-defined-by-list-of-keys
* 
http://stackoverflow.com/questions/9183898/searching-within-a-subset-of-data-solr
* 
http://lucene.472066.n3.nabble.com/Filtered-search-for-subset-of-ids-td502245.html
* 
http://lucene.472066.n3.nabble.com/Search-within-a-subset-of-documents-td1680475.html

Thanks,
Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: solr searchHandler/searchComponent for query statistics

2012-12-06 Thread Andy Lester

On Dec 6, 2012, at 9:50 AM, joe.cohe...@gmail.com joe.cohe...@gmail.com 
wrote:

 Is there an out-of-the-box feature, or has anyone already implemented one,
 for collecting statistics on queries?


What sort of statistics are you talking about?  Are you talking about 
collecting information in aggregate about queries over time?  Or for giving 
statistics about individual queries, like time breakouts for benchmarking?

For the latter, you want debugQuery=true and you get a raft of stats down in
<lst name="debug">.
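
For example (hypothetical host and core name):

http://localhost:8983/solr/mycore/select?q=title:dogs&debugQuery=true

The timing breakouts show up under <lst name="debug"> in the response.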

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: stopwords in solr

2012-11-27 Thread Andy Lester

On Nov 28, 2012, at 12:33 AM, Joe Zhang smartag...@gmail.com wrote:

 that is really strange. so basic stopwords such as 'a' and 'the' are not
 eliminated from the index?

There is no list of basic stopwords anywhere.  If you want stop words, you 
have to put them in the file yourself.  There are not really any sensible 
defaults for stopwords, so Solr doesn't provide them.

Just add them to the stopwords.txt and reindex your core.
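
For reference, the piece of the schema that applies that file is the stopword
filter in your fieldType's analyzer, typically a line like:

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>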

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



How do I best detect when my DIH load is done?

2012-11-19 Thread Andy Lester
A little while back, I needed a way to tell if my DIH load was done, so I made 
up a little Ruby program to query /dih?command=status .  The program is here: 
http://petdance.com/2012/07/a-little-ruby-program-to-monitor-solr-dih-imports/
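
The gist, for future readers: poll the handler and wait until its status
element reads "idle" instead of "busy".  A minimal check with curl, assuming
the same handler path:

curl -s 'http://localhost:8983/solr/dih?command=status' | grep '<str name="status">idle</str>'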

Is this the best way to do it?  Is there some other tool or interface that I 
should be using instead?

Thanks,
xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Cacti monitoring of Solr and Tomcat

2012-11-19 Thread Andy Lester
Is anyone using Cacti to track trends over time in Solr and Tomcat metrics?  We 
have Nagios set up for alerts, but want to track trends over time.

I've found a couple of examples online, but none have worked completely for me.
I'm looking at this one next:
http://forums.cacti.net/viewtopic.php?f=12&t=19744&start=15
It looks promising although it doesn't monitor Solr itself.

Suggestions?

Thanks,
Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Cacti monitoring of Solr and Tomcat

2012-11-19 Thread Andy Lester

On Nov 19, 2012, at 1:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com 
wrote:

 My favourite topic ;)  See my sig below for SPM for Solr. At my last
 company we used Cacti but it felt very 1990s almost. Some ppl use zabbix,
 some graphite, some newrelic, some SPM, some nothing!


SPM looks mighty tasty, but we must have it in-house on our own servers, for 
monitoring internal dev systems, and we'd like it to be open source.

We already have Cacti up and running, but it's possible we could use something 
else.

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: DataImportHandler in Solr 1.4 bug?

2012-11-15 Thread Andy Lester

On Nov 15, 2012, at 8:02 AM, Sébastien Lorber lorber.sebast...@gmail.com 
wrote:

  <entity name="PARAM" query="SELECT key_name AS KEY, string_val AS
      VALUE FROM BATCH_JOB_PARAMS WHERE JOB_INSTANCE_ID =
      ${JOB_EXEC.JOB_INSTANCE_ID}">
    <field column="VALUE" name="JOB_PARAM_${PARAM.KEY}" />
  </entity>


I don't know where you're getting the ${JOB_EXEC.JOB_INSTANCE_ID}.  I believe 
that if you want to get parameters passed in, it looks like this:

   WHERE batchid = ${dataimporter.request.batchid}

when I kick off the DIH like this:

   $url/dih?command=full-import&entity=titles&commit=true&batchid=47

At least that's how it works for me in 3.6 and 4.0.

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Searchers, threads and performance

2012-11-13 Thread Andy Lester
We're getting close to deploying our Solr search solution, and we're doing 
performance testing, and we've run into some questions and concerns.

Our number one problem: Doing a commit from loading records, which can happen 
throughout the day, makes all queries stop for 5-7 seconds.  This is a 
showstopper for deployment.

Here's what we've observed: Upon commit, Solr finishes processing queries in 
flight, starts up a new searcher, warms it, shuts down the old searcher and 
puts the new searcher into effect. Does the old searcher stop taking requests 
before the new searcher is warmed or after? How wide is the window of time 
wherein Solr is not serving requests?  For us, it's about five seconds and we 
need to drop that dramatically.  In general, what is the difference between 
accepting the delay of waiting for warming vs. accepting the delay of running 
useColdSearcher=true?

Is there any such thing as/any sense in running more than one searcher in our 
scenario?  What are the benefits of multiple searchers?  Erick Erickson posted
in 2012: "Unless you have warming happening, there should only be a single
searcher open at any given time. Except: If your queries run across several
commits you'll get multiple searchers open."  Not sure if this is a general
observation, or specific to the particular poster's situation.

Finally, what do people mean when they blog that they have Solr set up for n 
threads? Is that the same thing as saying that Solr can be processing n 
requests simultaneously?

Thanks for any insight or even links to relevant pages.  We've been Googling 
all over and haven't found answers to the above.

Thanks,
Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Debugging DIH

2012-08-24 Thread Andy Lester

On Aug 24, 2012, at 9:17 AM, Hasan Diwan wrote:

 <dataConfig>
   <dataSource type="JdbcDataSource" driver="org.h2.Driver"
     url="jdbc:h2:tcp://192.168.1.6/finance" user="sa" />
   <document>
     <entity name="receipt" query="select location as location, amount as
       amount, done_on as when from RECEIPTS as r join APP_USERS as a on r.user_id
       = a.id"/>
   </document>
 </dataConfig>
 
 and I've added the appropriate fields to schema.xml:
  <field name="location" type="string" indexed="true" stored="true"/>
  <field name="amount" type="currency" indexed="true" stored="true"/>
  <field name="when" type="date" indexed="true" stored="true"/>
 
 There's nothing in my index and 343 rows in my table. What is going on? -- H


I don't see that you have anything in the DIH that tells what columns from the 
query go into which fields in the index.  You need something like

<field name="location" column="location" />
<field name="amount" column="amount" />
<field name="when" column="when" />

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Holy cow do I love 4.0's admin screen

2012-08-23 Thread Andy Lester
 can you elaborate on your comment related to your polling script written in
 ruby and how the new data import status screen makes your polling app
 obsolete?

The 4.0 admin tools have a screen that gives the status in the web app, so I
don't have to run the CLI tool to check the indexing status.

However, it will still be necessary if I need to wait for indexing to complete 
in, for example, a Makefile or a script.

xoxo
Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Bitmap field in solr

2012-08-23 Thread Andy Lester

On Aug 23, 2012, at 2:54 PM, Rohit Harchandani wrote:

 Hi all,
 Is there any way to have a bitmap field in Solr??
 I have a use case where I need to search specific attributes of a document.
 Rather than having an is_A, is_B, is_C (all related to each other)etc...how
 would i store all this data in a single field and still be able to query
 it?? Can it be done in any way apart from storing them as strings in a text
 field?


You can have a field that is multiValued.  It still needs a base type, like 
string or int.  For instance, in my book database, I have a field called 
classifications and it is multivalued.  

<field name="classifications" type="string" multiValued="true" />

A classification of 1 means "spiralbound", 2 means "large print", 3 means
"multilingual", and so on.  So if my user wants to search for a multilingual
book, I search for classifications:3.  If you want spiralbound large print,
you'd search for classifications:1 classifications:2.
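
For example, a spiralbound large-print book carries both values in that one
field; in Solr's XML update syntax the document includes:

<doc>
  <field name="classifications">1</field>
  <field name="classifications">2</field>
</doc>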

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Solr Index problem

2012-08-23 Thread Andy Lester

On Aug 23, 2012, at 4:46 PM, ranmatrix S ranmat...@gmail.com wrote:

 The schema and fields in db-data-config.xml are one and the same.

Please attach or post both the schema and the DIH config XML files so we can 
see them.  The DIH can be pretty tricky.

You say you can see 9 records are returned back.  How do you see that?

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Solr makes long requests about once a minute

2012-08-08 Thread Andy Lester
I'm having a problem with Solr under Tomcat unexpectedly taking a long time to 
respond to queries.  As part of some stress testing, I wrote a bot that just 
does random word searches on my Solr install, and my responses typically come 
back in 10-50 ms.  The queries are just 1-3 random words from 
/usr/share/dict/words, and I cap off the results at 2500 hits.  

The queries run just fine and I typically get responses up to 50ms for large 
result sets.  Here's an example of my log:

TIME      HITS    MS  SEARCH WORDS
12:33:20    15  hoovey Aruru kwachas
12:33:20    85  blinis twyver
12:33:20  2500    34  prework burlily sunshine
12:33:20  1928    30  rendu Solly
12:33:20        unnethe
12:33:20        gadwell afterpeak
12:33:20   792    14  steen
12:33:20    47  blanchi repaving
12:33:20   326  torbanite Storz ungag
12:33:20    75  chemostat
12:33:20   156  Guauaenok Adao lakist
12:33:20    66  bechance viny
12:33:20   206  chagigah
12:33:22   532  2404  bonne
12:33:22  1439  nonman Norrie
12:33:22   246  repealers
12:33:22        Pfosi laniard locutory
12:33:22   516  sexipolar wordsmith enshield
12:33:22        loggiest Aryanise koels
12:33:22        fogyish unforcing
12:33:22    45  Millvale chokies
12:33:22    56  Melfa ripal Olva
12:33:22   156  apio Heraea latimeria
12:33:22    45  nonnitric parleying

See that one line where it took 2404ms to return?  I get those about once a
minute, but not at a regular interval.  I ran this for two hours and got 122
spikes in 120 minutes.  I ran it overnight and came in to work to find that
there were 1283 spikes in 1260 minutes.  So that once-a-minute is a pattern.

As I write this, I'm in IRC with Chris Hostetter and he says:

--snip--
Probably need to tweak your garbage collector settings to something that
doesn't involve "stop the world" ... the specifics of the changes largely
depend on what JVM you are using, what options you already have set, etc.
markrmiller wrote a good blog about this a little while back: 
http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/  There's 
also some notes here in the LucidWorks Solr Ref Guide: 
http://lucidworks.lucidimagination.com/display/solr/JVM+Settings
--snip--

GC certainly sounds like a reasonable suspect.  Any other suggestions?  Any 
hints on Solr-specific GC tuning?  I'm currently scouring Google.
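
For future readers: the usual starting point (an illustration only, not a
verified fix for this case) is switching the JVM to the concurrent collector
and pinning the heap size, e.g. in Tomcat's CATALINA_OPTS:

CATALINA_OPTS="-Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"

The heap numbers are placeholders; size them for your own index, and watch GC
behavior with -verbose:gc before and after.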

Thanks,
xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Solr makes long requests about once a minute

2012-08-08 Thread Andy Lester

On Aug 8, 2012, at 10:53 AM, Michael Della Bitta wrote:

 What version of Solr are you running and what Directory implementation
 are you using? How much RAM does your system have, and how much is
 available for use by Solr?

Solr 3.6.0

I don't know what "directory implementation" means.  Are you asking about
directoryFactory?  All I have in my solrconfig.xml is

<directoryFactory name="DirectoryFactory"
  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

The box has 16GB in it and currently has literally nothing else running on it.
As to "how much is available for use by Solr": is there somewhere that I'm
setting that in a config file?

Clearly, I'm entirely new to the whole JVM ecosystem. I'm coming from the world 
of Perl.

Thanks,
xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance