solrcloud without sharding, i.e. for failover only
Hi all,

We have a smallish index that performs well for searches, and we are considering using SolrCloud, but just for high availability/redundancy, i.e. without any sharding. The indexes would be replicated, but not distributed.

I know that "there are no stupid questions, only stupid people"... but here goes:

- Is SolrCloud without sharding done? (I.e. or is it "just not done!!"?)
- Any downside (i.e. aside from the lack of horizontal scalability) will
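For context, a replication-only SolrCloud layout is just a collection with one shard and several replicas of it. A minimal sketch, assuming a Solr version whose Collections API accepts `numShards`, `replicationFactor` and `collection.configName`; the host, collection and config names are hypothetical, and the code only builds the request URL rather than sending it:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds (but does not send) a Collections API CREATE request for a
// single-shard collection whose extra replicas exist purely for failover.
class FailoverOnlyCollection {

    static String createUrl(String solrBase, String name, String configName, int replicas) {
        return solrBase + "/admin/collections"
                + "?action=CREATE"
                + "&name=" + enc(name)
                + "&numShards=1"                      // no sharding: one logical slice
                + "&replicationFactor=" + replicas    // copies of that slice for redundancy
                + "&collection.configName=" + enc(configName);
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical host and names.
        System.out.println(createUrl("http://localhost:8983/solr", "products", "products_conf", 2));
    }
}
```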
Re: null pointer on FSTCompletionLookup
Update: this was a configuration error.

In my haste/carelessness, instead of defining separate "spellcheck" and "suggest" components, I defined only "suggest". (More specifically, I copied over the chapter 10 examples from "Solr in Action", but did not copy the "spellcheck" component.) When Solr complained about not finding the 'spellcheck' component, I looked over my (bad) solrconfig.xml and thought "hmm, the 'spellcheck' component probably should be 'suggest'". It worked after re-indexing and appeared to function correctly.

Cracking open the hard copy, sitting down in the easy chair, and looking carefully over the chapter brought the issue to my attention.

Thanks for your patience...
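Since the root cause was a missing component declaration, a quick sanity check is to list the `<searchComponent>` names a solrconfig.xml actually declares. A minimal sketch using only the JDK's DOM parser; the component names come from this thread, and the XML in the usage below is a stripped-down stand-in, not a full solrconfig.xml:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SearchComponentCheck {

    // Returns the name attribute of every <searchComponent> element.
    static Set<String> componentNames(String solrconfigXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("searchComponent");
            Set<String> names = new LinkedHashSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse solrconfig.xml", e);
        }
    }

    // True only if every required component name is declared.
    static boolean declaresAll(String solrconfigXml, String... required) {
        return componentNames(solrconfigXml).containsAll(List.of(required));
    }
}
```

Run against the real file's contents with `declaresAll(xml, "spellcheck", "suggest")`; a `false` points at the kind of omission described above.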
Re: null pointer on FSTCompletionLookup
Hi all,

I know this probably seems like an uninteresting problem, and it smells, even to me, like a stupid/newbie misconfiguration. [Yes, I am reading the excellent "Solr in Action" and trying my hand at applying the suggestion examples.] But I looked into this a bit tonight, fired up the debugger, stepped through the code, etc. to try to find where I erred, to no avail.

Some questions:

First, does the SpellCheck component's FSTLookupFactory require any extra special configuration, e.g. term vectors for the field ("suggest" below)?

    org.apache.solr.spelling.suggest.fst.FSTLookupFactory
    suggest

Second, why does FSTCompletionLookup not check these variables, higherWeightsCompletion and normalCompletion, for null here?

    if (higherWeightsFirst) {
      completions = higherWeightsCompletion.lookup(key, num);
    } else {
      completions = normalCompletion.lookup(key, num);
    }

[Stepping through the code, I saw it execute this constructor:

    /**
     * This constructor prepares for creating a suggested FST using the
     * {@link #build(TermFreqIterator)} method.
     *
     * @param buckets
     *          The number of weight discretization buckets (see
     *          {@link FSTCompletion} for details).
     *
     * @param exactMatchFirst
     *          If true exact matches are promoted to the top of the
     *          suggestions list. Otherwise they appear in the order of
     *          discretized weight and alphabetical within the bucket.
     */
    public FSTCompletionLookup(int buckets, boolean exactMatchFirst) {

This constructor never initializes the two *Completion variables.]

Third, I got inconsistent results. If I started Solr afresh, this error appeared. If I reindexed my test site and then executed my problematic searches, the problem went away. Why would this happen?

Thanks in advance

On Wed, Jun 4, 2014 at 9:32 AM, Will Milspec wrote: > Hi all, > > Someone posted this problem over a year ago but I did not see a clear > resolution in the thread. > > Intermittently--i.e.
for some searches, not others--the > 'suggest/spellcheck' component throws a n NullPointerException (NPE) when a > user executes a search. It fails on FSTCompletionLookup (line 244) > > I'm using solr 4.4. ( I'm using 4.4 to match "what's in production")I > could upgrade if necessary. ) > > Any hints on why it occurs and how to fix? The earlier post alluded to > "changing the field type solved the problem", but did not provide details. > > Thanks > > will > > /select request handler: > > >on > suggestDictionary > false > 5 > 2 > 5 > true > true > 5 > 3 > > spellcheck component: > > > > > suggestDictionary > name="classname">org.apache.solr.spelling.suggest.Suggester > name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory > title > > 0. > true > > > > field type definition: > > > positionIncrementGap="100"> > > > > words="stopwords.txt" /> > > > > > > words="stopwords.txt" /> > ignoreCase="true" expand="true"/> > > > > > field definition: > > > multiValued="false" omitNorms="false"/> > > It fails here: > === > Here's the line that fails. > > @Override > public List lookup(CharSequence key, boolean > higherWeightsFirst, int num) { > final List completions; > if (higherWeightsFirst) { > completions = higherWeightsCompletion.lookup(key, num); > } else { > completions = normalCompletion.lookup(key, num); <-- fails on this > line > > } > >
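The second and third questions can be illustrated with a hypothetical stand-in class. It is not Lucene's FSTCompletionLookup and not a proposed patch. The assumption, consistent with the reindex observation but not confirmed in this thread, is that the completion structures only exist after a build, so a lookup issued after a fresh start but before any build would dereference null unless it is guarded:

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in: the completion data is only created by build().
class GuardedLookup {

    private List<String> normalCompletion; // null until build() is called

    void build(List<String> terms) {
        normalCompletion = List.copyOf(terms);
    }

    List<String> lookup(String prefix, int num) {
        if (normalCompletion == null) {
            // Not built yet (e.g. right after startup): return no suggestions instead of an NPE.
            return Collections.emptyList();
        }
        return normalCompletion.stream()
                .filter(t -> t.startsWith(prefix))
                .limit(num)
                .collect(Collectors.toList());
    }
}
```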
null pointer on FSTCompletionLookup
Hi all,

Someone posted this problem over a year ago, but I did not see a clear resolution in the thread.

Intermittently (i.e. for some searches, not others), the suggest/spellcheck component throws a NullPointerException (NPE) when a user executes a search. It fails in FSTCompletionLookup (line 244).

I'm using Solr 4.4 (to match what's in production), but I could upgrade if necessary.

Any hints on why it occurs and how to fix it? The earlier post alluded to "changing the field type solved the problem", but did not provide details.

Thanks,
will

/select request handler:
on suggestDictionary false 5 2 5 true true 5 3

spellcheck component:
suggestDictionary org.apache.solr.spelling.suggest.Suggester org.apache.solr.spelling.suggest.fst.FSTLookupFactory title 0. true

field type definition:

field definition:

It fails here:
===
Here's the line that fails.

    @Override
    public List lookup(CharSequence key, boolean higherWeightsFirst, int num) {
      final List completions;
      if (higherWeightsFirst) {
        completions = higherWeightsCompletion.lookup(key, num);
      } else {
        completions = normalCompletion.lookup(key, num);  // <-- fails on this line
      }
solr multi-tenant: anyone use per-tenant synonyms file?
Hi all,

I've been reading up on SolrCloud (via "Solr in Action") with an eye toward multi-tenancy. (Read: "SolrCloud newbie".)

One question that came up: what if a one-size-fits-all synonyms file does not work for all customers, i.e. different customers/industries use different sets of synonyms? Example:

- "bond=loan" for banking
- "bond=adhere" for manufacturing

In non-cloud Solr we would have used Solr cores with identical schemas but different synonyms.txt files.

Thanks,
will
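To make the requirement concrete, here is a toy model of the per-tenant behavior being asked for: the same term expands differently depending on which tenant's synonym list applies. The class and its in-memory maps are hypothetical; in Solr the per-tenant lists would be separate synonyms files, and how to scope those per tenant in SolrCloud is exactly the open question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model: tenant -> (term -> synonyms). Not a Solr API.
class TenantSynonyms {

    private final Map<String, Map<String, List<String>>> byTenant;

    TenantSynonyms(Map<String, Map<String, List<String>>> byTenant) {
        this.byTenant = byTenant;
    }

    // Returns the term itself followed by that tenant's synonyms (if any).
    List<String> expand(String tenant, String term) {
        List<String> out = new ArrayList<>();
        out.add(term);
        Map<String, List<String>> synonyms = byTenant.getOrDefault(tenant, Map.of());
        out.addAll(synonyms.getOrDefault(term, List.of()));
        return out;
    }
}
```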
localizing 'display names' for facet values
Hi all,

What's the cleanest way to solve this problem: localize the display names for facet values without storing the localized names in Solr?

Example:
- store a 'country code' field in the Solr document
- facet on country code
- translate the country code based on the user's locale

For the facets, the English user would see:

    England        10
    France         20
    United States   5

and the French user would see:

    Angleterre     10
    France         20
    Etats Unis      5

Reading through "Solr in Action", I don't see that Solr has any native tool to decode facet values. I see that the 'key' parameter will relabel the facet name, but not the actual values.

Additionally, we are interested in using AjaxSolr in the medium-term future. Between the library and/or JavaScript, does AjaxSolr offer additional techniques?

I wonder if anyone could recommend a clean solution.

Thanks in advance,
will
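One option, assuming the stored values are ISO 3166 country codes and the translation happens in the Java layer after the facet counts come back: the JDK already ships localized country names via `Locale.getDisplayCountry`. A minimal sketch (note the JDK's English name for "GB" is "United Kingdom", not "England"; custom labels would need your own resource bundle):

```java
import java.util.Locale;

class FacetLabels {

    // countryCode: the raw facet value, assumed to be an ISO 3166 alpha-2 code.
    static String countryLabel(String countryCode, Locale userLocale) {
        Locale region = new Locale.Builder().setRegion(countryCode).build();
        return region.getDisplayCountry(userLocale);
    }

    public static void main(String[] args) {
        // Print each code's label for an English and a French user.
        for (String code : new String[] {"GB", "FR", "US"}) {
            System.out.println(countryLabel(code, Locale.ENGLISH) + " / " + countryLabel(code, Locale.FRENCH));
        }
    }
}
```

Solr keeps faceting on the compact code; only the label shown to the user varies by locale.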
overhead of empty, unused fields
Hi all,

What is the cost of unused field types?

Our application supports multiple languages. We envision separate Lucene/Solr fields (and field types) per language (content_en, content_fr, content_zh_CN, etc.). We thought of a few options:

a) auto-generate the multilingual portion of the schema based on the application's languages
b) include fields and types for all languages

In A, if an implementation only used French and Chinese, the schema would only have the content_fr and content_zh_CN fields and types. In B, the implementation would have all field types, but a given document would only have two fields.

A seems "more efficient", but has a downside: if a user wants to add a language, they would need to regenerate the schema (i.e. add fields and types for "ja").

How much do empty field types and fields cost? Do a dozen or so unused field types hurt the scalability of indexing or search?

Thanks,
will
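Option A amounts to generating the per-language part of the schema from the configured language list. A minimal sketch; the `content_<lang>` / `text_<lang>` naming and the attributes are a hypothetical convention, and the referenced field types would still have to be defined elsewhere in the schema:

```java
import java.util.List;
import java.util.stream.Collectors;

// Emits one <field> definition per configured language (option A).
class LanguageFields {

    static String fieldsFor(List<String> languages) {
        return languages.stream()
                .map(lang -> "<field name=\"content_" + lang + "\" type=\"text_" + lang
                        + "\" indexed=\"true\" stored=\"true\"/>")
                .collect(Collectors.joining("\n"));
    }
}
```

Adding "ja" later means rerunning the generator with the extra language and reloading the schema, which is the downside described above.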
Synonyms with whitespace and the optional tokenizerFactory
Hi all,

This may be obvious. My question pertains to the use of tokenizerFactory together with SynonymFilterFactory: which tokenizerFactory does one use to treat "synonyms with spaces" as one token?

Example: these two entries are synonyms:

    "lms", "learning management system"

Index-time expansion would expand "lms" to these terms:

    "lms" "learning management system"

i.e. not like this:

    "lms" "learning" "management" "system"

Excerpt from the wiki article http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters:

"The optional *tokenizerFactory* parameter names a tokenizer factory class to analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319), which can help with the synonym+stemming problem described in http://search-lucene.com/m/hg9ri2mDvGk1 ."

Thanks,
will
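The two outcomes above can be modeled in a few lines. This is a toy illustration, not Solr's SynonymFilter: it only shows the difference between keeping the multi-word synonym as a single token and splitting it on whitespace, which is the distinction the tokenizer choice controls:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy expansion of one term into itself plus its synonyms.
class MultiWordSynonym {

    static List<String> expand(String term, List<String> synonyms, boolean keepPhrase) {
        List<String> tokens = new ArrayList<>();
        tokens.add(term);
        for (String syn : synonyms) {
            if (keepPhrase) {
                tokens.add(syn); // "learning management system" stays one token
            } else {
                tokens.addAll(Arrays.asList(syn.split("\\s+"))); // split into words
            }
        }
        return tokens;
    }
}
```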
how to build lucene-solr (especially if behind a firewall)?
Hi all,

Building lucene/solr behind the firewall fails for us due to proxy errors.

I tried setting ANT_OPTS (-Dhttp.proxyHost, etc.), but found that the "lucene" portion still failed on javadoc links. I worked around this by changing failonjavadocerror to 'false' in lucene/common-build.xml (or alternatively by adding -J-Dhttp.proxyHost, etc. as "args" elements to the javadoc tasks), but then 'changes2html' failed to connect to https://issues.apache.org.

I'm posting to the solr-user group (even though compiling is developer-ish stuff) because we need to apply a few patches to lucene-solr.

Would someone be so kind as to post the following?
* the easiest way to build lucene-solr from source
* the same, but if you're behind a firewall

Thanks,
will
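One detail worth checking: the JVM reads `http.proxyHost`/`http.proxyPort` only for http:// URLs and `https.proxyHost`/`https.proxyPort` for https:// URLs, so setting only the http pair leaves HTTPS connections unproxied. A small sketch that assembles both pairs for ANT_OPTS; the proxy host and port are hypothetical, and whether every build step (forked javadoc, changes2html) honors JVM properties is not something this confirms:

```java
import java.util.List;

// Builds an ANT_OPTS value carrying both the HTTP and HTTPS JVM proxy properties.
class AntProxyOpts {

    static String antOpts(String host, int port) {
        return String.join(" ", List.of(
                "-Dhttp.proxyHost=" + host,
                "-Dhttp.proxyPort=" + port,
                "-Dhttps.proxyHost=" + host,
                "-Dhttps.proxyPort=" + port));
    }

    public static void main(String[] args) {
        // Hypothetical proxy; export the printed value as ANT_OPTS before running ant.
        System.out.println(antOpts("proxy.example.com", 3128));
    }
}
```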
Any chance of getting SOLR-949 into the application
Hi all,

Our application requires term vectors and uses the SOLR-949 SolrJ patch to simplify the client layer. This patch eliminates the need to manually parse the XML returned by the tvrh (term vector request handler).

https://issues.apache.org/jira/browse/SOLR-949

Can we get this into the head/trunk? Re-patching after each Solr upgrade is a bit error-prone.

Thanks,
will
Git tag for 3.1 release?
Hi all,

Does the lucene-solr git repository have a tag that marks the 3.1 release?

Context: I want to apply a patch to 3.1 and wish to start from a well-defined point (i.e. the official 3.1 release).

Executing these commands, I would have expected to see a tag marking the 3.1 release. I only see "before_flex_merge", however.

    $ git checkout lucene_solr_3_1
    Checking out files: 100% (3831/3831), done.
    Switched to branch 'lucene_solr_3_1'
    $ git tag
    before_flex_merge

Thanks,
will
SOLR-236 (Field Collapsing) patch and 3.1
Hi all,

We're using the SOLR-236 (field collapsing) patch on Solr 1.4.1 and wish to upgrade to 3.1. Has anyone applied this patch to 3.1, successfully or unsuccessfully? [For the record: Solr 4.x includes field collapsing; 3.1 does not.]

The issue has several patch files, including some specifically for 1.4.1. I don't see one specifically for 3.1. I can go ahead and apply it, but wanted to check for any known 3.1 issues.

JIRA: https://issues.apache.org/jira/browse/SOLR-236

Thanks,
will
Anyone seen measurable performance improvement using Apache Portable Runtime (APR) with Solr and Tomcat
Hi all,

Has anyone used the Apache Portable Runtime (APR) in conjunction with Solr and Tomcat? Has anyone seen (or better, measured) performance improvements when using APR?

APR is a library that implements some functionality in native C (see http://apr.apache.org/ and http://en.wikipedia.org/wiki/Apache_Portable_Runtime).

From the Wikipedia entry:

The range of platform-independent functionality provided by APR includes:
* Memory allocation and memory pool functionality
* Atomic operations
* Dynamic library handling
* File I/O
* Command argument parsing
* Locking
* Hash tables and arrays
* Mmap functionality
* Network sockets and protocols
* Thread, process and mutex functionality
* Shared memory functionality
* Time routines
* User and group ID services

I could imagine benefits in file I/O and network I/O, but that's pure conjecture.

Comments?

Thanks in advance
Re: Where does admin UI visually distinguish between "master" and "slave"?
Hi all,

Thanks for the feedback. I've checked the code with a few different inputs and believe I have found a bug. Could someone comment on whether I'm missing something? I will go ahead and file it if someone can attest that it "looks like a bug".

Bug summary:
==
- The admin UI page replication/index.jsp checks for master or slave with the following code:
      if ("true".equals(detailsMap.get("isSlave")))
- If slave, replication/index.jsp displays the "Master", "Poll Interval", etc. sections (everything up to "Cores").
- If not slave, replication/index.jsp does not display the "Master" and "Poll Interval" sections.
- This slave check/UI difference works correctly if solrconfig.xml has a "slave" but not a "master" section, or vice versa.

Expected results:
==
The same UI difference would occur in the following scenario:
a) solrconfig.xml has both master and slave entries, and
b) Java system properties (-Dsolr.enable.master, -Dsolr.enable.slave) set "master" or "slave" at runtime, *OR*
c) solrcore.properties sets "master" or "slave" at runtime.

Actual results:
==
If solrconfig.xml has both master and slave entries, replication/index.jsp shows both the "master" and "slave" sections regardless of the system properties.

On Wed, Jan 12, 2011 at 10:35 AM, Markus Jelsma wrote: > Well, slaves to show different things in the replication.jsp page. > > Master http://10cc:8080/solr/replication > Poll Interval 00:00:10 > Local Index Index Version: 1294666552434, Generation: 2515 >Location: /var/lib/solr/data/index >Size: 4.65 GB >Times Replicated Since Startup: 934 > > Where master nodes (or slaves where enabled=false) show: > > Local Index Index Version: 1294666552449, Generation: 2530 >Location: /var/lib/solr/data/index >Size: 4.65 GB > > On Wednesday 12 January 2011 17:24:57 Otis Gospodnetic wrote: > > Hi Will, > > > > I don't think we have a clean "master" or "slave" label anywhere in the > > Admin UI.
> > > > Otis > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > > Lucene ecosystem search :: http://search-lucene.com/ > > > > > > > > - Original Message > > > > > From: Will Milspec > > > To: solr-user@lucene.apache.org > > > Sent: Wed, January 12, 2011 11:18:17 AM > > > Subject: Where does admin UI visually distinguish between "master" and > > > > "slave"? > > > > > Hi all, > > > > > > I'm getting started with a master/slave configuration for two solr > > > instances. Two distinguish between 'master' and 'slave', I've set he > > > system properties (e.g. "-Dmaster.enabled") and using the same > > > 'solrconfig.xml'. > > > > > > I can see via the system properties admin UI that the jvm (and thus > > > solr) sees correct values, i.e.: > > > enable.master = false > > > enable.slave = true > > > > > > However, the replication admin UI is identical for both 'master' and > > > 'slave'. (i.e. > > > http://localhost:8983/solr/production/admin/replication/index.jsp) > > > > > > I'd like a clearer visual confirmation that the master node is indeed > a > > > master and the slave is a slave. > > > > > > Summary question: > > > Does the admin UI distinguish betwen "master and slave"? > > > > > > thanks > > > > > > will > > -- > Markus Jelsma - CTO - Openindex > http://www.linkedin.com/in/markus17 > 050-8536620 / 06-50258350 >
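The expected behavior boils down to deriving the role from the same properties that toggle the master/slave sections, rather than from which sections exist in solrconfig.xml. A hypothetical helper sketching that decision; the property names `enable.master`/`enable.slave` are the ones shown in this thread, "repeater" is the usual name for a node that is both, and this is not the actual index.jsp code:

```java
import java.util.Properties;

// Hypothetical: decide the displayed role from the enable.* properties.
class ReplicationRole {

    static String roleOf(Properties props) {
        boolean master = Boolean.parseBoolean(props.getProperty("enable.master", "false"));
        boolean slave = Boolean.parseBoolean(props.getProperty("enable.slave", "false"));
        if (master && slave) {
            return "repeater";
        }
        if (master) {
            return "master";
        }
        if (slave) {
            return "slave";
        }
        return "standalone";
    }
}
```

At runtime, `roleOf(System.getProperties())` would reflect the -D settings passed to the JVM.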
Where does admin UI visually distinguish between "master" and "slave"?
Hi all,

I'm getting started with a master/slave configuration for two Solr instances. To distinguish between 'master' and 'slave', I've set the system properties (e.g. "-Dmaster.enabled") and am using the same solrconfig.xml.

I can see via the system properties admin UI that the JVM (and thus Solr) sees the correct values, i.e.:

    enable.master = false
    enable.slave = true

However, the replication admin UI is identical for both 'master' and 'slave' (i.e. http://localhost:8983/solr/production/admin/replication/index.jsp).

I'd like a clearer visual confirmation that the master node is indeed a master and the slave is a slave.

Summary question: does the admin UI distinguish between master and slave?

Thanks,
will
Tips for 'staggered date facets', i.e. 'last 24 hours, last week, last month, last year', à la Google News?
Hi all,

We wish to implement date faceting with a sliding date range: 'last 24 hours, last week, last month, last year'. Google News currently implements such faceting when you search for a topic.

As Solr's standard date faceting does not appear to meet this need, we will need to facet on arbitrary queries, i.e. by passing multiple values for facet.query.

The question: any tips or suggestions for ensuring this performs well?

Thanks,
will
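A sketch of the facet.query approach using Solr date math and the `key` local param to label each bucket. The field name `pub_date` is hypothetical. The performance idea, hedged: rounding NOW (NOW/HOUR, NOW/DAY) and using an open upper bound keeps the query strings identical across requests within the rounding window, which is generally what allows Solr's caches to be reused:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// One facet.query per sliding window, with rounded date math.
class SlidingDateFacets {

    static Map<String, String> windows(String field) {
        Map<String, String> q = new LinkedHashMap<>();
        q.put("last_24_hours", field + ":[NOW/HOUR-1DAY TO *]");
        q.put("last_week", field + ":[NOW/DAY-7DAYS TO *]");
        q.put("last_month", field + ":[NOW/DAY-1MONTH TO *]");
        q.put("last_year", field + ":[NOW/DAY-1YEAR TO *]");
        return q;
    }

    // Encodes the windows as request parameters, labelling each with {!key=...}.
    static String params(String field) {
        StringJoiner sj = new StringJoiner("&", "facet=true&", "");
        windows(field).forEach((label, query) -> sj.add("facet.query="
                + URLEncoder.encode("{!key=" + label + "}" + query, StandardCharsets.UTF_8)));
        return sj.toString();
    }
}
```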
How badly does NTFS file fragmentation impact search performance? 1.1X? 10X? 100X?
Hi all,

Pardon me if this isn't the best place to post this email; maybe it belongs on the lucene-user list. Also, it's basically Windows-specific, so not of use to everyone...

The question: does NTFS fragmentation affect search performance "a little bit" or "a lot"? It's obvious that fragmentation will slow things down, but is it a factor of 1.1, 10, or 100 (i.e. what order of magnitude)?

As a follow-up: should Solr/Lucene users periodically remind Windows sysadmins to defrag their drives?

On a production system, I ran the Windows defrag analyzer and found heavy fragmentation in the Lucene index:

    Fragments   Size     File
    11,839      492 MB   \data\index\search\_6io5.cfs
    7,153       433 MB   \data\index\search\_5ld6.cfs
    6,953       661 MB   \data\index\search\_8jvj.cfs
    5,824        74 MB   \data\index\search\_5ld7.frq
    5,691       356 MB   \data\index\search\_9eev.fdt
    5,638       352 MB   \data\index\search\_8mqi.fdt
    5,629       352 MB   \data\index\search\_8jvj.fdt
    5,609       351 MB   \data\index\search\_88z8.fdt
    5,590       355 MB   \data\index\search\_96l5.fdt
    5,568       354 MB   \data\index\search\_8zjn.fdt
    5,471       342 MB   \data\index\search\_5wgo.fdt
    5,466       342 MB   \data\index\search\_5uo1.fdt
    5,450       340 MB   \data\index\search\_5hrn.fdt
    5,429       345 MB   \data\index\search\_6nyy.fdt
    5,371       353 MB   \data\index\search\_8sob.fdt

Incidentally, we periodically experience some *very* slow searches. Out of curiosity, I checked for file fragmentation (using the 'analyze' mode of the NTFS defragmenter).

Nota bene: Windows Sysinternals has a utility, Contig.exe, which allows you to defragment individual files/directories. We'll use that to defragment the index directories.

will
can solrj swap cores?
Hi all,

Does SolrJ support swapping cores?

One of our developers initially tried swapping Solr cores (e.g. core0 and core1) using the SolrJ API, but it failed. He subsequently replaced the call with straight HTTP (i.e. HttpClient). Unfortunately, I don't have the exact error in front of me.

SolrJ code:

    CoreAdminRequest car = new CoreAdminRequest();
    car.setCoreName("production");
    car.setOtherCoreName("reindex");
    car.setAction(CoreAdminParams.CoreAdminAction.SWAP);
    SolrServer solrServer = SolrUtil.getSolrServer();
    car.process(solrServer);
    solrServer.commit();

Finally, can someone comment on this note in the SolrJ javadoc for CoreAdminRequest: "This class is experimental and subject to change."?

Thanks,
will
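For comparison, the plain-HTTP equivalent is a GET to the CoreAdmin handler with `action=SWAP`, `core` and `other`. A minimal sketch using the JDK's HttpClient; the host is hypothetical. One possible cause of the SolrJ failure (unconfirmed, since the error isn't available): core admin requests are resolved against the server's base URL, so a SolrServer created for a core URL such as `.../solr/production` rather than the Solr root would address the wrong path:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Swaps two cores via the CoreAdmin HTTP API. solrRoot is the Solr root, not a core URL.
class CoreSwap {

    static URI swapUri(String solrRoot, String core, String other) {
        return URI.create(solrRoot + "/admin/cores?action=SWAP"
                + "&core=" + URLEncoder.encode(core, StandardCharsets.UTF_8)
                + "&other=" + URLEncoder.encode(other, StandardCharsets.UTF_8));
    }

    // Sends the request and returns the HTTP status code (200 on success).
    static int swap(String solrRoot, String core, String other) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(swapUri(solrRoot, core, other)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```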
nexus of synonyms and stemming, take 2
Hi all,

[This is a second attempt at emailing. The Apache mailing list spam filter apparently did not like my synonyms entry, i.e. it classified my email as spam. I have replaced 'phone' with 'foo', 'cell' with 'sell' and 'mobile' with 'nubile'.]

This is a fairly basic synonyms question: how do synonyms interact with stemming?

Example: synonyms.txt has the entry:

    sell,sell foo,nubile,nubile foo,wireless foo

If I want to match on 'sell foos'...
a) do I need to add an entry for 'sell foos' (i.e. in addition to 'sell foo'),
b) or will the stemmer (Porter/Snowball) handle this already?

Thanks,
will
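Whether (b) works depends on where the stemmer sits relative to the synonym filter in the field type's analyzer chain. A toy chain to show why order matters; the "stemmer" here only strips a trailing "s" and is a crude stand-in for Porter/Snowball, and the synonym check is a simple phrase lookup, not Solr's SynonymFilter:

```java
import java.util.Arrays;
import java.util.Set;

class SynonymStemOrder {

    // Crude stand-in for a real stemmer: drops a single trailing "s".
    static String stem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    // Lower-cases and splits the text, optionally stemming before the synonym step.
    static String analyze(String text, boolean stemBeforeSynonyms) {
        String[] tokens = text.toLowerCase().split("\\s+");
        if (stemBeforeSynonyms) {
            tokens = Arrays.stream(tokens).map(SynonymStemOrder::stem).toArray(String[]::new);
        }
        return String.join(" ", tokens);
    }

    // Does the analyzed text hit a synonym entry such as "sell foo"?
    static boolean hitsSynonym(String text, Set<String> synonymPhrases, boolean stemBeforeSynonyms) {
        return synonymPhrases.contains(analyze(text, stemBeforeSynonyms));
    }
}
```

If synonyms run before stemming, 'sell foos' does not reach the 'sell foo' entry, which is case (a).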
best way to get maxDocs in java (i.e. as on stats.jsp page).
Hi all,

What's the best way to programmatically get (in Java) the 'maxDoc' attribute, as seen on the stats.jsp page? I don't see any hooks in the SolrJ API.

Currently I plan to use an HTTP client to get stats.jsp (which returns XML) and parse it using XPath.

If anyone can recommend a better approach, please opine.

Thanks,
will
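The XPath part of that plan needs only the JDK. A minimal sketch; the element layout (`<lst name="index"><int name="maxDoc">`) is an assumption about the response shape (it resembles the Luke handler's output as I recall it), so adjust the expression to whatever stats.jsp or your handler actually returns:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Extracts maxDoc from an XML admin response via XPath (layout assumed).
class MaxDocReader {

    static long maxDoc(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            String value = xpath.evaluate("//lst[@name='index']/int[@name='maxDoc']",
                    new InputSource(new StringReader(xml)));
            return Long.parseLong(value.trim());
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate maxDoc XPath", e);
        }
    }
}
```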
Solr Git Tags
Hi all,

(This question is more developer-oriented, but Solr users interested in perusing the source may find it relevant.)

I've cloned the lucene-solr git repository and was surprised to find no tags. This is empty:

http://git.apache.org/lucene-solr.git/refs/tags/

whereas the older (pre-lucene-solr-merge) git repository has tags, i.e. 1.4.0, 1.4.1, etc.:

http://git.apache.org/solr.git/refs/tags/

Can someone point me to an explanation? Do I need to use svn instead?

I seek to check out the 1.4.1 source so I can patch a class. I want to patch against the current stable version (1.4.1) rather than the latest commit.

Thanks,
will
Any Copy Field Caveats?
Hi all,

We're moving from an old Lucene version to Solr and plan to use the copy field functionality. Previously we had rolled our own implementation, sticking title, description, etc. into a field called 'content'.

We lose some flexibility (i.e. the Java layer can no longer control what goes into the copied field), but gain simplicity. A fair tradeoff, IMO.

My question: has anyone found any subtle issues or gotchas with copy fields? (Hence "caveat" in the subject line; pronounced 'kah-VEY-AT', it is Latin, as in "caveat emptor": "let the buyer beware".)

Thanks,
will
Override SynonymFilterFactory to load synonyms from alternate data source
Hi all,

Can anyone comment on the ease/merit of overriding the shipped SynonymFilterFactory with a version that could load the synonyms from an alternate data source?

Our application currently maintains synonyms in its database; we could export this data to synonyms.txt, but would prefer a DB-aware implementation of SynonymFilterFactory, i.e. avoiding that middle step.

From the looks of the class (private instances, static methods), it doesn't lend itself to easy subclassing.

Any comments or recommendations?

Thanks,
will
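Whichever route is taken (a custom factory or the export step), the database rows have to become the comma-separated equivalence lines that synonyms.txt uses. A minimal sketch of that shared piece; `SynonymSource` is a hypothetical interface standing in for the database, and nothing here is a Solr API:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical abstraction over wherever synonyms live (e.g. the application's database).
interface SynonymSource {
    Map<String, List<String>> groups(); // group id -> equivalent terms
}

// Renders each group as one synonyms.txt line, e.g. "lms,learning management system".
class SynonymsTxt {

    static String render(SynonymSource source) {
        return source.groups().values().stream()
                .map(terms -> String.join(",", terms))
                .collect(Collectors.joining("\n"));
    }
}
```

A JDBC-backed `SynonymSource` could feed either an export job or a DB-aware factory.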
how to get TermVectorComponent using xml, vs. SOLR-949
Hi all,

This seems a basic question: what's the best way to get term vector (TermVectorComponent) results from the Solr XML response?

SolrJ does not include TermVectorComponent results in its API; the SOLR-949 patch adds this ability, but after 2 years it's still not in the mainline (and it doesn't patch cleanly against the current head, 1.4).

I'm new to Solr and familiar with SolrJ, but not with the best means of getting/parsing the raw XML. (Typically I find the DTD and write code to parse the DOM using the DTD. In this case I've seen a few examples, but nothing definitive.)

Our team would rather use out-of-the-box Solr than manually apply patches and worry about consistency during upgrades...

Thanks in advance,
will
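Without SOLR-949, the raw XML can be walked with the JDK's XPath API. A minimal sketch that reads term frequencies for one field of the first document; the nesting it assumes (termVectors / doc / field / term / `<int name="tf">`) is based on the typical tvrh response shape but should be checked against your actual output, and it assumes tf was requested:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Reads term -> tf for one field of the first document in a term-vector response.
class TermVectorXml {

    static Map<String, Integer> termFrequencies(String xml, String field) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "//lst[@name='termVectors']/lst[1]/lst[@name='" + field + "']/lst";
        try {
            NodeList terms = (NodeList) xpath.evaluate(expr,
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            Map<String, Integer> tf = new LinkedHashMap<>();
            for (int i = 0; i < terms.getLength(); i++) {
                Element term = (Element) terms.item(i);
                // Each term's <lst> is named after the term and holds an <int name="tf">.
                String count = xpath.evaluate("int[@name='tf']", term);
                tf.put(term.getAttribute("name"), Integer.parseInt(count.trim()));
            }
            return tf;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Unexpected term vector XML", e);
        }
    }
}
```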