Re: DIH import from MySQL results in garbage text for special chars

2012-09-26 Thread Pranav Prakash
| | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ *Pranav Prakash* "temet nosce" On Wed, Sep 26, 2012 at 6:45 PM, Gora Mohanty wrote: > On 21 September 2012 11:19, Pranav Prakash wrote: > > > I am seeing the g

Re: DIH import from MySQL results in garbage text for special chars

2012-09-26 Thread Pranav Prakash
I looked at the hex codes of the texts. The hex code in MySQL differs from the one stored in the index: the indexed hex is longer than the MySQL hex, which suggests that something in between is mangling the text.
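
A longer hex dump in the index than in MySQL is the typical signature of double encoding: UTF-8 bytes get decoded with a single-byte charset somewhere along the way (for example on the JDBC connection; this is an assumption, the thread does not pin it down) and are then re-encoded as UTF-8. The Python sketch below reproduces the effect for the curly quote “ mentioned in this thread, using Latin-1 as the assumed wrong charset:

```python
def hex_utf8(text: str) -> str:
    """Hex dump of the text's UTF-8 bytes."""
    return text.encode("utf-8").hex()


def double_encode(text: str) -> str:
    """Simulate mojibake: UTF-8 bytes misread as Latin-1 (assumed wrong charset)."""
    return text.encode("utf-8").decode("latin-1")


original = "\u201c"  # LEFT DOUBLE QUOTATION MARK, the character from the thread
garbled = double_encode(original)

# Undo the damage by reversing the wrong decode step.
repaired = garbled.encode("latin-1").decode("utf-8")

print(hex_utf8(original))  # 3 bytes in MySQL
print(hex_utf8(garbled))   # 6 bytes after the round trip, hence the longer hex
```

If the indexed bytes follow this pattern, a common remedy is to make the JDBC driver talk UTF-8 explicitly; for MySQL Connector/J that is usually done with `useUnicode=true&characterEncoding=UTF-8` on the connection URL, though this should be checked against your driver version.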

Re: DIH import from MySQL results in garbage text for special chars

2012-09-20 Thread Pranav Prakash
I see the same garbage text everywhere: in the browser, in the Luke index toolbox, and elsewhere. My servlet container is the out-of-the-box Jetty. Many other special chars are indexed and stored properly; only a few characters cause pain.

Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

2012-09-10 Thread Pranav Prakash
The character is actually “ (a curly double quote) and not " (a straight double quote). On Mon, Sep 10, 2012 at 2:45 PM, Pranav Prakash wrote: > I am experiencing a similar problem related to encoding. In my case, a > char like " (double quote) > is also garbled.

Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

2012-09-10 Thread Pranav Prakash
that would resolve this. On Sat, Sep 8, 2012 at 3:16 AM, Shawn Heisey wrote: > On 9/6/2012 6:54 PM, kiran chitturi wrote: > >> The error i am getting is 'org.apache.solr.common.SolrException: >> Invalid >> Date String
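
Solr date fields expect ISO 8601 timestamps in UTC with a trailing Z (e.g. 2012-09-06T22:32:33Z), which is why values like 'Thu, 06 Sep 2012 22:32:33 +0000' or raw Unix timestamps are rejected with "Invalid Date String". One option, sketched in Python under the assumption that the values can be converted before they reach Solr (the thread may have settled on a different fix), is:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def rfc2822_to_solr(value: str) -> str:
    """Convert 'Thu, 06 Sep 2012 22:32:33 +0000' style dates to Solr's UTC format.

    Assumes the input carries an explicit offset such as +0000 or -0400.
    """
    dt = parsedate_to_datetime(value)
    return dt.astimezone(timezone.utc).strftime(SOLR_FORMAT)


def unix_to_solr(seconds: int) -> str:
    """Convert a Unix timestamp (seconds since the epoch) to Solr's UTC format."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(SOLR_FORMAT)
```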

Exact match on few fields, fuzzy on others

2012-08-01 Thread Pranav Prakash
the fields are below:

Re: DIH XML configs for multi environment

2012-07-25 Thread Pranav Prakash
configs. On Tue, Jul 24, 2012 at 1:17 AM, jerry.min...@gmail.com < jerry.min...@gmail.com> wrote: > Pranav, > > Sorry, I should have checked my response a little better as I > misspelled your name and, mentioned that I tried what Marcu

Re: can solr admin tab statistics be customized... how can this be achieved.

2012-07-23 Thread Pranav Prakash
You can check out the Solr source code, patch the admin JSP files, and use the result as your custom Solr instance. On Fri, Jul 20, 2012 at 12:14 PM, yayati wrote: > Hi, > I want to compute my own stats in addition to solr

Re: How To apply transformation in DIH for multivalued numeric field?

2012-07-18 Thread Pranav Prakash
I had tried splitBy for the numeric field, but that did not work for me either. However, once I got rid of group_concat it was all good to go. Thanks a lot! I really had a difficult time understanding this behavior. On Thu, Jul 19, 2012 at 1:34 AM, D

Re: DIH XML configs for multi environment

2012-07-18 Thread Pranav Prakash
That approach would work for core-dependent parameters. In my case, the params are environment-dependent. I think a simpler approach would be to pass the URL param as a JVM option and have these XMLs read it from there. I haven't tried it yet. On Tue,

How To apply transformation in DIH for multivalued numeric field?

2012-07-18 Thread Pranav Prakash
this case?

Re: DIH XML configs for multi environment

2012-07-11 Thread Pranav Prakash
That's cool. Is there something similar for Jetty as well? We use Jetty! On Wed, Jul 11, 2012 at 1:49 PM, Rahul Warawdekar < rahul.warawde...@gmail.com> wrote: > Hi Pranav, > > If you are using Tomcat to host Solr, you

DIH XML configs for multi environment

2012-07-11 Thread Pranav Prakash
dlers and so on. What is a good way to deal with this?

Top 5 high freq words - UpdateProcessorChain or DIH Script?

2012-07-08 Thread Pranav Prakash
dd it to the UpdateRequestProcessor chain and insert the function after StopWordsFilterFactory and DuplicateRemoveFilterFactory, should be a rather good way of doing this?
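
As a point of comparison, the top-N computation itself is small. This Python sketch counts lowercased tokens outside a hypothetical stopword list and returns the five most frequent; it only approximates what a Solr analysis chain would produce, since real tokenizers and filters differ from the regex used here:

```python
import re
from collections import Counter

# Hypothetical stopword list; in Solr this would come from stopwords.txt.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "with"}


def top_terms(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword tokens with their counts."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)
```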

Deduplication in MLT

2012-06-12 Thread Pranav Prakash
? *Pranav Prakash* "temet nosce"

Re: Typical Cache Values

2012-02-07 Thread Pranav Prakash
> This is not unusual, but there's also not much reason to give this much > memory in your case. This is the cache that is hit when a user pages > through result set. Your numbers would seem to indicate one of two things: > 1> your window is smaller than 2 pages, see solrconfig.xml, >

Typical Cache Values

2012-02-07 Thread Pranav Prakash
cumulative_inserts : 1309934 cumulative_evictions : 1309245 *Pranav Prakash* "temet nosce" Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> | Google <http://www.google.com/profiles/pranny>
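
The two counters above already say a lot: nearly every entry inserted into this cache is later evicted. A quick calculation, assuming both counters cover the same period (cumulative stats span searcher reopens, so this is approximate):

```python
cumulative_inserts = 1309934
cumulative_evictions = 1309245

# Fraction of inserted entries that were pushed out again.
eviction_ratio = cumulative_evictions / cumulative_inserts

# Inserts that have not been evicted (an upper bound on what is still cached).
surviving = cumulative_inserts - cumulative_evictions

print(f"eviction ratio: {eviction_ratio:.4%}")
print(f"inserts not (yet) evicted: {surviving}")
```

An eviction ratio this close to 100% means entries are almost always pushed out; whether that calls for a bigger or a smaller cache depends on the hit ratio, which these two counters alone do not show.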

Re: Something like "featured results" in solr response?

2012-01-30 Thread Pranav Prakash
Wow, this looks interesting. On Mon, Jan 30, 2012 at 21:16, Erick Erickson wrote: > There's the t

Re: Something like "featured results" in solr response?

2012-01-30 Thread Pranav Prakash
restarts are as infrequent as config changes. What could be a sound way to implement this? 2012/1/30 Rafał Ku

Something like "featured results" in solr response?

2012-01-30 Thread Pranav Prakash
art from the results generated by Solr (which are based on relevancy score), there is another set of documents that just comes up. It is very similar to Google's "sponsored results" feature. Can you point me to the appropriate resources for this?

Re: Highlighting uses lots of memory and eventually slows down Solr

2011-12-19 Thread Pranav Prakash
No response! Bumping it up. On Fri, Dec 9, 2011 at 14:11, Pranav Prakash wrote: > Hi Group,

Highlighting uses lots of memory and eventually slows down Solr

2011-12-09 Thread Pranav Prakash
n&wt=ruby&hl=true&rows=12&defType=dismax&fl=id,title,description&debugQuery=false&start=0&q=asdfghjkl&bf=recip(ms(NOW,created_at),1.88e-11,1,1)&hl.simple.post=&ps=50} Any help on this would be greatly appreciated. Thanks in advance!
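
The bf parameter in this query boosts recent documents with recip(ms(NOW,created_at),1.88e-11,1,1). Solr defines recip(x,m,a,b) as a/(m*x+b), so with a = b = 1 the boost is 1 for a brand-new document and halves once the age in milliseconds reaches 1/m. A small Python sketch of that curve:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def age_boost(age_days: float, m: float = 1.88e-11) -> float:
    """Boost for a document age_days old, as in bf=recip(ms(NOW,created_at),m,1,1)."""
    return recip(age_days * MS_PER_DAY, m, 1.0, 1.0)


# The boost reaches 0.5 when m*x + 1 = 2, i.e. x = 1/m milliseconds.
half_life_days = (1 / 1.88e-11) / MS_PER_DAY
```

With m = 1.88e-11 the half-way point is about 616 days. This is independent of the highlighting memory issue, but it helps when tuning the boost.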

How to programmatically check if the index is optimized or not?

2011-11-15 Thread Pranav Prakash
Hi, after the commit, my optimize usually takes 20 minutes. I need to know programmatically whether the optimization has completed. Is there an API call that reports the status of an optimization?
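
One way to poll from code, sketched in Python: ask the Luke request handler for index statistics and inspect them. This assumes the handler is reachable at /admin/luke and that its JSON response carries an index section with an optimized flag (as Solr 3.x-era versions reported) or, failing that, segmentCount and hasDeletions; the base URL is hypothetical. Alternatively, an optimize request sent with the default waitSearcher=true should itself only return once the optimize has finished.

```python
import json
from urllib.request import urlopen

# Hypothetical base URL; adjust host, port and core to your installation.
SOLR_URL = "http://localhost:8983/solr"


def is_optimized(luke_response: dict) -> bool:
    """Decide from a Luke handler JSON response whether the index is optimized."""
    index = luke_response.get("index", {})
    if "optimized" in index:  # reported directly by older Solr versions
        return bool(index["optimized"])
    # Fallback: a fully optimized index has a single segment and no deletions.
    return index.get("segmentCount") == 1 and not index.get("hasDeletions", False)


def fetch_luke(base_url: str = SOLR_URL) -> dict:
    """Fetch index statistics from /admin/luke without per-field top terms."""
    with urlopen(f"{base_url}/admin/luke?wt=json&numTerms=0") as resp:
        return json.load(resp)
```

A caller would poll `is_optimized(fetch_luke())` at an interval until it returns True.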

Re: Painfully slow indexing

2011-10-24 Thread Pranav Prakash
uld convert all of it into one XML file and then index? *are you calling commit after your batches or do an optimize by any chance?* I am not optimizing, but I am performing an autocommit every 10 docs.

Painfully slow indexing

2011-10-19 Thread Pranav Prakash
would really appreciate the community's help in getting this indexing faster. false 10 10 2048 2147483647 300 1000 5 256 10 false true true 1 0 false 10

How to achieve Indexing @ 270GiB/hr

2011-10-04 Thread Pranav Prakash
lr in batches of at most 500 docs. Even if using DataImportHandler, what are the ways this could be optimized? If I can solve the problem of indexing data in our current setup, my life would become a lot easier.
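
Batching itself is easy to get right on the client. The Python sketch below splits documents into batches of at most 500 and renders each as a Solr XML `<add>` message; posting the messages, and committing once at the end rather than per batch (a common recommendation, not something measured in this thread), are left to the caller:

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def batches(docs: Iterable[dict], size: int = 500) -> Iterator[list[dict]]:
    """Yield lists of at most `size` documents."""
    batch: list[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch


def to_add_xml(batch: list[dict]) -> str:
    """Render one batch as a Solr XML <add> update message."""
    add = ET.Element("add")
    for doc in batch:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Lists become repeated <field> elements (multivalued fields).
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")
```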

Suggestions on how to perform infrastructure migration from 1.4 to 3.4?

2011-09-30 Thread Pranav Prakash
might come along?

Can't use ms() function on non-numeric legacy date field

2011-09-27 Thread Pranav Prakash
field created_at

Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-23 Thread Pranav Prakash
ng, then I'm not sure what's going on, and some cut/paste of > what you're actually seeing might be in order.
| term | frequency |
| to | 26164 |
| and | 25804 |
| the | 25566 |
| of | 25022 |
| a | 24918 |
| in | 24590 |
| for | 23646 |
| n | 23588 |
| with | 23055 |
| is | 22510 |
> Did you do delete and do a full reindex after you changed y

StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-22 Thread Pranav Prakash
Hi List, I included StopFilterFactory and I can see it taking action in the Analyzer Interface. However, when I go to Schema Analyzer, I see those stop words in the top 10 terms. Is this normal?

Re: Stemming and other tokenizers

2011-09-20 Thread Pranav Prakash
(which is expected in Solr 3.5). What I am unclear about is this: how do I specify, at index time, that a certain field should be used for a certain language?
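
One common pattern, sketched in Python, is to declare one field per language in schema.xml (each with its own analyzer chain) and let the indexing client pick the field. The field names (title_en, title_de, ...), the language field and the supported-language set are hypothetical, and the document's language is assumed to be known already:

```python
# Hypothetical convention: one field per language, e.g. title_en, title_de,
# each declared in schema.xml with a language-specific analyzer.
SUPPORTED_LANGUAGES = {"en", "de", "fr"}
DEFAULT_LANGUAGE = "en"


def route_fields(doc: dict, lang: str,
                 text_fields: tuple[str, ...] = ("title", "description")) -> dict:
    """Move text fields into language-suffixed field names chosen at index time."""
    lang = lang if lang in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE
    routed = {k: v for k, v in doc.items() if k not in text_fields}
    for name in text_fields:
        if name in doc:
            routed[f"{name}_{lang}"] = doc[name]
    routed["language"] = lang  # kept so queries can target the right fields
    return routed
```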

Re: java.io.CharConversionException While Indexing in Solr 3.4

2011-09-20 Thread Pranav Prakash
file to test Solr's behavior with UTF-8 chars. Great work, Solr team, and special thanks to Erik Hatcher.

Re: java.io.CharConversionException While Indexing in Solr 3.4

2011-09-19 Thread Pranav Prakash
there a setting to change the level of backtrace? This would help show the complete stack instead of "... 26 more".

java.io.CharConversionException While Indexing in Solr 3.4

2011-09-19 Thread Pranav Prakash
around, I see issue https://issues.apache.org/jira/browse/SOLR-2381, which seems to fix the problem. I thought this patch was already applied in Solr 3.4.0. Is there something I am missing? Is there anything else I should mention, such as logs or my document details?

How To Implement Sweet Spot Similarity?

2011-09-16 Thread Pranav Prakash
are the good approaches for figuring out sweet spots? Can a combination of multiple Similarity classes be used? Any information would be much appreciated.
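
For reference, Lucene's SweetSpotSimilarity describes its length norm as 1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1), where x is the number of terms: it is flat at 1.0 for lengths inside [min, max] and decays outside. A Python transcription of that formula, next to the default 1/sqrt(x) norm for comparison (a sketch of the math, not the Lucene class):

```python
import math


def sweet_spot_length_norm(num_terms: int, min_len: int, max_len: int,
                           steepness: float) -> float:
    """Length norm that is 1.0 inside [min_len, max_len] and decays outside it."""
    distance = abs(num_terms - min_len) + abs(num_terms - max_len) - (max_len - min_len)
    return 1.0 / math.sqrt(steepness * distance + 1.0)


def default_length_norm(num_terms: int) -> float:
    """DefaultSimilarity-style length norm: 1/sqrt(numTerms)."""
    return 1.0 / math.sqrt(num_terms)
```

One way to choose min and max is to look at the distribution of field lengths in your index and take the range where most good documents fall; steepness then controls how hard very short or very long fields are penalized.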

Solr 3.3. Grouping vs DeDuplication and Deduplication Use Case

2011-08-29 Thread Pranav Prakash
later (and gets added to the index later). AFAIK, deduplication operates at index time. Is there a way to specify the original, which should be returned, and the duplicates, which should be kept from coming up?
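
Solr's deduplication (SignatureUpdateProcessorFactory) computes a signature at index time. To keep the earliest document as the original, one option is to compute the signature yourself, store it in a field, and either skip later copies when indexing or group on that field at query time, sorted by date. A Python sketch of the index-time variant; the added_at and text keys are hypothetical, and MD5 over normalized text only catches exact duplicates (near-duplicates need a fuzzy signature such as Solr's TextProfileSignature):

```python
import hashlib
import re


def signature(text: str) -> str:
    """Exact-duplicate signature over normalized text (lowercased, whitespace collapsed)."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def originals_only(docs: list[dict]) -> list[dict]:
    """Keep the earliest-added document per signature; later copies count as duplicates."""
    seen: dict[str, dict] = {}
    for doc in sorted(docs, key=lambda d: d["added_at"]):
        seen.setdefault(signature(doc["text"]), doc)
    return list(seen.values())
```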

Re: OOM due to JRE Issue (LUCENE-1566)

2011-08-16 Thread Pranav Prakash
> AFAIK, solr 1.4 is on Lucene 2.9.1 so this patch is already applied to > the version you are using. > maybe you can provide the stacktrace and more deatails about your > problem and report back? Unfortunately, this is all the information I have. However, the following are my specifications

OOM due to JRE Issue (LUCENE-1566)

2011-08-16 Thread Pranav Prakash
ve to manually apply the patch? What are the other workarounds for the problem? Thanks in advance.

Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Pranav Prakash
Very well explained, thanks. Yes, we do optimize the index before replication. I am not particularly worried about disk space usage; I was more curious about that behavior.

How come this query string starts with wildcard?

2011-08-10 Thread Pranav Prakash
bly my search index didn't have any.

Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Pranav Prakash
size. Am I doing something wrong? How can I get the master to serve only the delta index instead of the whole index, with the slaves merging the new and old index?

Re: PivotFaceting in solr 3.3

2011-08-02 Thread Pranav Prakash
From what I know, this is a Solr 4.0 feature tracked as SOLR-792 in JIRA. Is this what you are looking for? https://issues.apache.org/jira/browse/SOLR-792

Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread Pranav Prakash
performing a commit and then an optimize while the load from the app server was at its peak. This caused slow responses from the search server, which made requests stack up at the app server and produce 503s. Could you check whether you have a similar symptom?

Re: Solr Incremental Indexing

2011-07-31 Thread Pranav Prakash
at the db level), which would fork a process that updates Solr about this change by means of a delayed task. With this approach, it is advisable to autocommit every N documents, where N depends on your app.
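
The delayed-task approach might look roughly like this: each database change adds a document to a buffer, which is sent and committed once N documents have accumulated. The send and commit callbacks are hypothetical stand-ins for the HTTP calls to Solr; alternatively, autoCommit with maxDocs in solrconfig.xml lets Solr handle the every-N commit on its own.

```python
from typing import Callable


class UpdateBuffer:
    """Collect changed documents and push them to Solr in groups of n."""

    def __init__(self, n: int, send: Callable[[list[dict]], None],
                 commit: Callable[[], None]):
        self.n = n
        self.send = send        # hypothetical: posts documents to Solr
        self.commit = commit    # hypothetical: issues a commit
        self.pending: list[dict] = []

    def add(self, doc: dict) -> None:
        """Called from the delayed task for every changed row."""
        self.pending.append(doc)
        if len(self.pending) >= self.n:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending and commit once."""
        if not self.pending:
            return
        self.send(self.pending)
        self.commit()
        self.pending = []
```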

Re: Dealing with keyword stuffing

2011-07-29 Thread Pranav Prakash
ld I start with it? Please do not assume the less obvious things; I am still learning! :-) On Thu, Jul 28, 201

Re: Index

2011-07-28 Thread Pranav Prakash
gain more insight about the index. On Fri, Jul 29, 2011 at 03:40, GAURAV PAREEK wrote: > Yes NICK you ar

Re: Dealing with keyword stuffing

2011-07-28 Thread Pranav Prakash
On Thu, Jul 28, 2011 at 08:31, Chris Hostetter wrote: > > : Presumably, they are doing this by increasing tf (term frequency), > : i.e., by repeating keywords multiple times. If so, you can use a custom > : similarity class that caps term frequency, and/or ensures that the > scoring > : increases
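
The quoted suggestion of capping term frequency is visible in the tf curve alone. Lucene's DefaultSimilarity uses tf = sqrt(freq), and a custom Similarity would override that method. The Python sketch below only shows the arithmetic, and the cap of 3 is an arbitrary, hypothetical choice:

```python
import math


def default_tf(freq: int) -> float:
    """DefaultSimilarity-style term frequency factor: sqrt(freq)."""
    return math.sqrt(freq)


def capped_tf(freq: int, cap: int = 3) -> float:
    """tf that stops growing after `cap` occurrences, blunting keyword stuffing."""
    return math.sqrt(min(freq, cap))
```

With the cap, a document repeating a keyword 50 times scores the same tf as one mentioning it 3 times, while normal documents below the cap are unaffected.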

Dealing with keyword stuffing

2011-07-27 Thread Pranav Prakash
of things like providing different boosts to different fields, but almost everything seems to fail. I'd like to know how you guys fixed this.

Re: Index Version and Epoch Time?

2011-06-28 Thread Pranav Prakash
is there a configuration that can be adjusted for this? Also, what would the index state be after restarting Solr, both when a commit is applied and when it is not? I'd be happy to provide any other information that might be needed.

Re: Removing duplicate documents from search results

2011-06-28 Thread Pranav Prakash
I found the deduplication feature really useful, although I have not yet started working on it, as there is some other low-hanging fruit I have to capture first. I will share my thoughts soon.

Index Version and Epoch Time?

2011-06-28 Thread Pranav Prakash
ge on every commit? If not, is there a way to find out the last index time? Also, this page http://wiki.apache.org/solr/SolrReplication#Replication_Dashboard shows a Replication Dashboard. How is this dashboard invoked? Is there a URL that needs to be called?

Custom Handler support in Solr-ruby

2011-06-28 Thread Pranav Prakash
f an overhead? Or am I missing something? Also, where can I file bugs against solr-ruby?

Re: how to index data in solr form database automatically

2011-06-24 Thread Pranav Prakash
Cron is a time-based job scheduler in Unix-like computer operating systems. en.wikipedia.org/wiki/Cron
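
For the original question (indexing from the database automatically), cron can periodically hit the DataImportHandler with command=delta-import. A minimal Python helper that a cron job could call; the handler URL is hypothetical and should match the handler path in your solrconfig.xml:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical DataImportHandler location; adjust host, port, core and path.
DIH_URL = "http://localhost:8983/solr/dataimport"


def delta_import_url(base: str = DIH_URL, commit: bool = True) -> str:
    """Build the DataImportHandler URL for a delta-import."""
    params = {"command": "delta-import", "commit": "true" if commit else "false"}
    return f"{base}?{urlencode(params)}"


def trigger_delta_import() -> int:
    """Start the import and return the HTTP status of the request."""
    with urlopen(delta_import_url()) as resp:
        return resp.status
```

Scheduling is then a crontab entry that runs a small wrapper calling trigger_delta_import() every few minutes; since DIH imports run asynchronously, the call returns without waiting for the import to finish.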

Re: Removing duplicate documents from search results

2011-06-23 Thread Pranav Prakash
% similar. On Thu, Jun 23, 2011 at 15:16, Omri Cohen wrote: > What you need to do, is to calculate some

Removing duplicate documents from search results

2011-06-23 Thread Pranav Prakash
functionality using Solr? Does Solr have a built-in feature or plugin that could help me with it?

Questions about Solr MLTHandler, performance, indexes

2011-06-20 Thread Pranav Prakash
Hi folks, I am new to Solr and am using it for a web application. I have been experimenting with it and have a couple of questions that I was unable to resolve via Google. Our portal allows users to upload content, and the fields we use are title, description, transcript, and tags. Now each of the content h