Re: how to set cookie for url requesting in stream_url

2011-04-07 Thread satya swaroop
Hi All,
 I was able to set the cookie value for the stream_url connection. I passed
the cookie value down to the ContentStreamBase.URLStream class and added
conn.setRequestProperty("Cookie", cookie[0].name + "=" + cookie[0].value);
in the connection setup, and it is working fine now...

Regards,
satya
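For later readers, a minimal self-contained sketch of the fix. The cookie name/value and the URL below are made-up placeholders; only the header-building and setRequestProperty call mirror the message above.

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class CookieExample {
    // Build a "Cookie" header value from a name/value pair, e.g. "JSESSIONID=abc123".
    static String cookieHeader(String name, String value) {
        return name + "=" + value;
    }

    public static void main(String[] args) throws Exception {
        String header = cookieHeader("JSESSIONID", "abc123");
        System.out.println(header); // JSESSIONID=abc123

        // Applied to a connection the same way as in the message above.
        // openConnection() is lazy, so no network traffic happens here.
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:8983/solr/select").openConnection();
        conn.setRequestProperty("Cookie", header);
    }
}
```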


Re: Solr Php Client

2011-04-07 Thread Haspadar
I'm entering only a query parameter.
I posted a bug description there - http://pecl.php.net/bugs/bug.php?id=22634


2011/4/8 Israel Ekpo 

> Hi,
>
> Could you send the entire list of parameters you are sending to Solr via the
> SolrClient and SolrQuery objects?
>
> Please open a bug report here with the details
>
> http://pecl.php.net/bugs/report.php?package=solr
>
> On Thu, Apr 7, 2011 at 7:59 PM, Haspadar  wrote:
>
> > Hello
> > I updated Solr to version 3.1 on my project. And now when the application
> > calls the getResponse() method (PECL extension) I get the following:
> > "Fatal error: Uncaught exception 'SolrException' with message 'Error
> > un-serializing response' in /home/.../Adapter/Solr.php: 78"
> >
> > How can I fix it?
> >
> > Thanks
> >
>
>
>
> --
> °O°
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>


Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-07 Thread Andrea Campi
On Fri, Apr 8, 2011 at 6:23 AM, Jens Mueller wrote:

> Hello all,
>
> thanks for your generous help.
>
> I think I now know everything:  (What I want to do is to build a web
> crawler
> and index the documents found). I will start with the setup as suggested by
>
>
Writing a web crawler from scratch is... ambitious.
Have you looked at Nutch (http://nutch.apache.org/)?  It uses Solr for
indexing, it may help you get a head start.
If you've never used Hadoop before it may take some getting used to, but I
have helped a customer implement it and helped a couple of their
(medium-seniority) devs get up to speed, and it didn't take them long.

Andrea


Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-07 Thread Jens Mueller
Hello all,

thanks for your generous help.

I think I now know everything:  (What I want to do is to build a web crawler
and index the documents found). I will start with the setup as suggested by
Ephraim (Several sharded masters, each with at least one slave for reads and
some aggregators for querying). This is only a prototype to learn more...

And the Google PDF from Walter is very interesting, that is something that I
can then try if I hit the limits with the setup above.  But before that, I
have to learn much more about all this indexing / index building and
solr/lucene stuff.

Thanks again for your help!!
best regards
jens



2011/4/7 Walter Underwood 

> On Apr 6, 2011, at 10:29 PM, Jens Mueller wrote:
>
> > Walter, thanks for the advice: Well you are right, mentioning google. My
> > question was also to understand how such large systems like
> > google/facebook are actually working. So my numbers are just theoretical
> > and made up. My system will be smaller, but I would be very happy to
> > understand how such large systems are built, and I think the approach
> > Ephraim showed should work quite well at large scale.
>
> Understanding what Google does will NOT help you build your engine, just
> like understanding an F1 race car does not help you build a Toyota Camry.
> One is built for performance only, and requires LOTS of support; the other
> for supportability and stability. Very different engineering goals and
> designs.
>
> Here is one view of Google's search setup:
> http://www.linesave.co.uk/google_search_engine.html
>
> This talk gives a lot more detail. Summary in the blog post, slides in the
> PDF. Google's search is entirely in-memory. They load off disk and run.
>
> http://glinden.blogspot.com/2009/02/jeff-dean-keynote-at-wsdm-2009.html
> http://research.google.com/people/jeff/WSDM09-keynote.pdf
>
> How big will your system be? Does it require real-time updates?
>
> wunder
> --
> Walter Underwood
> Lead Engineer, MarkLogic
>
>


Re: How to index PDF file stored in SQL Server 2008

2011-04-07 Thread Roy Liu
Thanks Lance,

I'm using Solr 1.4.
If I want to use TikaEP, do I need to upgrade to Solr 3.1 or just import the jar files?

Best Regards,
Roy Liu


On Fri, Apr 8, 2011 at 10:22 AM, Lance Norskog  wrote:

> You need the TikaEntityProcessor to unpack the PDF image. You are
> sticking binary blobs into the index. Tika unpacks the text out of the
> file.
>
> TikaEP is not in Solr 1.4, but it is in the new Solr 3.1 release.
>
> On Thu, Apr 7, 2011 at 7:14 PM, Roy Liu  wrote:
> > Hi,
> >
> > I have a table named *attachment* in MS SQL Server 2008.
> >
> > COLUMN       TYPE
> > ----------   ------------
> > id           int
> > title        varchar(200)
> > attachment   image
> >
> > I need to index the attachment column (which stores PDF files) from the
> > database via DIH.
> >
> > After accessing this URL, it returns "Indexing completed. Added/Updated: 5
> > documents. Deleted 0 documents."
> > http://localhost:8080/solr/dataimport?command=full-import
> >
> > However, I cannot find anything when I search.
> >
> > Can anyone help me?
> >
> > Thanks.
> >
> >
> > 
> > *data-config-sql.xml*
> > <dataConfig>
> >   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> >               url="jdbc:sqlserver://localhost:1433;databaseName=master"
> >               user="user"
> >               password="pw"/>
> >   <document>
> >     <entity query="select id,title,attachment from attachment">
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > *schema.xml*
> > 
> >
> >
> >
> > Best Regards,
> > Roy Liu
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: How to index PDF file stored in SQL Server 2008

2011-04-07 Thread Lance Norskog
You need the TikaEntityProcessor to unpack the PDF image. You are
sticking binary blobs into the index. Tika unpacks the text out of the
file.

TikaEP is not in Solr 1.4, but it is in the new Solr 3.1 release.
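For later readers: with Solr 3.1's DIH, the Tika wiring looks roughly like the sketch below. The dataSource/entity names and the target "text" field are assumptions layered on top of Roy's table, and the FieldStreamDataSource hookup is the documented pattern for handing a blob column to Tika — treat this as a starting point to adapt, not a drop-in config.

```xml
<dataConfig>
  <dataSource name="db" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=master"
              user="user" password="pw"/>
  <!-- streams the blob column out of each row so Tika can read it -->
  <dataSource name="fieldSource" type="FieldStreamDataSource"/>
  <document>
    <entity name="attachment" dataSource="db"
            query="select id,title,attachment from attachment">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="fieldSource" dataField="attachment.attachment"
              format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The schema would also need an indexed "text" field for the extracted content to land in.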

On Thu, Apr 7, 2011 at 7:14 PM, Roy Liu  wrote:
> Hi,
>
> I have a table named *attachment* in MS SQL Server 2008.
>
> COLUMN       TYPE
> ----------   ------------
> id           int
> title        varchar(200)
> attachment   image
>
> I need to index the attachment column (which stores PDF files) from the
> database via DIH.
>
> After accessing this URL, it returns "Indexing completed. Added/Updated: 5
> documents. Deleted 0 documents."
> http://localhost:8080/solr/dataimport?command=full-import
>
> However, I cannot find anything when I search.
>
> Can anyone help me?
>
> Thanks.
>
>
> 
> *data-config-sql.xml*
> <dataConfig>
>   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>               url="jdbc:sqlserver://localhost:1433;databaseName=master"
>               user="user"
>               password="pw"/>
>   <document>
>     <entity query="select id,title,attachment from attachment">
>     </entity>
>   </document>
> </dataConfig>
>
> *schema.xml*
> 
>
>
>
> Best Regards,
> Roy Liu
>



-- 
Lance Norskog
goks...@gmail.com


How to index PDF file stored in SQL Server 2008

2011-04-07 Thread Roy Liu
Hi,

I have a table named *attachment* in MS SQL Server 2008.

COLUMN       TYPE
----------   ------------
id           int
title        varchar(200)
attachment   image

I need to index the attachment column (which stores PDF files) from the
database via DIH.

After accessing this URL, it returns "Indexing completed. Added/Updated: 5
documents. Deleted 0 documents."
http://localhost:8080/solr/dataimport?command=full-import

However, I cannot find anything when I search.

Can anyone help me?

Thanks.



*data-config-sql.xml*
<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=master"
              user="user"
              password="pw"/>
  <document>
    <entity query="select id,title,attachment from attachment">
    </entity>
  </document>
</dataConfig>

*schema.xml*




Best Regards,
Roy Liu


Re: Solr Php Client

2011-04-07 Thread Israel Ekpo
Hi,

Could you send the entire list of parameters you are sending to Solr via the
SolrClient and SolrQuery objects?

Please open a bug report here with the details

http://pecl.php.net/bugs/report.php?package=solr

On Thu, Apr 7, 2011 at 7:59 PM, Haspadar  wrote:

> Hello
> I updated Solr to version 3.1 on my project. And now when the application
> calls the getResponse() method (PECL extension) I get the following:
> "Fatal error: Uncaught exception 'SolrException' with message 'Error
> un-serializing response' in /home/.../Adapter/Solr.php: 78"
>
> How can I fix it?
>
> Thanks
>



-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Hook to do stuff when searcher is reopened?

2011-04-07 Thread Sujit Pal
Thanks Erick. This looks like it would work... I sent out an update to
my original query; there is another approach, used by SpellCheckComponent,
that would probably also work for my case.

I will check out both approaches.

Thanks very much for your help.

-sujit

On Thu, 2011-04-07 at 20:58 -0400, Erick Erickson wrote:
> I haven't built one myself, but have you considered the Solr
> UserCache?
> See: http://wiki.apache.org/solr/SolrCaching#User.2BAC8-Generic_Caches
> 
> 
> It even receives warmup signals I believe...
> 
> 
> Best
> Erick
> 
> On Thu, Apr 7, 2011 at 7:39 PM, Sujit Pal 
> wrote:
> Hi,
> 
> I am developing a SearchComponent that needs to build some
> initial
> DocSets and then intersect with the result DocSet during each
> query (in
> process()).
> 
> When the searcher is reopened, I need to regenerate the
> initial DocSets.
> 
> I am on Solr 1.4.1.
> 
> My question is, which method in SearchComponent should I
> override to
> ensure that this regeneration happens whenever the searcher is
> reopened
> (for example in response to an update followed by a commit)?
> 
> If no such hook method exists, how would this need to be done?
> 
> Thanks
> Sujit
> 
> 
> 
> 



Re: Hook to do stuff when searcher is reopened?

2011-04-07 Thread Sujit Pal
I think I found the answer by looking through the code...specifically
SpellCheckComponent.

So my component would have to implement SolrCoreAware and in the
inform() method, register a custom SolrEventListener which will execute
the regeneration code in the postCommit and newSearcher methods.

Would still appreciate knowing if there is a simpler way, or if I am
wildly off the mark.

Thanks
Sujit
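For later readers, a rough sketch of that wiring against the Solr 1.4 API. The component body and regenerate() are placeholders, and the exact signatures should be verified against your Solr jars — treat this as an outline, not working code.

```java
import java.io.IOException;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.plugin.SolrCoreAware;

public class DocSetComponent extends SearchComponent implements SolrCoreAware {

  private volatile DocSet baseDocs; // rebuilt whenever a new searcher appears

  public void inform(SolrCore core) {
    // Register a listener so regeneration runs when the searcher is reopened
    // (e.g. after an update followed by a commit).
    core.registerNewSearcherListener(new SolrEventListener() {
      public void init(NamedList args) {}
      public void postCommit() {}
      public void newSearcher(SolrIndexSearcher newSearcher,
                              SolrIndexSearcher currentSearcher) {
        regenerate(newSearcher);
      }
    });
  }

  private void regenerate(SolrIndexSearcher searcher) {
    // placeholder: compute the initial DocSets against the new searcher
  }

  public void prepare(ResponseBuilder rb) throws IOException {}

  public void process(ResponseBuilder rb) throws IOException {
    // placeholder: intersect the result DocSet with baseDocs here
  }

  public String getDescription() { return "docset component"; }
  public String getSource() { return ""; }
  public String getSourceId() { return ""; }
  public String getVersion() { return ""; }
}
```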

On Thu, 2011-04-07 at 16:39 -0700, Sujit Pal wrote:
> Hi,
> 
> I am developing a SearchComponent that needs to build some initial
> DocSets and then intersect with the result DocSet during each query (in
> process()).
> 
> When the searcher is reopened, I need to regenerate the initial DocSets.
> 
> I am on Solr 1.4.1.
> 
> My question is, which method in SearchComponent should I override to
> ensure that this regeneration happens whenever the searcher is reopened
> (for example in response to an update followed by a commit)?
> 
> If no such hook method exists, how would this need to be done?
> 
> Thanks
> Sujit
> 
> 



Re: Hook to do stuff when searcher is reopened?

2011-04-07 Thread Erick Erickson
I haven't built one myself, but have you considered the Solr UserCache?
See: http://wiki.apache.org/solr/SolrCaching#User.2BAC8-Generic_Caches

It even receives warmup signals I believe...

Best
Erick

On Thu, Apr 7, 2011 at 7:39 PM, Sujit Pal  wrote:

> Hi,
>
> I am developing a SearchComponent that needs to build some initial
> DocSets and then intersect with the result DocSet during each query (in
> process()).
>
> When the searcher is reopened, I need to regenerate the initial DocSets.
>
> I am on Solr 1.4.1.
>
> My question is, which method in SearchComponent should I override to
> ensure that this regeneration happens whenever the searcher is reopened
> (for example in response to an update followed by a commit)?
>
> If no such hook method exists, how would this need to be done?
>
> Thanks
> Sujit
>
>
>


Solr Php Client

2011-04-07 Thread Haspadar
Hello
I updated Solr to version 3.1 on my project. And now when the application
calls the getResponse() method (PECL extension) I get the following:
"Fatal error: Uncaught exception 'SolrException' with message 'Error
un-serializing response' in /home/.../Adapter/Solr.php: 78"

How can I fix it?

Thanks


Hook to do stuff when searcher is reopened?

2011-04-07 Thread Sujit Pal
Hi,

I am developing a SearchComponent that needs to build some initial
DocSets and then intersect with the result DocSet during each query (in
process()).

When the searcher is reopened, I need to regenerate the initial DocSets.

I am on Solr 1.4.1.

My question is, which method in SearchComponent should I override to
ensure that this regeneration happens whenever the searcher is reopened
(for example in response to an update followed by a commit)?

If no such hook method exists, how would this need to be done?

Thanks
Sujit




Re: Migrating from solr 1.4.1 to 3.1.0

2011-04-07 Thread Chris Hostetter

: Solr 3.1.0 uses different javabin format from 1.4.1
: So if I use Solrj 1.4.1 jar  , then i get javabin error while saving to
: 3.1.0
: and if I use Solrj 3.1.0 jar , then I get javabin error  while reading the
: document from solr 1.4.1.

you can use the XML format to get portability during the upgrade process 
(solrJ 1.4 can talk to Solr 3.1 using the XML format, and vice-versa) but 
in general your comment scares me -- reading from one solr instance as the 
input to indexing in another solr instance will only work if *every* field 
in your old index was stored.

if you are in that situation, then go ahead -- but be careful: if you are 
not in that situation, you will lose any fields that were indexed but not 
stored.

-Hoss
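For later readers, switching SolrJ onto the XML wire format comes down to swapping the response parser, since the SolrJ 1.4 request writer already posts XML by default. A fragment (verify the class names against your SolrJ version; the URL is a placeholder):

```java
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
// SolrJ's default *response* format is javabin, which is not compatible
// between 1.4 and 3.1; XML is portable in both directions.
server.setParser(new XMLResponseParser());
```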


Re: Tips for getting unique results?

2011-04-07 Thread Erick Erickson
I think you can specify the in-group sort, and specify a very small number
(perhaps even one) of documents to go in each group. But you'd have to store
the length of each body and sort by that.

I'm pretty sure grouping is trunk-only.

The problem here is getting something that applies
just within the group and not across groups... I'm not sure how to tackle
that
other than perhaps the grouping idea...

Best
Erick

On Thu, Apr 7, 2011 at 6:36 PM, Peter Spam  wrote:

> Would grouping solve this?  I'd rather not move to a pre-release solr ...
>
> To clarify the problem:
>
> The data are fine and not duplicated - however, I want to analyze the data,
> and summarize one field (kind of like faceting), to understand what the
> largest value is.
>
> For example:
>
> Document 1:   label=1A1A1; body="adfasdfadsfasf"
> Document 2:   label=5A1B1; body="adfaasdfasdfsdfadsfasf"
> Document 3:   label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
> Document 4:   label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
> Document 5:   label=7A1A1; body="azxzxcvdfaasdfasdfsdasdafadsfasf"
> Document 6:   label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"
>
> How do I get back just ONE of the largest "label" item?
>
> In other words, what query will return the 7A1A1 label just once?  If I
> search for q=* and sort the results, it works, except I get back multiple
> hits for each label.  If I do a facet, I can only sort by increasing order,
> when what I want is decreasing order.
>
>
> -Peter
>
> On Apr 7, 2011, at 10:02 AM, Erick Erickson wrote:
>
> > What version of Solr are you using? And, assuming the version that
> > has it in, have you seen grouping?
> >
> > Which is another way of asking why you want to do this, perhaps it's an
> > XY problem
> >
> > Best
> > Erick
> >
> > On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam  wrote:
> >
> >> Hi,
> >>
> >> I have documents with a field that has "1A2B3C" alphanumeric characters.
>  I
> >> can query for * and sort results based on this field, however I'd like
> to
> >> "uniq" these results (remove duplicates) so that I can get the 5 largest
> >> unique values.  I can't use the StatsComponent because my values have
> >> letters in them too.
> >>
> >> Faceting (and ignoring the counts) gets me half of the way there, but I
> can
> >> only sort ascending.  If I could also sort facet results descending, I'd
> be
> >> done.  I'd rather not return all documents and just parse the last few
> >> results to work around this.
> >>
> >> Any ideas?
> >>
> >>
> >> -Pete
> >>
>
>


Re: Tips for getting unique results?

2011-04-07 Thread Peter Spam
Would grouping solve this?  I'd rather not move to a pre-release solr ...

To clarify the problem:

The data are fine and not duplicated - however, I want to analyze the data, and 
summarize one field (kind of like faceting), to understand what the largest 
value is.

For example:

Document 1:   label=1A1A1; body="adfasdfadsfasf"
Document 2:   label=5A1B1; body="adfaasdfasdfsdfadsfasf"
Document 3:   label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
Document 4:   label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
Document 5:   label=7A1A1; body="azxzxcvdfaasdfasdfsdasdafadsfasf"
Document 6:   label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"

How do I get back just ONE of the largest "label" item?

In other words, what query will return the 7A1A1 label just once?  If I search 
for q=* and sort the results, it works, except I get back multiple hits for 
each label.  If I do a facet, I can only sort by increasing order, when what I 
want is decreasing order.


-Peter
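One pragmatic stopgap, while staying on released Solr: fetch the facet (or field) values and do the unique-and-descending-sort on the client. A sketch using the sample labels above — the helper name is made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

public class TopLabels {
    // Return the n largest distinct labels, in descending order.
    static List<String> topUnique(List<String> labels, int n) {
        // TreeSet with a reversed comparator both de-duplicates and sorts descending.
        TreeSet<String> unique = new TreeSet<String>(Collections.reverseOrder());
        unique.addAll(labels);
        List<String> all = new ArrayList<String>(unique);
        return all.subList(0, Math.min(n, all.size()));
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList(
                "1A1A1", "5A1B1", "1A1A1", "7A1A1", "7A1A1", "5A1B1");
        System.out.println(topUnique(labels, 5)); // [7A1A1, 5A1B1, 1A1A1]
    }
}
```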

On Apr 7, 2011, at 10:02 AM, Erick Erickson wrote:

> What version of Solr are you using? And, assuming the version that
> has it in, have you seen grouping?
> 
> Which is another way of asking why you want to do this, perhaps it's an
> XY problem
> 
> Best
> Erick
> 
> On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam  wrote:
> 
>> Hi,
>> 
>> I have documents with a field that has "1A2B3C" alphanumeric characters.  I
>> can query for * and sort results based on this field, however I'd like to
>> "uniq" these results (remove duplicates) so that I can get the 5 largest
>> unique values.  I can't use the StatsComponent because my values have
>> letters in them too.
>> 
>> Faceting (and ignoring the counts) gets me half of the way there, but I can
>> only sort ascending.  If I could also sort facet results descending, I'd be
>> done.  I'd rather not return all documents and just parse the last few
>> results to work around this.
>> 
>> Any ideas?
>> 
>> 
>> -Pete
>> 



Re: class not found

2011-04-07 Thread Ahmet Arslan
> The jar containing the class is in
> here:
> 
> /usr/local/apache-tomcat-6.0.20/webapps/solr/WEB-INF/lib
> 

http://wiki.apache.org/solr/SolrPlugins#How_to_Load_Plugins




Re: Queries with undetermined field count

2011-04-07 Thread Erick Erickson
One possibility is to have just a multiValued "groups" field with a
positionIncrementGap of, say, 100.

Now, index values like

"group1 foo bar happy joy joy"
"group2 some more words to search"
etc.

Now do phrase queries with a slop of less than 100. Then searches like
groups:"group1 more"~99 would not match because the gap is greater
than the slop.

Of course this works better if the values are single tokens and you can
index
values like
group1 foo
group1 bar
group1 happy
group1 joy
group2 some

with the same increment trick. In that case, the slop could just be, say, 2
and the
increment gap 10 or some such.

Best
Erick
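The nice part of this scheme is that the client only builds one clause per group, so the group list can be any length. A sketch (the field name "groups" and slop 99 come from the example above; the helper is hypothetical):

```java
public class GroupQuery {
    // One phrase-with-slop clause per group, OR'ed together.
    static String buildQuery(String term, String... groups) {
        StringBuilder sb = new StringBuilder();
        for (String g : groups) {
            if (sb.length() > 0) sb.append(" OR ");
            sb.append("groups:\"").append(g).append(' ').append(term).append("\"~99");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // fred belongs to groups "fred" and "group1", searching for 'foo'
        System.out.println(buildQuery("foo", "fred", "group1"));
        // groups:"fred foo"~99 OR groups:"group1 foo"~99
    }
}
```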

On Thu, Apr 7, 2011 at 2:18 PM, jisenhart  wrote:

>
> I have a question on how to set up queries not having a predetermined field
> list to search on.
>
> Here are some sample docs,
> 
>   1234
>   hihello
>   lalachika chika boom boom
> 
> 
>   1235
>   foobarhappy happy joy
> joy
>   some textsome more words to
> search
> 
> .
> .
> .
> 
>   4567
>   bedrock
>   memeyou you
>   super duperare we done?
> 
>
> Now a given user user, say fred, belongs to any number of groups, say only
> fred, and group1 for this example.
> A query on 'foo' is easy if I know that fred belongs to only these two:
>
>_fred:foo OR _group1:foo //will find a hit on doc 1235
>
> However, a user can belong to any number of groups. How do I perform such a
> search if the users group list is arbitrarily large?
>
> Could I somehow make use of reference docs like so:
>
> 
>   fred
>   fredgroup1
> 
> .
> .
> .
> 
>   wilma
>name="_groups">wilmagroup1group5group9group11group31group40
> 
>
>


Re: class not found

2011-04-07 Thread Tri Nguyen
The jar containing the class is in here:

/usr/local/apache-tomcat-6.0.20/webapps/solr/WEB-INF/lib

for my setup.

Tri





From: Erick Erickson 
To: solr-user@lucene.apache.org
Sent: Thu, April 7, 2011 3:24:14 PM
Subject: Re: class not found

Can you give us some more details? I suspect the jar file containing
your plugin isn't in the Solr lib directory and/or you don't have a lib
directive in your solrconfig.xml file pointing to where your jar is.

But that's a guess since you haven't provided any information about
what you did to try to use your plugin, like how you deployed it, how
you compiled it, how

Best
Erick

On Thu, Apr 7, 2011 at 4:43 PM, Tri Nguyen  wrote:

> Hi,
>
> I wrote my own parser plugin.
>
> I'm getting a NoClassDefFoundError.  Any ideas why?
>
> Apr 7, 2011 1:12:43 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.solr.search.QParserPlugin
>        at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1444)
>        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:548)
>        at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
>        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
>        at
>
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
>
>        at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
>        at
>
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
>
>        at
>
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
>
>        at
>
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
>
>        at
>
> 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
>        at
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
>        at
>
> 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
>        at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
>        at
> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
>        at
> org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:850)
>        at
> org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:724)
>        at
> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:493)
>        at
> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
>        at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
>        at
>
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
>
>        at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
>        at
> org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
>        at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
>        at
> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
>        at
> org.apache.catalina.core.StandardService.start(StandardService.java:516)
>        at
> org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
>        at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
>        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
>
> Tri


Re: How to Setup Solr Collection Distribution

2011-04-07 Thread Ahmet Arslan
> Date: Friday, April 8, 2011, 1:19 AM
> I have 1 Master, and 3 slaves. The
> master holds the solr index. How do I
> connect the slaves to the master? I have the script in the
> bin folders. I
> have rsyncd installed and snapshooter enabled in the
> master. Thanks, please

HTTP-based replication is easier (available since Solr 1.4):

http://wiki.apache.org/solr/SolrReplication 
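For reference, the wiki's setup boils down to a ReplicationHandler section in solrconfig.xml on each side. Hostnames, poll interval, and the confFiles list below are placeholders to adapt:

```xml
<!-- on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on each slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```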


Re: class not found

2011-04-07 Thread Tri Nguyen
yes.





From: Ahmet Arslan 
To: solr-user@lucene.apache.org
Sent: Thu, April 7, 2011 3:23:56 PM
Subject: Re: class not found

> I wrote my own parser plugin.
> 
> I'm getting a NoClassDefFoundError.  Any ideas why?

Did you put the jar file - the one that contains your custom code - into the /lib directory?
http://wiki.apache.org/solr/SolrPlugins


Re: class not found

2011-04-07 Thread Erick Erickson
Can you give us some more details? I suspect the jar file containing
your plugin isn't in the Solr lib directory and/or you don't have a lib
directive in your solrconfig.xml file pointing to where your jar is.

But that's a guess since you haven't provided any information about
what you did to try to use your plugin, like how you deployed it, how
you compiled it, how

Best
Erick
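A minimal example of such a directive (the path is a placeholder for wherever your plugin jar actually lives):

```xml
<!-- solrconfig.xml: point Solr at the directory holding your plugin jar -->
<lib dir="/path/to/plugin/jars" />
```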

On Thu, Apr 7, 2011 at 4:43 PM, Tri Nguyen  wrote:

> Hi,
>
> I wrote my own parser plugin.
>
> I'm getting a NoClassDefFoundError.  Any ideas why?
>
> Apr 7, 2011 1:12:43 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.solr.search.QParserPlugin
> at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1444)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:548)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
> at
>
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
>
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
> at
>
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
>
> at
>
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
>
> at
>
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
>
> at
>
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
> at
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
> at
>
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
> at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
> at
> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
> at
> org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:850)
> at
> org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:724)
> at
> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:493)
> at
> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
> at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
> at
>
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
>
> at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
> at
> org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
> at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
> at
> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
> at
> org.apache.catalina.core.StandardService.start(StandardService.java:516)
> at
> org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
> at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
> at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
>
> Tri


Re: class not found

2011-04-07 Thread Ahmet Arslan
> I wrote my own parser plugin.
> 
> I'm getting a NoClassDefFoundError.  Any ideas why?

Did you put the jar file - the one that contains your custom code - into the /lib directory?
http://wiki.apache.org/solr/SolrPlugins


Re: 3.1 release

2011-04-07 Thread Ahmet Arslan
> Does this contain the
> CollapseComponent?

No. For FieldCollapsing you need trunk.
 


Re: Indexing pdf files - question.

2011-04-07 Thread Erick Erickson
Did you try the curl commands that Adam suggested as part of this e-mail
thread?
If so, what happened?

Best
Erick
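(For later readers: the usual smoke test against the ExtractingRequestHandler looks something like the command below -- host, port, document id, and file path are placeholders.)

```shell
# Post one PDF straight to Solr Cell and commit; Tika does the text extraction.
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
     -F "myfile=@/path/to/some.pdf"
```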

On Wed, Apr 6, 2011 at 7:50 AM, Mike  wrote:

> Hi All,
>
> I am new to Solr. I have gone through the Solr docs on indexing pdf files,
> but it was hard to find the exact procedure to get started.
> I need a step-by-step procedure. Could you please let me know the
> steps to index pdf files?
>
> Thanks,
> Mike
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-pdf-files-question-tp2079505p2784645.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


How to Setup Solr Collection Distribution

2011-04-07 Thread Li Tan
I have 1 Master, and 3 slaves. The master holds the solr index. How do I
connect the slaves to the master? I have the script in the bin folders. I
have rsyncd installed and snapshooter enabled in the master. Thanks, please
help.


Re: Queries with undetermined field count

2011-04-07 Thread Renaud Delbru

Hi,

SIREn [1], a Lucene/Solr plugin, allows you to perform queries across an 
undetermined number of fields, even if you have hundreds of thousands of 
fields. It might be helpful for your scenario.


[1] http://siren.sindice.com
--
Renaud Delbru

On 07/04/11 19:18, jisenhart wrote:


I have a question on how to set up queries not having a predetermined
field list to search on.

Here are some sample docs,

1234
hihello
lalachika chika boom boom


1235
foobarhappy happy joy
joy
some textsome more words to
search

.
.
.

4567
bedrock
memeyou you
super duperare we done?


Now a given user user, say fred, belongs to any number of groups, say
only fred, and group1 for this example.
A query on 'foo' is easy if I know that fred belongs to only these two:

_fred:foo OR _group1:foo //will find a hit on doc 1235

However, a user can belong to any number of groups. How do I perform
such a search if the users group list is arbitrarily large?

Could I somehow make use of reference docs like so:


fred
fredgroup1

.
.
.

wilma
wilmagroup1group5group9group11group31group40







Re: SOLR support for unicode?

2011-04-07 Thread Chris Hostetter
: 
: Thanks for your response..please find below the schema details corresponding
: to that field..

your message included nothing but a bunch of blank lines, probably because 
your email editor thought you were trying to type in html (instead of xml)

before diving too deeply into your analyzer however, it's important to 
sanity check that your servlet container is configured properly, and that 
your client is actually sending the data encoded properly -- based on your 
description of the problem it sounds like even the *stored* value of the 
field contains a "?" character, which means the analyzer probably isn't 
the problem.

the exampledocs directory has a test_utf8.sh script which can be handy for 
verifying that your servlet container seems to be behaving properly; you 
can also try putting a "TM" symbol in one of the example XML docs and 
index that with post.jar and see if that works for you.

if it does, then odds are your indexing code isn't doing what it should be 
encoding-wise.

if using post.jar with a simple xml file in UTF-8 still doesn't give you the 
expected outcome, please reply with the output of a query for your 
test doc that uses the "wt=python" param ... the python response writer is 
handy in these cases because it generates escape codes for everything 
outside of the ascii range, making it easy to see *exactly* what bytes 
are in those stored fields.

-Hoss
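One way the stray "?" can appear before Solr ever sees the data is a client-side encoding step that uses the wrong charset. A minimal sketch of that failure mode in plain Java (not tied to any Solr API):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    public static void main(String[] args) {
        String tm = "\u2122"; // the "TM" symbol mentioned above

        // Encoded as UTF-8, the character survives: three bytes E2 84 A2.
        byte[] utf8 = tm.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 3

        // Encoded through a charset that cannot represent it (or via the
        // platform default on a Latin-1 machine), it is silently replaced
        // by '?', which is then what gets indexed and stored.
        byte[] latin1 = tm.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println((char) latin1[0]); // ?
    }
}
```

If the stored field already shows "?", the replacement most likely happened like this in the indexing client, not in the analyzer.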


Re: Solr architecture diagram

2011-04-07 Thread Chris Hostetter

: of the components as well as the flow of data and queries. The result is 
: a conceptual architecture diagram, clearly showing how Solr relates to 
: the app-server, how cores relate to a Solr instance, how documents enter 
: through an UpdateRequestHandler, through an UpdateChain and Analysis and 
: into the Lucene index etc.

Looks really good, but two bits that i think might confuse people are 
the implications that a "Query Parser" then invokes a series of search 
components, and that "analysis" (and the pieces of an analyzer chain) 
are what do lookups in the underlying lucene index.

the first might just be the ambiguity of "Query" .. using the term 
"request parser" might make more sense, in comparison to the "update 
parsing" from the other side of the diagram.

the analysis piece is a little harder to fix cleanly.  you really want the 
end of the analysis chain to feed back up to the search components, and 
then show them (most of the search components really) talking to the Lucene 
index.

FWIW: the last time i tried to do an architecture diagram for solr was my 
"Beyond the Box" talk a few years back, targeted at people interested in 
writing plugins.  I made my job a lot easier than what you tackled by 
keeping it at the 50,000 foot level where the SolrRequestHandler was the 
smallest unit of work i described.  From that view there are nice 
parallels that can be drawn with more traditional MVC architectures which 
make it a little easier for people to understand...

http://people.apache.org/~hossman/apachecon2008us/btb/
Slides #9 & 10


-Hoss


class not found

2011-04-07 Thread Tri Nguyen
Hi,

I wrote my own parser plugin.

I'm getting a NoClassDefFoundError.  Any ideas why?

Apr 7, 2011 1:12:43 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.solr.search.QParserPlugin
    at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1444)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:548)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)

    at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)

    at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)

    at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)

    at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
    at 
org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
    at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:850)
    at 
org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:724)
    at 
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:493)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
    at 
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
    at 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)

    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at 
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at 
org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at 
org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

Tri

Sourcesense packager

2011-04-07 Thread Mark
How can one change Tomcat-specific settings such as tomcat-users.xml? I 
can't seem to find any reference to these conf files.


Thanks


Queries with undetermined field count

2011-04-07 Thread jisenhart


I have a question on how to set up queries when there is no predetermined 
field list to search on.


Here are some sample docs,

   1234
   hihello
   lalachika chika boom 
boom



   1235
   foobarhappy happy 
joy joy
   some textsome more words to 
search


.
.
.

   4567
   bedrock
   memeyou you
   super duperare we 
done?



Now a given user, say fred, belongs to any number of groups; say 
only fred and group1 for this example.
A query on 'foo' is easy if I know that fred belongs to only these 
two:


_fred:foo OR _group1:foo //will find a hit on doc 1235

However, a user can belong to any number of groups. How do I perform 
such a search if the user's group list is arbitrarily large?


Could I somehow make use of reference docs like so:


   fred
   fredgroup1

.
.
.

   wilma
   name="_groups">wilmagroup1group5group9group11group31group40
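One common approach not raised in the thread is to fold the per-user and per-group fields into a single multi-valued permissions field and build one filter clause from the caller's group list at query time. A sketch, assuming a hypothetical multi-valued `acl` field that holds every user and group allowed to see a document:

```java
import java.util.Arrays;
import java.util.List;

public class GroupFilter {
    // Builds a clause like acl:(fred OR group1 OR group5) from an
    // arbitrarily long group list. The "acl" field name is an assumption,
    // not something from the schema in this thread.
    static String buildFilterQuery(String user, List<String> groups) {
        StringBuilder fq = new StringBuilder("acl:(").append(user);
        for (String g : groups) {
            fq.append(" OR ").append(g);
        }
        return fq.append(")").toString();
    }

    public static void main(String[] args) {
        // fred belongs only to group1; wilma would simply pass more groups
        System.out.println(buildFilterQuery("fred", Arrays.asList("group1")));
        // acl:(fred OR group1)
    }
}
```

A search for foo then becomes q=foo with this string as an fq, so an arbitrarily long group list never has to be spread across per-group field names.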





Why does solr keeps creating connections in full import

2011-04-07 Thread tjtong
Why does Solr keep creating connections for each table, even though they are
in the same database? This happened during a full-import. I used
one table as the root entity and joined the other tables, but Solr keeps
creating database connections for each table. Does anyone have any idea, or has
anyone had the same problem? Thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-does-solr-keeps-creating-connections-in-full-import-tp2790786p2790786.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing pdf files - question.

2011-04-07 Thread Mike
Hi All,

I am new to Solr. I have gone through the Solr documents on indexing PDF files, but
it was hard to find the exact procedure to get started.
I need a step-by-step procedure to do this. Could you please let me know the
steps to index PDF files?

Thanks,
Mike

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-pdf-files-question-tp2079505p2784645.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Encoding issue on synonyms.txt

2011-04-07 Thread Siddharth Powar
Hey Robert,

Thanks for the quick response. That helps a lot.

--Sid

On Thu, Apr 7, 2011 at 11:19 AM, Robert Muir  wrote:

> On Thu, Apr 7, 2011 at 2:13 PM, Siddharth Powar
>  wrote:
> > Hey guys,
> >
> > I am in the process of moving to solr3.1 from solr1.4. I am having this
> > issue where solr3.1 now complains about the synonyms.txt file. I get the
> > following error:
> > *org.apache.solr.common.SolrException: Error loading resource (wrong
> > encoding?): synonyms.txt*
> > *
> > *
> > This worked fine before in solr1.4. Not sure what the issue is...
>
> Hi, your synonyms were not working fine before in solr 1.4; they were
> just silently wrong. It's telling you your synonyms file is in the
> wrong encoding (by default solr expects UTF-8), because it contains
> illegal byte sequences.
>


Re: Lucid Works

2011-04-07 Thread Mark

Andrezej,

Thanks for the info. I have a question regarding stability though. How 
are you able to guarantee the stability of this release when 4.0 is 
still a work in progress? I believe the last version Lucid released was 
1.4, so why did you choose to release a 4.x version as opposed to 3.1?


Is the source code included with your distribution so that we may be 
able to do some further patching on it?


Thanks again and hopefully I'll be joining you at that conference.

On 4/7/11 12:54 PM, Andrzej Bialecki wrote:

On 4/7/11 9:43 PM, Mark wrote:

I noticed that the Lucid Works distribution now says it is up to date with 4.x
versions. Does this mean 1.4 or 4.0/trunk?

If it's truly 4.0, does that mean it includes the collapse component?


Yes it does.


Also, are the click scoring tools proprietary, or was this just a
contrib/patch that was applied?


At the moment it's proprietary. I will have a talk at the Lucene 
Revolution conference that describes the Click tools in detail.




Re: Lucid Works

2011-04-07 Thread Andrzej Bialecki

On 4/7/11 9:43 PM, Mark wrote:

I noticed that the Lucid Works distribution now says it is up to date with 4.x
versions. Does this mean 1.4 or 4.0/trunk?

If it's truly 4.0, does that mean it includes the collapse component?


Yes it does.


Also, are the click scoring tools proprietary, or was this just a
contrib/patch that was applied?


At the moment it's proprietary. I will have a talk at the Lucene 
Revolution conference that describes the Click tools in detail.


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Lucid Works

2011-04-07 Thread Mark
I noticed that the Lucid Works distribution now says it is up to date with 4.x 
versions. Does this mean 1.4 or 4.0/trunk?


If it's truly 4.0, does that mean it includes the collapse component? 
Also, are the click scoring tools proprietary, or was this just a 
contrib/patch that was applied?


Thanks


Re: Trying to Post. Emails rejected as spam.

2011-04-07 Thread Paul Rogers
Hi Park

I had the same problem.  I noticed one of the issues with the blocked
messages is that they are HTML/Rich Text.

(FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,
HTML_MESSAGE 
<-,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL

In GMail I can switch to plain text.  This fixed the problem for me.
If you can do the same in Yahoo you should find it reduces the spam
score sufficiently to allow the messages through.

Regards

Paul

On 7 April 2011 20:21, Ezequiel Calderara  wrote:
>
> Happened to me a couple of times, couldn't find a workaround...
>
> On Thu, Apr 7, 2011 at 4:14 PM, Parker Johnson  wrote:
>
> >
> > Hello everyone.  Does anyone else have problems posting to the list?  My
> > messages keep getting rejected with this response below.  I'll be surprised
> > if
> > this one makes it through :)
> >
> > -Park
> >
> > Sorry, we were unable to deliver your message to the following address.
> >
> > :
> > Remote  host said: 552 spam score (8.0) exceeded threshold
> >
> > (FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL
> >  ) [BODY]
> >
> > --- Below this line is a copy of the message.
> >
>
>
>
> --
> __
> Ezequiel.
>
> Http://www.ironicnet.com


Re: Trying to Post. Emails rejected as spam.

2011-04-07 Thread Peter Sturge
This happens almost always because you're sending from a 'free' mail
account (gmail, yahoo, hotmail, etc), and your message contains words
that spam filters don't like.
For me, it was the use of the word 'remplica' (deliberately
mis-spelled so this mail gets sent).

It can also happen from 'non-free' mail servers that have been
successfully attacked by spambots, so that filters give it a really
bad reputation score.


On Thu, Apr 7, 2011 at 8:14 PM, Parker Johnson  wrote:
>
> Hello everyone.  Does anyone else have problems posting to the list?  My
> messages keep getting rejected with this response below.  I'll be surprised if
> this one makes it through :)
>
> -Park
>
> Sorry, we were unable to deliver your message to the following address.
>
> :
> Remote  host said: 552 spam score (8.0) exceeded threshold
> (FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL
>  ) [BODY]
>
> --- Below this line is a copy of the message.
>


Re: Trying to Post. Emails rejected as spam.

2011-04-07 Thread Marvin Humphrey
On Thu, Apr 07, 2011 at 04:21:25PM -0300, Ezequiel Calderara wrote:
> Happened to me a couple of times, couldn't find a workaround...

Note that the property "HTML_MESSAGE" has contributed to the email's spam
score:

> > (FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL
> >  ) [BODY]

This issue often crops up at Apache.  Sending your messages as plain text
rather than HTML resolves it 99% of the time.

Marvin Humphrey




Re: Trying to Post. Emails rejected as spam.

2011-04-07 Thread Ezequiel Calderara
Happened to me a couple of times, couldn't find a workaround...

On Thu, Apr 7, 2011 at 4:14 PM, Parker Johnson  wrote:

>
> Hello everyone.  Does anyone else have problems posting to the list?  My
> messages keep getting rejected with this response below.  I'll be surprised
> if
> this one makes it through :)
>
> -Park
>
> Sorry, we were unable to deliver your message to the following address.
>
> :
> Remote  host said: 552 spam score (8.0) exceeded threshold
>
> (FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL
>  ) [BODY]
>
> --- Below this line is a copy of the message.
>



-- 
__
Ezequiel.

Http://www.ironicnet.com


Trying to Post. Emails rejected as spam.

2011-04-07 Thread Parker Johnson

Hello everyone.  Does anyone else have problems posting to the list?  My 
messages keep getting rejected with this response below.  I'll be surprised if 
this one makes it through :)

-Park

Sorry, we were unable to deliver your message to the following address.

:
Remote  host said: 552 spam score (8.0) exceeded threshold  
(FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL
  ) [BODY]

--- Below this line is a copy of the message.


Re: Solr without Server / Search solutions with Solr on DVD (examples?)

2011-04-07 Thread karsten-solr
Hi Ezequiel,

In Solr, the performance of sorting and faceted search is mainly a question of 
main memory.
E.g., Mike McCandless wrote at s.apache.org/OWK that sorting 5M Wikipedia 
documents by a title field needs 674 MB of RAM.

But again: My main interest is an example of other companies/product who 
delivered information on DVD with "stand alone" Solr.

Best regards
  Karsten

---Ezequiel

> Try setting a virtual machine and see its performance.
> 
> I'm really not a java guy, so i really don't know how to tune it for
> performance...
> 
> But afaik solr handles pretty well in ram if the index is static...
> 
> On Thu, Apr 7, 2011 at 2:48 PM, Karsten Fissmer 
> wrote:
> 
> > Hi yonik, Hi Ezequiel,
> >
> > Java is no problem for an DVD Version. We already have a DVD version
> with
> > Servlet-Container (but this does currently not use Solr).
> >
> > Some of our customers work in public sector institutions and have less
> then
> > 1gb main memory, but they use MS Word and IE and..
> >
> > But let us say that we can set Xmx384m (we have 14m documents).
> > Xmx384m with 14m UnitsOfRetrieval means e.g. that we do not allow the
> same
> > fields for sorting as on server.
> >
> > My main interest is an example of other companies/product who delivered
> > information on DVD with "stand alone" Solr.
> >
> > Best regards
> >  Karsten
> >
> > > ---yonik
> > > Including a JRE on the DVD and a launch script that uses that JRE by
> > > default should be doable as well.
> > > -Yonik
> > >> Jeffrey
> > >> Even if you can ship your DVD with a jetty server, you'll still need
> > >> JAVA
> > >> installed on the customer machine...
> > >>
> > >>> ---Karsten
> > >>> My question:
> > >>> Does anyone know examples of solutions with Solr starting from DVD?
> > >>> Is there a tutorial for “configure a slow Solr for Computer with
> little
> > main memory”?
> > >>> Any best practice tips from yourself?
> >
> 
> 
> 
> -- 
> __
> Ezequiel.
> 
> Http://www.ironicnet.com


Re: MoreLikeThis match

2011-04-07 Thread Brian Lamb
Actually, what is the difference between "match" and "response"? It seems
that match always returns one result but I've thrown a few cases at it where
the score of the highest response is higher than the score of match. And
then there are cases where the match score dwarfs the highest response
score.

On Thu, Apr 7, 2011 at 1:30 PM, Brian Lamb wrote:

> Hi all,
>
> I've been using MoreLikeThis for a while through select:
>
> http://localhost:8983/solr/select/?q=field:more like
> this&mlt=true&mlt.fl=field&rows=100&fl=*,score
>
> I was looking over the wiki page today and saw that you can also do this:
>
> http://localhost:8983/solr/mlt/?q=field:more like
> this&mlt=true&mlt.fl=field&rows=100
>
> which seems to run faster and do a better job overall. When the results are
> returned, they are formatted like this:
>
> 
>   
> 0
> 1
>   
>   
> 
>   3.0438285
>   5
> 
>   
>   
> 
>   0.1125823
>   3
> 
> 
>   0.10231556
>   8
> 
>  ...
>   
> 
>
> It seems that it always returns just 1 response under match and response is
> set by the rows parameter. How can I get more than one result under match?
>
> What I'm trying to do here is whatever is set for field:, I would like to
> return the top 100 records that match that search based on more like this.
>
> Thanks,
>
> Brian Lamb
>


Re: Encoding issue on synonyms.txt

2011-04-07 Thread Robert Muir
On Thu, Apr 7, 2011 at 2:13 PM, Siddharth Powar
 wrote:
> Hey guys,
>
> I am in the process of moving to solr3.1 from solr1.4. I am having this
> issue where solr3.1 now complains about the synonyms.txt file. I get the
> following error:
> *org.apache.solr.common.SolrException: Error loading resource (wrong
> encoding?): synonyms.txt*
> *
> *
> This worked fine before in solr1.4. Not sure what the issue is...

Hi, your synonyms were not working fine before in solr 1.4; they were
just silently wrong. It's telling you your synonyms file is in the
wrong encoding (by default solr expects UTF-8), because it contains
illegal byte sequences.
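A quick way to check a synonyms file before Solr loads it is to decode its bytes strictly as UTF-8 and see whether any illegal sequence is reported. A sketch in plain Java (standard library only; not something Solr itself provides):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Validate {
    // Returns true only if the bytes form valid UTF-8. REPORT makes the
    // decoder throw on illegal sequences instead of silently replacing
    // them, which mirrors Solr 3.1's stricter resource loading.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            dec.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8(new byte[]{'a', 'b'}));         // true
        // A lone 0xE9 (Latin-1 'é') is an illegal UTF-8 lead byte here:
        System.out.println(isValidUtf8(new byte[]{(byte) 0xE9, 'a'})); // false
    }
}
```

Running a file's bytes through such a check points you at exactly which save-as-UTF-8 step was missed.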


UIMA example setup w/o OpenCalais

2011-04-07 Thread Jay Luker
Hi,

I would like to experiment with the UIMA contrib package, but I have
issues with the OpenCalais service's ToS and would rather not use it.
Is there a way to adapt the UIMA example setup to use only the
AlchemyAPI service? I tried simply leaving out the OpenCalais API key,
but I get exceptions thrown during indexing.

Thanks,
--jay


Encoding issue on synonyms.txt

2011-04-07 Thread Siddharth Powar
Hey guys,

I am in the process of moving to solr3.1 from solr1.4. I am having this
issue where solr3.1 now complains about the synonyms.txt file. I get the
following error:
*org.apache.solr.common.SolrException: Error loading resource (wrong
encoding?): synonyms.txt*
*
*
This worked fine before in solr1.4. Not sure what the issue is...

Thanks in advance for your help guys.


--Sid


Re: Solr without Server / Search solutions with Solr on DVD (examples?)

2011-04-07 Thread Ezequiel Calderara
Try setting a virtual machine and see its performance.

I'm really not a java guy, so i really don't know how to tune it for
performance...

But afaik solr handles pretty well in ram if the index is static...

On Thu, Apr 7, 2011 at 2:48 PM, Karsten Fissmer  wrote:

> Hi yonik, Hi Ezequiel,
>
> Java is no problem for a DVD version. We already have a DVD version with
> a servlet container (but it does not currently use Solr).
>
> Some of our customers work in public sector institutions and have less than
> 1 GB of main memory, but they use MS Word and IE and..
>
> But let us say that we can set Xmx384m (we have 14m documents).
> Xmx384m with 14m UnitsOfRetrieval means e.g. that we do not allow the same
> fields for sorting as on the server.
>
> My main interest is an example of other companies/product who delivered
> information on DVD with "stand alone" Solr.
>
> Best regards
>  Karsten
>
> > ---yonik
> > Including a JRE on the DVD and a launch script that uses that JRE by
> > default should be doable as well.
> > -Yonik
> >> Jeffrey
> >> Even if you can ship your DVD with a jetty server, you'll still need
> >> JAVA
> >> installed on the customer machine...
> >>
> >>> ---Karsten
> >>> My question:
> >>> Does anyone know examples of solutions with Solr starting from DVD?
> >>> Is there a tutorial for “configure a slow Solr for Computer with little
> main memory”?
> >>> Any best practice tips from yourself?
>



-- 
__
Ezequiel.

Http://www.ironicnet.com


3.1 release

2011-04-07 Thread Mark

Does this contain the CollapseComponent?

Will there be a significant performance boost over 1.4?






Re: Solr without Server / Search solutions with Solr on DVD (examples?)

2011-04-07 Thread Karsten Fissmer
Hi yonik, Hi Ezequiel,

Java is no problem for a DVD version. We already have a DVD version with 
a servlet container (but it does not currently use Solr).

Some of our customers work in public sector institutions and have less than 1 GB of 
main memory, but they use MS Word and IE and..

But let us say that we can set Xmx384m (we have 14m documents).
Xmx384m with 14m UnitsOfRetrieval means e.g. that we do not allow the same 
fields for sorting as on the server.

My main interest is an example of other companies/product who delivered 
information on DVD with "stand alone" Solr.

Best regards
  Karsten 

> ---yonik
> Including a JRE on the DVD and a launch script that uses that JRE by
> default should be doable as well.
> -Yonik
>> Jeffrey
>> Even if you can ship your DVD with a jetty server, you'll still need
>> JAVA
>> installed on the customer machine...
>> 
>>> ---Karsten
>>> My question:
>>> Does anyone know examples of solutions with Solr starting from DVD?
>>> Is there a tutorial for “configure a slow Solr for Computer with little 
>>> main memory”?
>>> Any best practice tips from yourself?


MoreLikeThis match

2011-04-07 Thread Brian Lamb
Hi all,

I've been using MoreLikeThis for a while through select:

http://localhost:8983/solr/select/?q=field:more like
this&mlt=true&mlt.fl=field&rows=100&fl=*,score

I was looking over the wiki page today and saw that you can also do this:

http://localhost:8983/solr/mlt/?q=field:more like
this&mlt=true&mlt.fl=field&rows=100

which seems to run faster and do a better job overall. When the results are
returned, they are formatted like this:


  
0
1
  
  

  3.0438285
  5

  
  

  0.1125823
  3


  0.10231556
  8

 ...
  


It seems that it always returns just 1 response under match and response is
set by the rows parameter. How can I get more than one result under match?

What I'm trying to do here is whatever is set for field:, I would like to
return the top 100 records that match that search based on more like this.

Thanks,

Brian Lamb


Re: SOLR support for unicode?

2011-04-07 Thread bbarani
Hi,

Thanks for your response..please find below the schema details corresponding
to that field..



---

Field type details..



   
   

   
   
   


 
 

   
   
   
   
   
 


Thanks,
Barani

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-support-for-unicode-tp2790512p2791151.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tips for getting unique results?

2011-04-07 Thread Erick Erickson
What version of Solr are you using? And, assuming you're on a version that
has it, have you seen grouping?

Which is another way of asking why you want to do this; perhaps it's an
XY problem...

Best
Erick

On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam  wrote:

> Hi,
>
> I have documents with a field that has "1A2B3C" alphanumeric characters.  I
> can query for * and sort results based on this field, however I'd like to
> "uniq" these results (remove duplicates) so that I can get the 5 largest
> unique values.  I can't use the StatsComponent because my values have
> letters in them too.
>
> Faceting (and ignoring the counts) gets me half of the way there, but I can
> only sort ascending.  If I could also sort facet results descending, I'd be
> done.  I'd rather not return all documents and just parse the last few
> results to work around this.
>
> Any ideas?
>
>
> -Pete
>


Re: Different result for the same query?

2011-04-07 Thread Erick Erickson
I'd advise getting a copy of Luke and examining your
indexes. The information you've provided doesn't really
tell us much.

Although I do notice you don't commit in your example code...

Best
Erick

On Thu, Apr 7, 2011 at 10:21 AM, Amel Fraisse wrote:

> Hello every body,
>
> I am using Solr for indexing and searching.
>
> I am using 2 classes for searching documents. In the first one I'm
> instantiating a SolrServer to search documents as follows:
>
> server = new EmbeddedSolrServer(
> coreContainer, "");
> server.add(doc);
> query.setQuery("id:"+idDoc);
> server.query(query);
>
> The response contains 2 document.
>
> In the second class I am using SolrCore for indexing and searching (because
> I need 2 indexes) as follows:
>
> servercore2 = new EmbeddedSolrServer(coreContainer, "core2");
> servercore2.add(doc2);
> query.setQuery("id:"+idDoc);
> QueryResponse rsp = servercore2.query(query);
>
>
> The response contains only 1 document.
>
>
> Thank you very much for your help.
>
> Amel.
>


Re: Tips for getting unique results?

2011-04-07 Thread Peter Spam
The data are fine and not duplicated - however, I want to analyze the data, and 
summarize one field (kind of like faceting), to understand what the largest 
value is.

For example:

Document 1:   label=1A1A1; body="adfasdfadsfasf"
Document 2:   label=5A1B1; body="adfaasdfasdfsdfadsfasf"
Document 3:   label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
Document 4:   label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
Document 5:   label=7A1A1; body="azxzxcvdfaasdfasdfsdasdafadsfasf"
Document 6:   label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"

How do I get back just ONE of the largest "label" item?

In other words, what query will return the 7A1A1 label just once?  If I search 
for q=* and sort the results, it works, except I get back multiple hits for 
each label.  If I do a facet, I can only sort by increasing order, when what I 
want is decreasing order.


-Pete
 
On Apr 6, 2011, at 10:22 PM, Otis Gospodnetic wrote:

> Hi,
> 
> I think you are saying dupes are the main problem?  If so, 
> http://wiki.apache.org/solr/Deduplication ?
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
>> From: Peter Spam 
>> To: solr-user@lucene.apache.org
>> Sent: Thu, April 7, 2011 1:13:44 AM
>> Subject: Tips for getting unique results?
>> 
>> Hi,
>> 
>> I have documents with a field that has "1A2B3C" alphanumeric  characters.  I 
>> can query for * and sort results based on this field,  however I'd like to 
>> "uniq" these results (remove duplicates) so that I can get  the 5 largest 
>> unique 
>> values.  I can't use the StatsComponent because my  values have letters in 
>> them 
>> too.
>> 
>> Faceting (and ignoring the counts) gets  me half of the way there, but I can 
>> only sort ascending.  If I could also  sort facet results descending, I'd be 
>> done.  I'd rather not return all  documents and just parse the last few 
>> results 
>> to work around this.
>> 
>> Any  ideas?
>> 
>> 
>> -Pete
>> 
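One workaround for the missing descending facet sort, sketched as a client-side step (not something proposed in the thread): request all unique values with facet.limit=-1 and facet.sort=index, which come back in ascending index order, then reverse them in the client:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TopLabels {
    // Takes facet labels as Solr returns them with facet.sort=index
    // (ascending lexicographic order) and returns the n largest unique
    // values, i.e. the tail of the list in reverse.
    static List<String> largest(List<String> ascendingLabels, int n) {
        List<String> copy = new ArrayList<>(ascendingLabels);
        Collections.reverse(copy);
        return copy.subList(0, Math.min(n, copy.size()));
    }

    public static void main(String[] args) {
        // The unique labels from the example docs, as a facet would list them
        List<String> labels = Arrays.asList("1A1A1", "5A1B1", "7A1A1");
        System.out.println(largest(labels, 1)); // [7A1A1]
    }
}
```

This assumes the labels compare lexicographically the way you want "largest" to mean; for fixed-width alphanumeric labels like 7A1A1 that holds.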



Re: SOLR support for unicode?

2011-04-07 Thread Jonathan Rochkind
That's probably an issue with your analyzer.  Can you show us the field 
definition from the schema.xml file for the field that you are putting 
this text in?


On 4/7/2011 10:37 AM, bbarani wrote:

Hi,

We are trying to index heterogeneous data using SOLR. Some of the sources
have some unicode characters like Zone™, but SOLR is converting them to
Zone™. Any idea how to resolve this issue?

I am using SOLR on Jetty server...

Thanks,
Barani

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-support-for-unicode-tp2790512p2790512.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-07 Thread Walter Underwood
On Apr 6, 2011, at 10:29 PM, Jens Mueller wrote:

> Walter, thanks for the advice: Well you are right, mentioning google. My
> question was also to understand how such large systems like google/facebook
> are actually working. So my numbers are just theoretical and made up. My
> system will be smaller,  but I would be very happy to understand how such
> large systems are build and I think the approach Ephraim showd should be
> working quite well at large scale. 

Understanding what Google does will NOT help you build your engine. Just like 
understanding a F1 race car does not help you build a Toyota Camry. One is 
built for performance only, and requires LOTS of support, the other for 
supportability and stability. Very different engineering goals and designs.

Here is one view of Google's search setup: 
http://www.linesave.co.uk/google_search_engine.html

This talk gives a lot more detail. Summary in the blog post, slides in the PDF. 
Google's search is entirely in-memory. They load off disk and run.

http://glinden.blogspot.com/2009/02/jeff-dean-keynote-at-wsdm-2009.html
http://research.google.com/people/jeff/WSDM09-keynote.pdf

How big will your system be? Does it require real-time updates?

wunder
--
Walter Underwood
Lead Engineer, MarkLogic



RE: Using MLT feature

2011-04-07 Thread Frederico Azeiteiro
Well at this point I'm more dedicated to the Deduplicate issue.

Using a Min_token_len of 4 I'm getting nice comparison results. MLT returns a 
lot of supposedly similar docs that I don't consider similar, even after tuning the parameters.

Finishing this issue, I found out that the signature also contains the field 
name, meaning that if you wish to sign both the title and text fields, your 
signature will be a hash of ("text"+"text value"+"title"+"title value").

In any case, I found that the HashMap used in the hash algorithm inserts the 
tokens in some internal order that I can't reproduce :), and so it is 
impossible to copy in a C# implementation.

Thank you for all your help,
Frederico 
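For anyone porting the hex step to another language: the nibble loop in the Java app code quoted further down in this thread depends only on Solr's StrUtils.HEX_DIGITS table, which (as the lowercase signatures in this thread show) is the ordinary lowercase hex alphabet, so a standalone equivalent is:

```java
public class HexSig {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Standalone equivalent of the nibble-by-nibble hex encoding in the
    // quoted snippet, with the Solr StrUtils dependency inlined.
    static String toHex(byte[] sig) {
        char[] arr = new char[sig.length << 1];
        for (int i = 0; i < sig.length; i++) {
            int b = sig[i];
            arr[i << 1] = HEX[(b >> 4) & 0xf];       // high nibble
            arr[(i << 1) + 1] = HEX[b & 0xf];        // low nibble
        }
        return new String(arr);
    }

    public static void main(String[] args) {
        // First two bytes of the example signature 8b92e01d...
        System.out.println(toHex(new byte[]{(byte) 0x8b, (byte) 0x92})); // 8b92
    }
}
```

Any hex-encoding difference between ports then reduces to comparing the raw signature bytes, not the string formatting.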


-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: quinta-feira, 7 de Abril de 2011 04:09
To: solr-user@lucene.apache.org
Subject: Re: Using MLT feature

A "fuzzy signature" system will not work here. You are right, you want
to try MLT instead.

Lance

On Wed, Apr 6, 2011 at 9:47 AM, Frederico Azeiteiro
 wrote:
> Yes, I had already checked the code for it and used it to write a C# method 
> that returns the same signature.
>
> But I have a strange issue:
> For instance, using MinTokenLength=2 and default QUANT_RATE, passing the 
> text "frederico" (simple text no big deal here):
>
> 1. using my c# app returns "8b92e01d67591dfc60adf9576f76a055"
> 2. using SOLR, passing a doc with HeadLine "frederico" I get 
> "8d9a5c35812ba75b8383d4538b91080f" on my signature field.
> 3. Created a Java app (i'm not a Java expert..), using the code from SOLR 
> SignatureUpdateProcessorFactory class (please check code below) and I get 
> "8b92e01d67591dfc60adf9576f76a055".
>
> Java app code:
>                TextProfileSignature textProfileSignature = new 
> TextProfileSignature();
>                NamedList params = new NamedList();
>                params.add("", "");
>                SolrParams solrParams = SolrParams.toSolrParams(params);
>                textProfileSignature.init(solrParams);
>                textProfileSignature.add("frederico");
>
>
>                byte[] signature =  textProfileSignature.getSignature();
>                char[] arr = new char[signature.length << 1];
>                for (int i = 0; i < signature.length; i++) {
>                        int b = signature[i];
>                        int idx = i << 1;
>                        arr[idx] = StrUtils.HEX_DIGITS[(b >> 4) & 0xf];
>                        arr[idx + 1] = StrUtils.HEX_DIGITS[b & 0xf];
>                }
>                String sigString = new String(arr);
>                System.out.println(sigString);
>
>
>
>
> Here's my processor configs:
>
> <updateRequestProcessorChain name="dedupe">
>     <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>       <bool name="enabled">true</bool>
>       <str name="signatureField">sig</str>
>       <bool name="overwriteDupes">false</bool>
>       <str name="fields">HeadLine</str>
>       <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
>       <str name="minTokenLen">2</str>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
>
>
> So both my apps (Java and C#) return the same signature, but SOLR returns a
> different one...
> Can anyone see what I'm doing wrong?
>
> Thank you once again.
>
> Frederico
>
> -Original Message-
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: terça-feira, 5 de Abril de 2011 15:20
> To: solr-user@lucene.apache.org
> Cc: Frederico Azeiteiro
> Subject: Re: Using MLT feature
>
> If you check the code for TextProfileSignature [1] you'll notice the init
> method reading params. You can set those params as you did. Reading Javadoc
> [2] might help as well. But what's not documented in the Javadoc is how QUANT
> is computed; it rounds.
>
> [1]:
> http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.4/src/java/org/apache/solr/update/processor/TextProfileSignature.java?view=markup
> [2]:
> http://lucene.apache.org/solr/api/org/apache/solr/update/processor/TextProfileSignature.html
>
> On Tuesday 05 April 2011 16:10:08 Frederico Azeiteiro wrote:
>> Thank you, I'll try to create a C# method that produces the same signature
>> as SOLR, and then compare both signatures before indexing the doc. This way
>> I can avoid re-indexing existing docs.
>>
>> If anyone needs to use this parameter (as this info is not on the wiki),
>> you can add the option
>>
>> <str name="minTokenLen">5</str>
>>
>> On the processor tag.
>>
>> Best regards,
>> Frederico
>>
>>
>> -Original Message-
>> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
>> Sent: terça-feira, 5 de Abril de 2011 12:01
>> To: solr-user@lucene.apache.org
>> Cc: Frederico Azeiteiro
>> Subject: Re: Using MLT feature
>>
>> On Tuesday 05 April 2011 12:19:33 Frederico Azeiteiro wrote:
>> > Sorry, the reply I made yesterday was directed to Markus and not the
>> > list...
>> >
>> > Here's my thoughts on this. At this point I'm a little confused if SOLR
>> > is a good option to find near duplicate docs.
>> >
>> > >> Yes there is, try set overwriteDupes to true and documents yielding
>> >
>> > the same signature will be overwritten
>> >
>> > The problem is that I don't

Re: Solr without Server / Search solutions with Solr on DVD (examples?)

2011-04-07 Thread Yonik Seeley
On Thu, Apr 7, 2011 at 10:28 AM, Jeffrey Chang  wrote:
> Even if you can ship your DVD with a jetty server, you'll still need JAVA
> installed on the customer machine...

Including a JRE on the DVD and a launch script that uses that JRE by
default should be doable as well.
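For example, a minimal launch script along these lines could sit next to the bundled JRE and Jetty on the disc (all paths here are illustrative assumptions, not a tested DVD layout):

```shell
#!/bin/sh
# Sketch of a DVD launcher: prefer the JRE bundled on the disc,
# fall back to any system "java". Paths are illustrative.

DVD_ROOT="$(cd "$(dirname "$0")" && pwd)"   # directory this script lives in
JAVA="$DVD_ROOT/jre/bin/java"               # the JRE shipped on the DVD
[ -x "$JAVA" ] || JAVA=java                 # fall back to a system Java

# Keep the heap small: the index on the DVD is static, so no
# server-sized caches are needed.
JAVA_OPTS="-Xmx256m -Dsolr.solr.home=$DVD_ROOT/solr"

# Launch the bundled Jetty only if it is actually present next to the script.
if [ -f "$DVD_ROOT/start.jar" ]; then
    exec "$JAVA" $JAVA_OPTS -jar "$DVD_ROOT/start.jar"
fi
```

A matching .bat file would be needed for Windows users.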

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


SOLR support for unicode?

2011-04-07 Thread bbarani
Hi,

We are trying to index heterogeneous data using SOLR. Some of the sources
have unicode characters like Zone™, but SOLR is converting them to
Zoneâ„¢. Any idea how to resolve this issue?

I am using SOLR on Jetty server...

Thanks,
Barani

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-support-for-unicode-tp2790512p2790512.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr without Server / Search solutions with Solr on DVD (examples?)

2011-04-07 Thread Jeffrey Chang
Even if you can ship your DVD with a jetty server, you'll still need JAVA
installed on the customer machine...

On Thu, Apr 7, 2011 at 10:18 PM, Ezequiel Calderara wrote:

> Can't you just run a jetty server on the background?
>
> But some antivirus or antispyware could probably flag that as a trojan or
> something like that.
>
> How little main memory is it? 1 GB? Less?
>
> I don't think that you are going to have problems above 1gb. The index will
> be static, no changes, no optimizations...
>
> That's my thought
>
> On Thu, Apr 7, 2011 at 11:12 AM,  wrote:
>
> > Hi folks,
> >
> > we want to migrate our search-portal to Solr.
> > But some of our customers search in our informations offline with a
> > DVD-Version.
> > So we want to estimate the complexity of a Solr DVD-Version.
> > This means to trim Solr to work on small computers with the opposite of
> > heavy loads. So no server-optimizations, no Cache, less facet terms in
> > memory...
> >
> > My question:
> > Does anyone know examples of solutions with Solr starting from DVD?
> >
> > Is there a tutorial for “configure a slow Solr for Computer with little
> > main memory”?
> >
> > Any best practice tips from yourself?
> >
> >
> > Best regards
> >  Karsten
> >
>
>
>
> --
> __
> Ezequiel.
>
> Http://www.ironicnet.com 
>


Highlighting and custom fragmenting

2011-04-07 Thread dan sutton
Hi All,

I'd like to make the highlighting work as follows:

length(all snippets) approx. 200 chars
hl.snippets = 2 (2 snippets)

is this possible with the regex fragmenter? or does anyone know of any
contrib fragmenter that might do this?
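For what it's worth, the stock highlighting parameters get close to this; total length is only approximate (hl.snippets × hl.fragsize), and the values below are an untested sketch:

```xml
<!-- request-handler defaults in solrconfig.xml; values illustrative -->
<str name="hl">true</str>
<str name="hl.fl">text</str>
<str name="hl.snippets">2</str>       <!-- two snippets -->
<str name="hl.fragsize">100</str>     <!-- ~100 chars each, ~200 total -->
<str name="hl.fragmenter">regex</str> <!-- use the regex fragmenter -->
<str name="hl.regex.slop">0.5</str>   <!-- fragments may vary 50-150 chars -->
```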

Many thanks
Dan


Different result for the same query?

2011-04-07 Thread Amel Fraisse
Hello every body,

I am using Solr for indexing and searching.

I am using 2 classes for searching documents. In the first one I'm
instantiating a SolrServer to search documents as follows:

server = new EmbeddedSolrServer(coreContainer, "");
server.add(doc);
query.setQuery("id:"+idDoc);
server.query(query);

The response contains 2 documents.

In the second class I am using SolrCore for indexing and searching (because
I need 2 indexes) as follows:

servercore2 = new EmbeddedSolrServer(coreContainer, "core2");
servercore2.add(doc2);
query.setQuery("id:"+idDoc);
QueryResponse rsp = servercore2.query(query);


The response contains only 1 document.


Thank you very much for your help.

Amel.


Re: difference between geospatial search from database angle and from solr angle

2011-04-07 Thread Smiley, David W.
I haven't used PostGIS so I can't offer a real comparison. I think if you were 
to try out both, you'd be impressed with Solr's performance/scalability thanks 
in large part to its sharding.  But for "functionality richness" insofar as 
geospatial is concerned, that's where Solr currently falls short. It just has 
the basic stuff 80% of people want.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Apr 7, 2011, at 2:24 AM, Sean Bigdatafun wrote:

> Thanks, David.
> 
> I am thinking of a scenario that billions of objects, whose indices are too
> big for a single machine to serve the indexing, to serve the querying. Is
> there any sharding mechanism?
> 
> 
> Can you give a comparison between solr-based geospatial search and PostGIS
> based geospatial search?
>  * scalability
>  * functionality richness
>  * incremental indexing (re-indexing) cost
>  * query cost
>  * sharding scheme support


Re: Solr without Server / Search solutions with Solr on DVD (examples?)

2011-04-07 Thread Ezequiel Calderara
Can't you just run a jetty server on the background?

But some antivirus or antispyware could probably flag that as a trojan or
something like that.

How little main memory is it? 1 GB? Less?

I don't think that you are going to have problems above 1gb. The index will
be static, no changes, no optimizations...

That's my thought

On Thu, Apr 7, 2011 at 11:12 AM,  wrote:

> Hi folks,
>
> we want to migrate our search-portal to Solr.
> But some of our customers search in our informations offline with a
> DVD-Version.
> So we want to estimate the complexity of a Solr DVD-Version.
> This means to trim Solr to work on small computers with the opposite of
> heavy loads. So no server-optimizations, no Cache, less facet terms in
> memory...
>
> My question:
> Does anyone know examples of solutions with Solr starting from DVD?
>
> Is there a tutorial for “configure a slow Solr for Computer with little
> main memory”?
>
> Any best practice tips from yourself?
>
>
> Best regards
>  Karsten
>



-- 
__
Ezequiel.

Http://www.ironicnet.com


Solr without Server / Search solutions with Solr on DVD (examples?)

2011-04-07 Thread karsten-solr
Hi folks,

we want to migrate our search-portal to Solr.
But some of our customers search our information offline with a DVD version.
So we want to estimate the complexity of a Solr DVD version.
This means trimming Solr to work on small computers, the opposite of heavy
loads: no server optimizations, no cache, fewer facet terms in memory...

My question:
Does anyone know examples of solutions with Solr starting from DVD?

Is there a tutorial for “configuring a slow Solr for a computer with little
main memory”?

Any best practice tips from yourself?


Best regards
  Karsten


Re: Solr architecture diagram

2011-04-07 Thread David MARTIN
Hi,

Thank you for this contribution. Such a diagram could be useful in the
official documentation.

David

On Thu, Apr 7, 2011 at 12:15 PM, Jeffrey Chang  wrote:

> This is awesome; thank you!
>
> On Thu, Apr 7, 2011 at 6:09 PM, Jan Høydahl  wrote:
>
> > Hi,
> >
> > Glad you liked it. You'd like to model the inner architecture of SolrJ as
> > well, do you? Perhaps that should be a separate diagram.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> >  On 6. apr. 2011, at 12.06, Stevo Slavić wrote:
> >
> > > Nice, thank you!
> > >
> > > Wish there was something similar or extra to this one depicting where
> > > do SolrJ's CommonsHttpSolrServer and EmbeddedSolrServer fit in.
> > >
> > > Regards,
> > > Stevo.
> > >
> > > On Wed, Apr 6, 2011 at 11:44 AM, Jan Høydahl 
> > wrote:
> > >> Hi,
> > >>
> > >> At Cominvent we've often had the need to visualize the internal
> > architecture of Apache Solr in order to explain both the relationships of
> > the components as well as the flow of data and queries. The result is a
> > conceptual architecture diagram, clearly showing how Solr relates to the
> > app-server, how cores relate to a Solr instance, how documents enter
> through
> > an UpdateRequestHandler, through an UpdateChain and Analysis and into the
> > Lucene index etc.
> > >>
> > >> The drawing is created using Google draw, and the original is shared
> on
> > Google Docs. We have licensed the diagram under the permissive Creative
> > Commons "CC-by" license which lets you use, modify and re-distribute the
> > diagram, even commercially, as long as you attribute us with a link.
> > >>
> > >> Check it out at http://ow.ly/4sOTm
> > >> We'd love your comments
> > >>
> > >> --
> > >> Jan Høydahl, search solution architect
> > >> Cominvent AS - www.cominvent.com
> > >>
> > >>
> >
> >
>


Re: difference between geospatial search from database angle and from solr angle

2011-04-07 Thread Erick Erickson
Have you looked at solr sharding?

Best
Erick

On Thu, Apr 7, 2011 at 2:24 AM, Sean Bigdatafun
wrote:

> Thanks, David.
>
> I am thinking of a scenario that billions of objects, whose indices are too
> big for a single machine to serve the indexing, to serve the querying. Is
> there any sharding mechanism?
>
>
> Can you give a comparison between solr-based geospatial search and PostGIS
> based geospatial search?
>  * scalability
>  * functionality richness
>  * incremental indexing (re-indexing) cost
>  * query cost
>  * sharding scheme support
>
>
>
> On Wed, Apr 6, 2011 at 9:42 PM, David Smiley (@MITRE.org) <
> dsmi...@mitre.org
> > wrote:
>
> > Sean,
> >Geospatial search in Lucene/Solr is of course implemented based on
> > Lucene's underlying index technology. That technology was originally just
> > for text but it's been adapted very successfully for numerics and
> querying
> > ranges too. The only mature geospatial field type in Solr 3.1 is
> LatLonType
> > which under the hood is simply a pair of latitude & longitude numeric
> > fields.  There really isn't anything sophisticated (geospatially
> speaking)
> > in Solr 3.1. I'm not sure what sort of geospatial DB research you have in
> > mind but I would expect other systems would be free to use an indexing
> > strategy designed for spatial such as "R-Trees". Nevertheless, I think
> > Lucene offers the underlying primitives to compete with systems using
> other
> > technologies.  Case in point is my patch SOLR-2155 which indexes a single
> > point in the form of a "geohash" at multiple resolutions (geohash lengths
> > AKA spatial prefixes / grids) and uses a recursive algorithm to
> efficiently
> > query an arbitrary shape.  It's quite fast and bests LatLonType already;
> > and
> > there's a lot more I can do to make it faster.
> >This is definitely a field of interest and a growing one in the
> > Lucene/Solr community.  There are even some external spatial providers
> > (JTeam, MetaCarta) and I'm partnering with other individuals to create a
> > new
> > one.  Expect to see more in the coming months.  If you're looking for
> some
> > specific geospatial capabilities then let us know.
> >
> > ~ David Smiley
> > Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
> >
> > -
> >  Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/difference-between-geospatial-search-from-database-angle-and-from-solr-angle-tp2788442p2788972.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> --Sean
>


Re: solr-2351 patch

2011-04-07 Thread Erick Erickson
Trunk. That's what "next" means in the "fix version" field.

Note that the patch is "as is", it's not guaranteed. The
trunk code may well have moved on so use at your own
risk!

Best
Erick

On Wed, Apr 6, 2011 at 11:44 PM, Isha Garg  wrote:

>
>
> Hi,
> Tell me for which solr version does Patch file
> SOLR-2351(
> https://issues.apache.org/jira/secure/attachment/12470560/mlt.patch)
> fixed for .
>
> Regards!
> Isha
>
>


Re: what happens to docsPending if stop solr before commit

2011-04-07 Thread Erick Erickson
Hmmm, depends on how you stop the server. I was
assuming you did something radical like 'kill -9' (for SHAME )
or the machine crashed or something else horrible...

Koji was covering graceful shutdown, thanks Koji! I hadn't
even considered that

Erick

On Wed, Apr 6, 2011 at 7:19 PM, Robert Petersen  wrote:

> Really?  Great!  I was wondering if there was some cleanup cycle like
> that which would occur upon shutdown.  That sounds like much more
> logical behavior!
>
> -Original Message-
> From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
> Sent: Wednesday, April 06, 2011 4:03 PM
> To: solr-user@lucene.apache.org
> Subject: Re: what happens to docsPending if stop solr before commit
>
> (11/04/06 5:25), Robert Petersen wrote:
> > I tried to find the answer to this simple question online, but failed.
> > I was wondering about this, what happens to uncommitted docsPending if
> I
> > stop solr and then restart solr?  Are they lost?  Are they still there
> > but still uncommitted?  Do they get committed at startup?  I noticed
> > after a restart my 250K pending doc count went to 0 is what got me
> > wondering.
>
> Robi,
>
> Usually they are never lost, but they are committed.
>
> When you stop Solr, servlet container (Jetty) calls servlets/filters
> destroy() methods. This causes closing all SolrCores. Then
> SolrCore.close()
> calls UpdateHandler.close(). It calls SolrIndexWriter.close(). Then
> pending docs are flushed, then committed.
>
> Koji
> --
> http://www.rondhuit.com/en/
>


Re: Synonym-time Reindexing Issues

2011-04-07 Thread Erick Erickson
OK, see below.

On Wed, Apr 6, 2011 at 6:22 PM, Preston Marshall wrote:

> Reply Inline:
> On Apr 6, 2011, at 8:12 AM, Erick Erickson wrote:
>
> > Hmmm, this should work just fine. Here are my questions.
> >
> > 1> are you absolutely sure that the new synonym file
> > is available when reindexing?
> Not sure what you mean here, solr is running as root, and the file is never
> moved around or anything crazy.
>

Just a sanity check that you're changing the indexing file you think you're
changing. I've sometimes managed to be in the wrong directory, on the wrong
machine, etc.

Hmmm, what happens if you just stop/start the server instead of deleting the
index? I'm wondering if the old file is used (assuming *nix here). I have no
evidence this could be the  case but it's an idea.


> > 2> does the sunspot program do anything wonky with
> > the ids? The documents
> > will only be replaced if the IDs are identical.
> Is there a way I can add debugging to show what it's doing with the IDs or
> something to view the index?  I tried using Luke, but I can't get it to
> actually show me the actual data of the objects, only the name and some
> other basic info.
>

The issue is seeing whatever has been defined as the <uniqueKey> field. In
the default schema, it's defined as "id". I'm NOT talking about the internal
Lucene ID, it's entirely about what's defined in your schema. Set
stored="true" for fields to see them easily. The point here is that Solr
updates documents based on . If there is no such field,
reindexing your documents will simply add another copy, the original is
still searchable.


> > 3> are you sure that a commit is done at the end?
> It appears that it commits a few times during reindexing.
> > 4> What happens if you optimize? At that point, maxdocs
> > and numdocs should be the same, and should be the count
> > of documents. if they differ by a factor of 2, I'd suspect your
> > id field isn't being used correctly.
> I'm unaware of what you mean by optimizing, or even viewing maxdocs and
> numdocs, but I will RTFM to find out.  I did notice something strange
> earlier though that may relate to this.  When I ran a search there were
> duplicate results.
>

OK, see the <uniqueKey> discussion above. It really sounds
like re-indexing the data is
merely adding documents again and again and again, not
replacing the first copy with the second. If this is true, your numDocs
and maxDocs should be nearly equal the first time and grow
by the number of documents you index every time you
reindex. If/when your <uniqueKey> is working, you should see
numDocs stay constant and maxDocs go up by the number
of documents you re-index.

Sending an optimize command to the indexer will reclaim all
unused resources and bring numDocs and maxDocs back
to the same value, but this is probably not your problem.

I do see that "id" is the <uniqueKey> in your schema. So I'm
guessing, especially because the comment says that this
field is used by sunspot, that the sunspot stuff is creating
a new id for each document when you re-index. If all this is
true, it's an issue with sunspot. So here's what I predict. If you
look at the id field you'll see some sunspot-generated id that's
unique for every added document even if it's a new copy
of an old document, so Solr sees two separate, entirely
unrelated documents. The old one has the old synonyms and
the new one the new list.
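For reference, the declaration in question lives in schema.xml; in the stock example schema it looks like this:

```xml
<!-- Solr overwrites an existing document whose uniqueKey value matches
     the one being added -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```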

The maxDocs/numDocs are available on the admin page, click
the "statistics" link.

Best
Erick


> >
> > If the hypothesis that you id field isn't working correctly, your number
> > of hits should be going up after re-indexing...
> >
> > If none of that is relevant, let us know what you find and we'll
> > try something else
> >
> > Best
> > Erick
> >
> > On Tue, Apr 5, 2011 at 10:46 PM, Preston Marshall <
> pres...@synergyeoc.com>wrote:
> >
> >> Hello all, I am having an issue with Solr and the SynonymFilterFactory.
>  I
> >> am using a library to interface with Solr called "sunspot."  I realize
> that
> >> is not what this list is for, but I believe this may be an issue with
> Solr,
> >> not the library (plus the lib author doesn't know the answer). I am
> using
> >> the SynonymFilterFactory in my index-time analyzer, and it works great.
>  My
> >> only problem is when it comes to changing the synonyms file.  I would
> expect
> >> to be able to edit the file, run a reindex (this is through the
> library),
> >> and have the new synonyms function when the reindex is complete.
> >> Unfortunately this is not the case, as changing the synonyms file
> doesn't
> >> actually affect the search results.  What DOES work is deleting the
> existing
> >> index, and starting from scratch.  This is unacceptable for my usage
> though,
> >> because I need the old index to remain online while the new one is being
> >> built, so there is no downtime.
> >>
> >> Here's my schema in case anyone needs it:
> >> https://gist.github.com/88f8fb763e99abe4d5b8
> >>
> >> Thanks,
> >> Preston
> >>
> >> P.S. Sorry if th

Re: Highlighting not working

2011-04-07 Thread Tom Mortimer
Problem solved. *bangs head on desk*
T

On 7 April 2011 11:33, Tom Mortimer  wrote:
> Hi,
>
> I'm having trouble getting highlighting to work for a large text
> field. This field can be in several languages, so I'm sending it to
> one of several fields configured appropriately (e.g. "cv_text_en") and
> then copying it to a common field for storage and display ("cv_text").
> The relevant fragment of schema.xml looks like:
>
>    <field name="cv_text_en" ... stored="false" termVectors="true" termPositions="true"/>
>    ...
>    <field name="cv_text" ... stored="true"/>
>    <copyField source="..." dest="cv_text"/>
>
> At search time I can't get cv_text to be highlighted - it's returned
> in its entirety. Here's the relevant bit of solrconfig.xml (I'm
> qt="all" with the default request handler):
>
>  
>    
>        <str name="echoParams">explicit</str>
>        <int name="rows">10</int>
>
>        <str name="...">cv_text_en</str>
>        <str name="...">cv_text_de</str>
>        ...
>
>        <str name="hl">on</str>
>        <str name="hl.fl">cv_text</str>
>
>     
>
> I've tried playing with other hl. parameters, but have had no luck so
> far. Any ideas?
>
> thanks,
> Tom
>


Re: ClobTransformer Issues

2011-04-07 Thread Shalin Shekhar Mangar
Hi Stephen,

I looked through the Ingres documentation but I don't see why this would
happen. It seems that the column is not being detected as a Clob by the
transformer and Object.toString is being invoked.

[1] - http://community.ingres.com/wiki/Manipulating_SQL_CLOB_data_with_JDBC
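For comparison, the usual ClobTransformer wiring in data-config.xml looks like this (the entity and column names here are made up):

```xml
<entity name="doc" transformer="ClobTransformer"
        query="SELECT id, body FROM docs">
  <!-- clob="true" makes the transformer turn the CLOB into a String -->
  <field column="body" clob="true"/>
</entity>
```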

On Thu, Apr 7, 2011 at 2:22 AM, Stephen Garvey wrote:

> Hi All,
>
> I'm hoping someone can give me some pointers. I've got Solr 1.4.1 and am
> using DIH to import a table from and Ingres database. The table contains
> a column which is a CLOB type. I've tried to use a CLOB transformer to
> transform the CLOB to a String but the index only contains something
> like INGRES-CLOB:(Loc 10).
>
> Does anyone have any ideas on why the CLOB transformer is not
> transforming this column?
>
> Thanks,
>
> Stephen
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Highlighting not working

2011-04-07 Thread Ahmet Arslan

> I guess what I'm asking is - can Solr
> highlight non-indexed fields?

http://wiki.apache.org/solr/FieldOptionsByUseCase


Re: Highlighting not working

2011-04-07 Thread Tom Mortimer
I guess what I'm asking is - can Solr highlight non-indexed fields?

Tom


On 7 April 2011 11:33, Tom Mortimer  wrote:
> Hi,
>
> I'm having trouble getting highlighting to work for a large text
> field. This field can be in several languages, so I'm sending it to
> one of several fields configured appropriately (e.g. "cv_text_en") and
> then copying it to a common field for storage and display ("cv_text").
> The relevant fragment of schema.xml looks like:
>
>    <field name="cv_text_en" ... stored="false" termVectors="true" termPositions="true"/>
>    ...
>    <field name="cv_text" ... stored="true"/>
>    <copyField source="..." dest="cv_text"/>
>
> At search time I can't get cv_text to be highlighted - it's returned
> in its entirety. Here's the relevant bit of solrconfig.xml (I'm
> qt="all" with the default request handler):
>
>  
>    
>        <str name="echoParams">explicit</str>
>        <int name="rows">10</int>
>
>        <str name="...">cv_text_en</str>
>        <str name="...">cv_text_de</str>
>        ...
>
>        <str name="hl">on</str>
>        <str name="hl.fl">cv_text</str>
>
>     
>
> I've tried playing with other hl. parameters, but have had no luck so
> far. Any ideas?
>
> thanks,
> Tom
>


Highlighting not working

2011-04-07 Thread Tom Mortimer
Hi,

I'm having trouble getting highlighting to work for a large text
field. This field can be in several languages, so I'm sending it to
one of several fields configured appropriately (e.g. "cv_text_en") and
then copying it to a common field for storage and display ("cv_text").
The relevant fragment of schema.xml looks like:

<field name="cv_text_en" ... stored="false" termVectors="true" termPositions="true"/>
...
<field name="cv_text" ... stored="true"/>
<copyField source="..." dest="cv_text"/>

At search time I can't get cv_text to be highlighted - it's returned
in its entirety. Here's the relevant bit of solrconfig.xml (I'm
qt="all" with the default request handler):

  

<str name="echoParams">explicit</str>
<int name="rows">10</int>

<str name="...">cv_text_en</str>
<str name="...">cv_text_de</str>
...

<str name="hl">on</str>
<str name="hl.fl">cv_text</str>

 

I've tried playing with other hl. parameters, but have had no luck so
far. Any ideas?

thanks,
Tom


Re: Solr architecture diagram

2011-04-07 Thread Jeffrey Chang
This is awesome; thank you!

On Thu, Apr 7, 2011 at 6:09 PM, Jan Høydahl  wrote:

> Hi,
>
> Glad you liked it. You'd like to model the inner architecture of SolrJ as
> well, do you? Perhaps that should be a separate diagram.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>  On 6. apr. 2011, at 12.06, Stevo Slavić wrote:
>
> > Nice, thank you!
> >
> > Wish there was something similar or extra to this one depicting where
> > do SolrJ's CommonsHttpSolrServer and EmbeddedSolrServer fit in.
> >
> > Regards,
> > Stevo.
> >
> > On Wed, Apr 6, 2011 at 11:44 AM, Jan Høydahl 
> wrote:
> >> Hi,
> >>
> >> At Cominvent we've often had the need to visualize the internal
> architecture of Apache Solr in order to explain both the relationships of
> the components as well as the flow of data and queries. The result is a
> conceptual architecture diagram, clearly showing how Solr relates to the
> app-server, how cores relate to a Solr instance, how documents enter through
> an UpdateRequestHandler, through an UpdateChain and Analysis and into the
> Lucene index etc.
> >>
> >> The drawing is created using Google draw, and the original is shared on
> Google Docs. We have licensed the diagram under the permissive Creative
> Commons "CC-by" license which lets you use, modify and re-distribute the
> diagram, even commercially, as long as you attribute us with a link.
> >>
> >> Check it out at http://ow.ly/4sOTm
> >> We'd love your comments
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >>
>
>


Re: Solr architecture diagram

2011-04-07 Thread Jan Høydahl
Hi,

Glad you liked it. You'd like to model the inner architecture of SolrJ as well, 
do you? Perhaps that should be a separate diagram.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 6. apr. 2011, at 12.06, Stevo Slavić wrote:

> Nice, thank you!
> 
> Wish there was something similar or extra to this one depicting where
> do SolrJ's CommonsHttpSolrServer and EmbeddedSolrServer fit in.
> 
> Regards,
> Stevo.
> 
> On Wed, Apr 6, 2011 at 11:44 AM, Jan Høydahl  wrote:
>> Hi,
>> 
>> At Cominvent we've often had the need to visualize the internal architecture 
>> of Apache Solr in order to explain both the relationships of the components 
>> as well as the flow of data and queries. The result is a conceptual 
>> architecture diagram, clearly showing how Solr relates to the app-server, 
>> how cores relate to a Solr instance, how documents enter through an 
>> UpdateRequestHandler, through an UpdateChain and Analysis and into the 
>> Lucene index etc.
>> 
>> The drawing is created using Google draw, and the original is shared on 
>> Google Docs. We have licensed the diagram under the permissive Creative 
>> Commons "CC-by" license which lets you use, modify and re-distribute the 
>> diagram, even commercially, as long as you attribute us with a link.
>> 
>> Check it out at http://ow.ly/4sOTm
>> We'd love your comments
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> 



Re: Shared conf

2011-04-07 Thread Jan Høydahl
Hi,

This is how I have shared schema between several cores. Also you can use ${} 
syntax in your solrconfig.xml's to reference shared conf files.
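A minimal solr.xml along these lines (core names and paths are illustrative) points several cores at one shared schema:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- the schema attribute may point outside each core's own conf dir,
         so both cores resolve to the same shared schema.xml -->
    <core name="core0" instanceDir="core0" schema="../shared/conf/schema.xml"/>
    <core name="core1" instanceDir="core1" schema="../shared/conf/schema.xml"/>
  </cores>
</solr>
```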

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7. apr. 2011, at 02.13, Mark wrote:

> Is there a configuration value I can specify for multiple cores to use the 
> same conf directory?
> 
> Thanks



Re: Search Regression Testing

2011-04-07 Thread Mark Mandel
Thanks for the input guys.

I've decided to implement some unit tests for now, although we don't have a
clean data set to work from (sucks, I know).

We're going to keep track of a set of vital queries, and ensure they don't
return 0 results, as we have a pretty decent level of confidence with Solr's
text matching. So not ideal, but better than nothing ;o)

That should find anything that's gone horribly wrong, while at the same time
dealing with our data set changing, and us not having very brittle tests.

Much appreciated,

Mark

On Wed, Apr 6, 2011 at 6:54 PM, Paul Libbrecht  wrote:

> Mark,
>
> In one project, with Lucene not Solr, I also use a smallish unit test
> sample and apply some queries there.
> It is very limited but is automatable.
>
> I find a better way is to measure precision and recall on real user queries,
> release after release.
> I could never fully apply this yet on a recurring basis sadly.
>
> My ideal world would be that the search sample is small enough and that
> users are able to restrict search to this.
> Then users have the possibility of checking correctness of each result
> (say, first 10) for each query out of which one can then read results.
> Often, users provide comments along, e.g. missing matches. This is packed as
> a wiki page.
> First samples generally do not use enough of the features, this is adjusted
> as a dialogue.
>
> As a developer I review the test suite run and plan for next adjustments.
> The numeric approach allows easy mean precision and mean recall which is
> good for reporting.
>
> My best reference for P/R testing and other forms of testing is Kavi Mahesh's
> "Text Retrieval Quality: A Primer":
> http://www.oracle.com/technetwork/database/enterprise-edition/imt-quality-092464.html
>
> I would love to hear more of what the users have been doing.
>
> paul
>
>
> Le 6 avr. 2011 à 08:10, Mark Mandel a écrit :
>
> > Hey guys,
> >
> > I'm wondering how people are managing regression testing, in particular
> with
> > things like text based search.
> >
> > I.e. if you change how fields are indexed or change boosts in dismax,
> > ensuring that doesn't mean that critical queries are showing bad data.
> >
> > The obvious answer to me was using unit tests. These may be brittle as
> some
> > index data can change over time, but I couldn't think of a better way.
> >
> > How is everyone else solving this problem?
> >
> > Cheers,
> >
> > Mark
> >
> > --
> > E: mark.man...@gmail.com
> > T: http://www.twitter.com/neurotic
> > W: www.compoundtheory.com
> >
> > cf.Objective(ANZ) - Nov 17, 18 - Melbourne Australia
> > http://www.cfobjective.com.au
> >
> > Hands-on ColdFusion ORM Training
> > www.ColdFusionOrmTraining.com
>
>


-- 
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

cf.Objective(ANZ) - Nov 17, 18 - Melbourne Australia
http://www.cfobjective.com.au

Hands-on ColdFusion ORM Training
www.ColdFusionOrmTraining.com


How to index MS SQL Server column with image type

2011-04-07 Thread Roy Liu
Hi all,

When I index a column (image type) of a table via *
http://localhost:8080/solr/dataimport?command=full-import*
*there is an error like this: String length must be a multiple of four.*

Any help?
Thank you very much.

PS. the attachment includes Chinese character.


*1. data-config.xml*

 

 
   
*   *
   
 


*2. schema.xml*


*3. Database*
*attachment *is a column of table attachment. it's type is IMAGE.


Best Regards,
Roy Liu


Re: Shared conf

2011-04-07 Thread lboutros
You could use replication to replicate the configuration files:

http://wiki.apache.org/solr/SolrReplication

What do you want to do with your different cores ?

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shared-conf-tp2787771p2789447.html
Sent from the Solr - User mailing list archive at Nabble.com.
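As a sketch of what Ludovic suggests: on the master, the ReplicationHandler in solrconfig.xml can list configuration files to ship to the slaves alongside the index (the file names here are just examples; list whichever conf files your cores share):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- replicate after every commit -->
    <str name="replicateAfter">commit</str>
    <!-- configuration files pushed to the slaves along with the index -->
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
  </lst>
</requestHandler>
```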


Re: Trade Mark symbol(TM) in Index

2011-04-07 Thread Markus Jelsma
You opened the same thread this Monday and got two replies.

> Hi,
>   Has anyone indexed data containing the trade mark symbol? When I tried
> to index it, the data appears as below. I want to see the indexed data
> with the TM symbol.
> 
> Indexed Data:
>   79797 - Siebel Research– AI Fund,
>   79797 - Siebel Research– AI Fund,l
> 
> 
> Original Data:
> 79797 - Siebel Research™ AI Fund,
> 
> 
> Please help me to resolve this
> 
> Regards,
> Ravi
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Trade-Mark-symbol-TM-in-Index-tp2789398
> p2789398.html Sent from the Solr - User mailing list archive at Nabble.com.


Solr 3.1.0 WARNING in logs

2011-04-07 Thread Bernd Fehling

Dear all,

while looking into the Solr 3.1.0 log files I noticed warnings like

07.04.2011 09:08:50 org.apache.solr.request.SolrQueryResponse 
WARNING: org.apache.solr.request.SolrQueryResponse is deprecated.
Please use the corresponding class in org.apache.solr.response

I recommend cleaning up /admin/replication/header.jsp and /admin/ping.jsp
to get rid of these warnings in the next Solr 3.1.x release.

Should I file an issue for this?

Best regards,
Bernd


Trade Mark symbol(TM) in Index

2011-04-07 Thread mechravi25
Hi, 
  Has anyone indexed data containing the trade mark symbol? When I tried
to index it, the data appears as below. I want to see the indexed data
with the TM symbol.

Indexed Data: 
  79797 - Siebel Research– AI Fund,   
  79797 - Siebel Research– AI Fund,l  


Original Data: 
79797 - Siebel Research™ AI Fund, 


Please help me to resolve this.

Regards, 
Ravi  


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Trade-Mark-symbol-TM-in-Index-tp2789398p2789398.html
Sent from the Solr - User mailing list archive at Nabble.com.
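The garbled character in the indexed data suggests an encoding mismatch somewhere between the client and Solr: the multi-byte UTF-8 encoding of ™ being decoded with a single-byte charset (the exact substitute characters depend on which one). A small sketch of the effect, assuming ISO-8859-1 as the wrong charset; the fix is to write and read the document body explicitly as UTF-8 end to end:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates a client writing UTF-8 while the server reads a
    // single-byte charset (ISO-8859-1 here): multi-byte symbols break.
    public static String decodeWithWrongCharset(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Siebel Research\u2122 AI Fund"; // \u2122 is the TM symbol
        String garbled = decodeWithWrongCharset(original);
        // The single code point U+2122 is now three unrelated characters.
        System.out.println(original + " -> " + garbled);
    }
}
```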


RE: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-07 Thread Ephraim Ofir
You can't view it online, but you should be able to download it from:
https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI
2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP

Enjoy,
Ephraim Ofir


-Original Message-
From: Jens Mueller [mailto:supidupi...@googlemail.com] 
Sent: Thursday, April 07, 2011 8:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Very very large scale Solr Deployment = how to do (Expert
Question)?

Hello Ephraim, hello Lance, hello Walter,

thanks for your replies:

Ephraim, thanks very much for the further detailed explanation. I will try
to set up a demo system in the next few days and follow your advice.
Load balancers are an important aspect of your design. Can you recommend
one LB specifically? (I would be using haproxy.1wt.eu.) I think the idea
of uploading your document is very good. However, Google Docs did not seem
to work (at least for me, with the docx format?), but maybe you can simply
export the document as PDF; Google Docs handles that, so all the others
can also have a look at your concept. The best approach would be for you
to upload your advice directly to the Solr wiki, as it is really helpful.
I found some other documents in the meantime, but yours is much clearer
and more complete, with the load balancers and the aggregators (
http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf)

Lance, thanks I will have a look at what linkedin is doing.

Walter, thanks for the advice. Well, you are right to mention Google. My
question was also about understanding how large systems like
Google/Facebook actually work, so my numbers are just theoretical and
made up. My system will be smaller, but I would be very happy to
understand how such large systems are built, and I think the approach
Ephraim showed should work quite well at large scale. If you know good
documents (besides the Bigtable research paper, which I already know)
that technically describe how Google works in detail, that would be of
great interest. You seem to be working for a company that handles large
datasets. Does Google use this approach, sharding the index into N
writers, with the produced index then replicated to N "read-only
searchers"?
thank you all.
best regards
jens



2011/4/7 Walter Underwood 

> The bigger answer is that you cannot get to this size by just
configuring
> Solr. You may have to invent a lot of stuff. Like all of Google.
>
> Where did you get these numbers? The proposed query rate is twice as
big as
> Google (Feb 2010 estimate, 34K qps).
>
> I work at MarkLogic, and we scale to 100's of terabytes, with fast
update
> and query rates. If you want a real system that handles that, you
might want
> to look at our product.
>
> wunder
>
> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
>
> > I would not use replication. LinkedIn consumer search is a flat
system
> > where one process indexes new entries and does queries
simultaneously.
> > It's a custom Lucene app called Zoie. Their stuff is on GitHub.
> >
> > I would get documents to indexers via a multicast IP-based queueing
> > system. This scales very well and there's a lot of hardware support.
> >
> > The problem with distributed search is that it is a) inherently
slower
> > and b) has inherently more and longer jitter. The "airplane wing"
> > distribution of query times becomes longer and flatter.
> >
> > This is going to have to be a "federated" system, where the
front-end
> > app aggregates results rather than Solr.
> >
> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller

> wrote:
> >> Hello Experts,
> >>
> >>
> >>
> >> I am a Solr newbie but read quite a lot of docs. I still do not
> understand
> >> what would be the best way to setup very large scale deployments:
> >>
> >>
> >>
> >> Goal (threoretical):
> >>
> >>  A.) Index-Size: 1 Petabyte (1 Document is about 5 KB in Size)
> >>
> >>  B) Queries: 10 Queries/ per Second
> >>
> >>  C) Updates: 10 Updates / per Second
> >>
> >>
> >>
> >>
> >> Solr offers:
> >>
> >> 1.) Replication => scales well for B) BUT A) and C) are not satisfied
> >>
> >> 2.) Sharding => scales well for A) BUT B) and C) are not satisfied
> >> (=> As I understand the sharding approach, everything goes through a
> >> central server that dispatches the updates and assembles the queries
> >> retrieved from the different shards. But this central server also has
> >> some capacity limits...)
> >>
> >>
> >>
> >>
> >> What is the right approach to handle such large deployments? I
would be
> >> thankfull for just a rough sketch of the concepts so I can
> experiment/search
> >> further...
> >>
> >>
> >> Maybe I am missing something very trivial as I think some of the
"Solr
> >> Users/Use Cases" on the homepage are that kind of large
deployments. How
> are
> >> they implemented?
> >>
> >>
> >>
> >> Thank you very much!!!
> >>
> >> Jens
> >>
> >
>
>
>
>
>
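On the sharding point raised in this thread: in Solr's distributed search, any node can act as the aggregator by listing the shards in the request, so the "central server" role is not tied to one machine and can sit behind a load balancer. A hypothetical sketch of building such a request URL (host names and the aggregator address are placeholders):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

public class ShardedQuery {

    // Any core can aggregate a distributed query by naming the shards
    // (host:port/path, without a scheme) in the "shards" parameter.
    public static String buildUrl(String aggregator, List<String> shards, String query) {
        StringBuilder shardList = new StringBuilder();
        for (String shard : shards) {
            if (shardList.length() > 0) shardList.append(',');
            shardList.append(shard);
        }
        try {
            return aggregator + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&shards=" + shardList;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("solr1:8983/solr", "solr2:8983/solr");
        System.out.println(buildUrl("http://lb.example.com/solr", shards, "title:lucene"));
    }
}
```

Putting several such aggregators behind the load balancer spreads out the result-merging work that a single central dispatcher would otherwise absorb.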