RE: solr 4.0 - pagination

2010-12-20 Thread Grijesh.singh

Then what happens when we filter out only some results and want to group? How
will your index-time group count help?

-
Grijesh


Recap on derived objects in Solr Index, 'schema in a can'

2010-12-20 Thread Dennis Gearon
Based on more searches and manual consolidation, I've put together a summary
below of some of the ideas already suggested for this. The last item in the
summary seems to be an interesting, low-technical-cost way of doing it.

Basically, it treats the index like a 'BigTable', a la "No SQL".

Erick Erickson pointed out: 
"...but there's absolutely no requirement 
that all documents in SOLR have the same fields..."

I guess I don't have the right understanding of what goes into a Document
in Solr. Is it just a set of fields, each with its own independent field type
declaration/id, its name, and its content?

So even though there's a schema for an index, one could ignore it and
just throw any other named fields and types and content at document addition
time?

So if I wanted to search on a base set of fields that all documents have, I
could then additionally filter based on (might be the wrong use of this term)
dynamic fields?
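
A minimal SolrJ sketch of that idea, assuming the *_s / *_i dynamicField
patterns that ship in the example schema.xml (the field names here are
illustrative, not from the thread):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class DynamicFieldAdd {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "division-42");
            doc.addField("type_s", "division");   // base field all docs share
            doc.addField("region_s", "emea");     // extra field for this type only
            doc.addField("headcount_i", 1200);    // another per-type extra
            solr.add(doc);    // accepted without any schema edit: the names
            solr.commit();    // match the *_s / *_i dynamicField patterns
        }
    }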






Original thread that I started:

http://lucene.472066.n3.nabble.com/A-schema-inside-a-Solr-Schema-Schema-in-a-can-tt2103260.html

-

Repeat of the problem (not actual ratios or numbers, i.e. it could be WORSE!):
-


1/ Base object of some kind, x number of fields
2/ Derived objects representing divisions in the company, different customer
   bases, etc., each having 2 additional, unique fields.
3/ Assume 1000 such derived object types
4/ A 'flattened' index would have the x base object fields and 2000
   additional fields

 

Solutions Posited
---

A/ First thought: multi-value columns as key pairs.
   1/ Difficult to access individual items of more than one 'word' length
      for querying in multivalued fields.
   2/ All sorts of statistical stuff probably wouldn't apply?
   3/ (James Dyer said:) There's also one "gotcha" we've experienced when
      searching across multi-valued fields: SOLR will match across field
      occurrences. In the example below, if you were to search
      q=contrib_name:(james AND smith), you will get this record back. It
      matches one name from one contributor and another name from a
      different contributor. This is not what our users want.

      As a work-around, I am converting these to phrase queries with slop:
      "james smith"~50 ... Just use a slop # smaller than your
      positionIncrementGap and bigger than the # of terms entered. This will
      prevent the cross-field matches yet allow the words to occur in any
      order.

   The problem with this approach is that Lucene doesn't support wildcards
   in phrases.
B/ Dynamic fields were suggested, but I am not sure exactly how they work,
   and the person who suggested them was not sure it would work, either.
C/ Different field naming conventions were suggested where field types were
   similar. I can't predict that.
D/ Found this old thread, and it had other suggestions:
   1/ Use multiple cores, one for each record type/schema, and aggregate
      them during the query.
   2/ Use a fixed number of additional fields X 2. Each additional field is
      actually a pair of fields: the first of the pair gives the column
      name, the second gives the data.
      a) Although I like this, I wonder how many extra fields to use,
      b) it was pointed out that relevancy and other statistical criteria
         for queries might suffer.
   3/ Index the different objects exactly as they are, i.e. as Erick
      Erickson said: "I'm not entirely sure this is germane, but there's
      absolutely no requirement that all documents in SOLR have the same
      fields. So it's possible for you to index the "wildly different
      content" in "wildly different fields". Then searching for screen:LCD
      would be straightforward."...
Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



Re: Lower level filtering

2010-12-20 Thread Stephen Green
On Wed, Dec 15, 2010 at 9:57 AM, Stephen Green  wrote:
> Otis pointed out that the patch can't be applied against the current
> source, so I need to go back and make it work with the current source
> (new job = no time).  I'll see if I can find the time this weekend to
> do this.

OK, I just submitted a patch that works against trunk.

Steve
-- 
Stephen Green
http://thesearchguy.wordpress.com


Re: [Nutch] and Solr integration

2010-12-20 Thread Adam Estrada
bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex
http://localhost:8983/solr

I've run that command before and it worked...that's why I asked.

Grab Nutch from trunk and run bin/nutch, and you'll see that it is in fact
an option. It looks like Hadoop is the culprit now, and I am at a loss on
how to fix it.

Thanks for the feedback.
Adam

On Mon, Dec 20, 2010 at 4:21 PM, Anurag  wrote:

>
> Why are you using solrindex in the argument? It is used when we need to
> index the crawled data in Solr.
> For more, read http://wiki.apache.org/nutch/NutchTutorial .
>
> Also, for Nutch-Solr integration this is a very useful blog:
> http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
> I integrated Nutch and Solr and it works well.
>
> Thanks
>
> On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene] <
> ml-node+2122347-622655030-146...@n3.nabble.com> wrote:
>
> > All,
> >
> > I have a couple of websites that I need to crawl, and the following
> > command line used to work, I think. Solr is up and running and everything
> > is fine there, and I can go through and index the site, but I really need
> > the results added to Solr after the crawl. Does anyone have any idea on
> > how to make that happen or what I'm doing wrong? These errors are being
> > thrown from Hadoop, which I am not using at all.
> >
> > $ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50
> > -solrindex http://localhost:8983/solr
> > crawl started in: crawl
> > rootUrlDir = http://localhost:8983/solr
> > threads = 10
> > depth = 100
> > indexer=lucene
> > topN = 50
> > Injector: starting at 2010-12-20 15:23:25
> > Injector: crawlDb: crawl/crawldb
> > Injector: urlDir: http://localhost:8983/solr
> > Injector: Converting injected urls to crawl db entries.
> > Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
> >     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
> >     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
> >     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
> >     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >     at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
> >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
> >
> >
>
>
> --
> Kumar Anurag
>
>
>
>


Re: shard versus core

2010-12-20 Thread Lance Norskog
2x the index size is required for optimizing.

Things that increase with index size: indexing time, query time and
disk index size. My 500GB index at a previous job worked. Indexing was
a little slow, queries were much slower. What finally made us split it
up was that one binary blob of 500GB was too much to manage: backing up,
optimizing, etc. It was the IT side that made it impossible. Lucene & Solr
worked fine.

On Mon, Dec 20, 2010 at 4:53 AM, Tri Nguyen  wrote:
> Thought about it some more and after some reading.  I suppose the answer 
> depends on what kind of response time is expected to be good enough.
>
> I can do some stress testing and see if disk i/o is the bottleneck as the 
> index grows.  I can also look into optimizing/configuring solr parameters to 
> help performance.  One thing I've read is my disk should be at least 2 times 
> the index.
>
>
>
>
> --- On Mon, 12/20/10, Tri Nguyen  wrote:
>
>
> From: Tri Nguyen 
> Subject: Re: shard versus core
> To: solr-user@lucene.apache.org
> Date: Monday, December 20, 2010, 4:04 AM
>
>
> Hi Erick,
>
> Thanks for the explanation.
>
> At which point does the index get too big where sharding is appropriate where 
> it affects performance?
>
> Tri
>
> --- On Sun, 12/19/10, Erick Erickson  wrote:
>
>
> From: Erick Erickson 
> Subject: Re: shard versus core
> To: solr-user@lucene.apache.org
> Date: Sunday, December 19, 2010, 7:36 AM
>
>
> Well, they can be different beasts. First of all, different cores can have
> different schemas, which is not true of shards. Also, shards are almost
> assumed to be running on different machines as a scaling technique,
> whereas multiple cores are run on a single Solr instance.
>
> So using multiple cores is very similar to running multiple "virtual" Solr
> servers on a single machine, each independent of the other. This can make
> sense if, for instance, you wanted to have a bunch of small indexes all
> on one machine. You could use multiple cores rather than multiple
> instances of Solr. These indexes may or may not have anything to do with
> each other.
>
> Sharding, on the other hand, is almost always used to split a single logical
> index up amongst multiple machines in order to improve performance. The
> assumption usually is that the index is too big to give satisfactory
> performance
> on a single machine, so you'll split it into parts. That assumption really
> implies that it makes no sense to put multiple shards on the #same# machine.
>
> So really, the answer to your question is that you choose the right
> technique
> for the problem you're trying to solve. They aren't really different
> solutions to
> the same problem...
>
> Hope this helps.
> Erick
>
> On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen  wrote:
>
>> Hi,
>>
>> Was wondering about the pros and cons of using sharding versus cores.
>>
>> An index can be split up into multiple cores or multiple shards.
>>
>> So why one over the other?
>>
>> Thanks,
>>
>>
>> tri
>



-- 
Lance Norskog
goks...@gmail.com


Re: master master, repeaters

2010-12-20 Thread Lance Norskog
Ah, thanks for pointing that out.

Each indexer needs its own marker for "where is new data in this
stream". That way, when either the primary or secondary starts, it can
restart indexing from where it left off. The most reliable way to do
this is to search the indexer's Solr for its last update.

Another problem is that if the backup has a completely different index
than the primary, when the query servers switch to the backup they
have to download a whole new index.
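
On the question, quoted below, of repointing a slave without editing its
config: the SolrReplication wiki describes passing a masterUrl parameter to
the replication handler's fetchindex command. A rough sketch (host names are
placeholders; this triggers a one-time fetch rather than permanently
changing the slave's configured master):

    import java.net.URL;

    public class RepointSlave {
        public static void main(String[] args) throws Exception {
            // Ask the slave to pull the index from the backup master once.
            String cmd = "http://slave1:8983/solr/replication"
                    + "?command=fetchindex"
                    + "&masterUrl=http://backup-master:8983/solr/replication";
            new URL(cmd).openStream().close();
        }
    }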

On Mon, Dec 20, 2010 at 12:45 AM, Upayavira  wrote:
> I've successfully made extensive use of load balancers in sharded,
> replicated slave setups - see [1].
>
> My question is how that might work with a master. You can have a load
> balancer, but you'd need to configure it into a 'fail over but please
> don't fail back' configuration. I'm not sure if that is possible on the
> load balancers we have used. Otherwise, if your master had a five minute
> blip, you could have some content going to your backup, then traffic
> returning to your master, leading to master/backup out of sync and
> content missing from your master index.
>
> It seems to me, unless I am missing something, that while a load
> balancer can be useful, it is only as a part of a larger scheme when it
> comes to master replication. Or am I missing something?
>
> Upayavira
>
> [1] http://www.slideshare.net/sourcesense/sharded-solr-setup-with-master
>
> On Sun, 19 Dec 2010 22:41 -0800, "Lance Norskog" 
> wrote:
>> If you have a load balancer available, that is a much cleaner solution
>> than anything else. After the main indexer comes back, you have to get
>> the current index state to it to start again. But otherwise
>>
>> On Sun, Dec 19, 2010 at 10:39 AM, Upayavira  wrote:
>> >
>> >
>> > On Sun, 19 Dec 2010 10:20 -0800, "Tri Nguyen" 
>> > wrote:
>> >> How do we tell the slaves to point to the new master without modifying
>> >> the config files?  Can we do this while the slave is up, issuing a
>> >> command to it?
>> >
>> > I believe this can be done (details are in
>> > http://wiki.apache.org/solr/SolrReplication), but I've not actually done
>> > it.
>> >
>> > Upayavira
>> >
>> >> --- On Sun, 12/19/10, Upayavira  wrote:
>> >>
>> >>
>> >> From: Upayavira 
>> >> Subject: Re: master master, repeaters
>> >> To: solr-user@lucene.apache.org
>> >> Date: Sunday, December 19, 2010, 10:13 AM
>> >>
>> >>
>> >> We had a (short) thread on this late last week.
>> >>
>> >> Solr doesn't support automatic failover of the master, at least in
>> >> 1.4.1. I've been discussing with my colleague (Tommaso) about ways to
>> >> achieve this.
>> >>
>> >> There's ways we could 'fake it', scripting the following:
>> >>
>> >> * set up a 'backup' master, as a replica of the actual master
>> >> * monitor the master for 'up-ness'
>> >> * if it fails:
>> >>    * tell the master to start indexing to the backup instead
>> >>    * tell the slave(s) to connect to a different master (the backup)
>> >> * then, when the master is back:
>> >>    * wipe its index (backing up dir first?)
>> >>    * configure it to be a backup of the new master
>> >>    * make it pull a fresh index over
>> >>
>> >> But, Jan Høydahl suggested using SolrCloud. I'm going to follow up on
>> >> how that might work in that thread.
>> >>
>> >> Upayavira
>> >>
>> >>
>> >> On Sun, 19 Dec 2010 00:20 -0800, "Tri Nguyen" 
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > In the master-slave configuration, I'm trying to figure out how to
>> >> > configure the
>> >> > system setup for master failover.
>> >> >
>> >> > Does solr support master-master setup?  From my readings, solr does not.
>> >> >
>> >> > I've read about repeaters as well where the slave can act as a master.
>> >> > When the
>> >> > main master goes down, do the other slaves switch to the repeater?
>> >> >
>> >> > Barring better solutions, I'm thinking about putting 2 masters behind  a
>> >> > load
>> >> > balancer.
>> >> >
>> >> > If this is not implemented already, perhaps solr can be updated to
>> >> > support a
>> >> > list of masters for fault tolerance.
>> >> >
>> >> > Tri
>> >>
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: A schema inside a Solr Schema (Schema in a can)

2010-12-20 Thread Dennis Gearon
Here is a thread on this subject that I did not find earlier. Sometimes
discussion, thought, and 'mulling' in the subconscious gets me better Google
searches.

http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-td811883.html

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Dennis Gearon 
To: solr-user@lucene.apache.org
Sent: Mon, December 20, 2010 10:19:53 AM
Subject: Re: A schema inside a Solr Schema (Schema in a can)

Thanks James.

So being accurate with fields within fields (multivalues) is probably not
possible using any of the currently available analyzers.




- Original Message 
From: "Dyer, James" 
To: "solr-user@lucene.apache.org" 
Sent: Mon, December 20, 2010 7:16:43 AM
Subject: RE: A schema inside a Solr Schema (Schema in a can)

Dennis,

If you need to search a key/value pair, you'll have to put them both in the 
same 

field, somehow.  One way is to re-index them using the key in the fieldname.  
For instance, suppose you have:

contributor:  dyer, james
contributor:  smith, sam
role:  author
role:  editor

...but you want to search only for authors, you could index these again with 
fieldnames like:

contrib_author:  dyer, james
contrib_editor:  smith, sam

Then you would query "q=contributor:smith" to search all contributors and 
q=contrib_editor:smith just to get editors.

Another way to do it is to use some type of marker character sequence to define 
the "key" and index it like this:

contributor:  dyer, james __author
contributor:  smith, sam  __editor

then you can query like this: q=contributor:"smith __editor"~50 ... to search 
only for editors named Smith.

We are not yet fully developed here on SOLR but we currently use both of these 
approaches using a different search engine.  One nice thing SOLR could add to 
this second approach that is not an option with our other system is the 
possibility of writing a custom analyzer that could maybe take some of the 
complexity out of the app.  Not sure exactly how it'd work though...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Friday, December 17, 2010 6:52 PM
To: solr-user@lucene.apache.org
Subject: RE: A schema inside a Solr Schema (Schema in a can)

So this is a current usable plugin (except for the latest bug)?

And, is it possible to search within just one key:value pair in a multivalued 
field? 


Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 

idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' 


EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 12/17/10, Ahmet Arslan  wrote:

> From: Ahmet Arslan 
> Subject: RE: A schema inside a Solr Schema (Schema in a can)
> To: solr-user@lucene.apache.org
> Date: Friday, December 17, 2010, 12:47 PM
> > The problem with this approach
> is that Lucene doesn't
> > support wildcards in phrases.  
> 
> With https://issues.apache.org/jira/browse/SOLR-1604 you can
> do that.
> 
> 
> 
>


RE: solr 4.0 - pagination

2010-12-20 Thread phpcip

Well, right now, I'm using SOLR in a LOT of my projects.
I'm VERY fond of it, proud of it and VERY happy that such a team exists to
make it work.

Of course the pagination issue is a bit frustrating on the field
collapsing... But... heck... I'm currently de-normalizing my postgresql
database and... I'm just counting the total unique rows using SQL :D

And let SOLR do the rest of the job

So... As a generic idea for the SOLR Java expert-people, I think the
counting of groups for each field should be done at indexing time rather
than at query-time.

So basically when you index each document, you compute the grouping thingie
IF the user chooses to make his field eligible for grouping inside his
schema.xml... if you guys gather my meaning...

So I would have a field like this:

<field name="city" type="string" indexed="true" stored="true" groupCount="true"/>
Or something like that... and the indexer will know to compute the number of
groups that exist for this field, so that number would be available at
query-time without too much stress on the memory or CPU.
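
A rough client-side sketch of that idea (purely illustrative; no such Solr
feature or schema attribute exists today): keep a running set of distinct
values per groupable field while feeding documents, so the group count is
already known by query time.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class IndexTimeGroupCounter {
        private final Map<String, Set<String>> distinct =
                new HashMap<String, Set<String>>();

        // Call once per document for each field marked as groupable.
        public void observe(String field, String value) {
            Set<String> values = distinct.get(field);
            if (values == null) {
                values = new HashSet<String>();
                distinct.put(field, values);
            }
            values.add(value);
        }

        // Total number of groups for a field, with no query-time pass.
        public int groupCount(String field) {
            Set<String> values = distinct.get(field);
            return values == null ? 0 : values.size();
        }
    }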

Hope this helps

In the meanwhile... just count the total from SQL... And keep your index
in-sync often :D

Cip.



Mime type for JSON

2010-12-20 Thread Emmanuel Bégué
Hello,

When using a writer type of "json", SOLR (1.4.1) sets the content type
header of the response as "text/plain" although it should be
"application/json". This is not a very big problem, but it writes many
warnings in Chrome logs: "Resource interpreted as script but
transferred with MIME type text/plain."

This is documented here:
https://issues.apache.org/jira/browse/SOLR-1123

but it's unclear whether it will be addressed (or, indeed, should be addressed).

Is there another way to force the mime-type? I use Tomcat -> jk
connector -> Apache, and I tried to force the mime-type as a
mod_rewrite directive ([T=application/json]). The Apache rewrite log
says the mime type is changed:

   force filename redirect:/solr-app/search to have MIME-type 'application/json'

but it is, in fact, not changed (probably because the mime type sent by
Tomcat/SOLR goes through...?)

Any ideas?

Thanks,
Regards,
EB
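
One possible workaround, sketched under the assumption that you can register
a servlet filter in front of Solr's dispatch filter in web.xml (untested
against 1.4, and it should be mapped only to the JSON endpoints, since as
written it relabels every text/plain response):

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpServletResponseWrapper;

    public class JsonContentTypeFilter implements Filter {
        public void doFilter(ServletRequest req, ServletResponse res,
                FilterChain chain) throws IOException, ServletException {
            HttpServletResponse http = (HttpServletResponse) res;
            chain.doFilter(req, new HttpServletResponseWrapper(http) {
                public void setContentType(String type) {
                    // Solr 1.4 labels wt=json responses text/plain; relabel.
                    if (type != null && type.startsWith("text/plain")) {
                        super.setContentType("application/json");
                    } else {
                        super.setContentType(type);
                    }
                }
            });
        }
        public void init(FilterConfig config) {}
        public void destroy() {}
    }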


Re: [Nutch] and Solr integration

2010-12-20 Thread Anurag

Why are you using solrindex in the argument? It is used when we need to
index the crawled data in Solr.
For more, read http://wiki.apache.org/nutch/NutchTutorial .

Also, for Nutch-Solr integration this is a very useful blog:
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
I integrated Nutch and Solr and it works well.

Thanks

On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene] <
ml-node+2122347-622655030-146...@n3.nabble.com> wrote:

> All,
>
> I have a couple of websites that I need to crawl, and the following
> command line used to work, I think. Solr is up and running and everything
> is fine there, and I can go through and index the site, but I really need
> the results added to Solr after the crawl. Does anyone have any idea on
> how to make that happen or what I'm doing wrong? These errors are being
> thrown from Hadoop, which I am not using at all.
>
> $ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50
> -solrindex http://localhost:8983/solr
> crawl started in: crawl
> rootUrlDir = http://localhost:8983/solr
> threads = 10
> depth = 100
> indexer=lucene
> topN = 50
> Injector: starting at 2010-12-20 15:23:25
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: http://localhost:8983/solr
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>     at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
>
>
>
>



-- 
Kumar Anurag





[Nutch] and Solr integration

2010-12-20 Thread Adam Estrada
All,

I have a couple of websites that I need to crawl, and the following command
line used to work, I think. Solr is up and running and everything is fine
there, and I can go through and index the site, but I really need the results
added to Solr after the crawl. Does anyone have any idea on how to make that
happen or what I'm doing wrong? These errors are being thrown from Hadoop,
which I am not using at all.

$ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex
http://localhost:8983/solr
crawl started in: crawl
rootUrlDir = http://localhost:8983/solr
threads = 10
depth = 100
indexer=lucene
topN = 50
Injector: starting at 2010-12-20 15:23:25
Injector: crawlDb: crawl/crawldb
Injector: urlDir: http://localhost:8983/solr
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)


Re: A schema inside a Solr Schema (Schema in a can)

2010-12-20 Thread Dennis Gearon
Thanks James.

So being accurate with fields within fields (multivalues) is probably not
possible using any of the currently available analyzers.

 


- Original Message 
From: "Dyer, James" 
To: "solr-user@lucene.apache.org" 
Sent: Mon, December 20, 2010 7:16:43 AM
Subject: RE: A schema inside a Solr Schema (Schema in a can)

Dennis,

If you need to search a key/value pair, you'll have to put them both in the 
same 
field, somehow.  One way is to re-index them using the key in the fieldname.  
For instance, suppose you have:

contributor:  dyer, james
contributor:  smith, sam
role:  author
role:  editor

...but you want to search only for authors, you could index these again with 
fieldnames like:

contrib_author:  dyer, james
contrib_editor:  smith, sam

Then you would query "q=contributor:smith" to search all contributors and 
q=contrib_editor:smith just to get editors.

Another way to do it is to use some type of marker character sequence to define 
the "key" and index it like this:

contributor:  dyer, james __author
contributor:  smith, sam  __editor

then you can query like this: q=contributor:"smith __editor"~50 ... to search 
only for editors named Smith.

We are not yet fully developed here on SOLR but we currently use both of these 
approaches using a different search engine.  One nice thing SOLR could add to 
this second approach that is not an option with our other system is the 
possibility of writing a custom analyzer that could maybe take some of the 
complexity out of the app.  Not sure exactly how it'd work though...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Friday, December 17, 2010 6:52 PM
To: solr-user@lucene.apache.org
Subject: RE: A schema inside a Solr Schema (Schema in a can)

So this is a current usable plugin (except for the latest bug)?

And, is it possible to search within just one key:value pair in a multivalued 
field? 


Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' 


EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 12/17/10, Ahmet Arslan  wrote:

> From: Ahmet Arslan 
> Subject: RE: A schema inside a Solr Schema (Schema in a can)
> To: solr-user@lucene.apache.org
> Date: Friday, December 17, 2010, 12:47 PM
> > The problem with this approach
> is that Lucene doesn't
> > support wildcards in phrases.  
> 
> With https://issues.apache.org/jira/browse/SOLR-1604 you can
> do that.
> 
> 
> 
> 



Re: Reg blank values (<str/>) tags in SOLR XML

2010-12-20 Thread Markus Jelsma
No. But why is it a problem? A standard XML parser won't feel the difference.
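
For instance, the two forms come back identical from the stock JAXP DOM
parser (a quick sketch):

    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;

    public class EmptyElementCheck {
        public static void main(String[] args) throws Exception {
            DocumentBuilder b =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document selfClosed = b.parse(new InputSource(
                new StringReader("<str name=\"f\"/>")));
            Document openClose = b.parse(new InputSource(
                new StringReader("<str name=\"f\"></str>")));
            // Both parse to an element whose text content is empty.
            System.out.println(
                selfClosed.getDocumentElement().getTextContent().length()); // 0
            System.out.println(
                openClose.getDocumentElement().getTextContent().length());  // 0
        }
    }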

> Hi,
> 
> In SOLR XML the blank spaces are displayed with just <str/> tags
>
> Is there a way I can make SOLR XML display the blank values as
>
> <str></str>
>
> instead of just
>
> <str/>
>
> Also has anyone parsed the blank value tags using SOLRNET before?
> 
> If anyone can help me with my question or provide pointers it would be of
> great help!!!
> 
> Thanks,
> Barani


Re: Syncing 'delta-import' with 'select' query

2010-12-20 Thread Juan Manuel Alvarez
Oops! That seems to be the problem, since I am using 1.4.

Thanks!
Juan M.
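
For 1.4, one workaround is to poll DIH's status command until the import
goes idle before sending the commit. A crude sketch using the URLs from this
thread (the substring check on the status XML is an illustration, not a
robust parse):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class WaitForDeltaImport {
        static String get(String url) throws Exception {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream(), "UTF-8"));
            StringBuilder sb = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) sb.append(line);
            in.close();
            return sb.toString();
        }

        public static void main(String[] args) throws Exception {
            String solr = "http://localhost:8983/solr";
            get(solr + "/dataimport?command=delta-import&commit=false");
            // DIH reports status "busy" while an import is running.
            while (get(solr + "/dataimport?command=status").contains("busy")) {
                Thread.sleep(500);
            }
            get(solr + "/update?commit=true&waitFlush=true&waitSearcher=true");
        }
    }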

On Tue, Dec 14, 2010 at 8:40 PM, Alexey Serba  wrote:
> What Solr version do you use?
>
> It seems that sync flag has been added to 3.1 and 4.0 (trunk) branches
> and not to 1.4
> https://issues.apache.org/jira/browse/SOLR-1721
>
> On Wed, Dec 8, 2010 at 11:21 PM, Juan Manuel Alvarez  
> wrote:
>> Hello everyone!
>> I have been doing some tests, but it seems I can't make the
>> synchronize flag work.
>>
>> I have made two tests:
>> 1) DIH with commit=false
>> 2) DIH with commit=false + commit via Solr XML update protocol
>>
>> And here are the log results:
>> For (1) the command is
>> "/solr/dataimport?command=delta-import&commit=false&synchronous=true"
>> and the first part of the output is:
>>
>> Dec 8, 2010 4:42:51 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 
>> QTime=0
>> Dec 8, 2010 4:42:51 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/dataimport
>> params={schema=testproject&dbHost=127.0.0.1&dbPassword=fuz10n!&dbName=fzm&commit=false&dbUser=fzm&command=delta-import&projectId=1&synchronous=true&dbPort=5432}
>> status=0 QTime=4
>> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DataImporter
>> doDeltaImport
>> INFO: Starting Delta Import
>> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.SolrWriter
>> readIndexerProperties
>> INFO: Read dataimport.properties
>> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DocBuilder doDelta
>> INFO: Starting delta collection.
>> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DocBuilder
>> collectDelta
>>
>>
>> For (2) the commands are
>> "/solr/dataimport?command=delta-import&commit=false&synchronous=true"
>> and "/solr/update?commit=true&waitFlush=true&waitSearcher=true" and
>> the first part of the output is:
>>
>> Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 
>> QTime=0
>> Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/dataimport
>> params={schema=testproject&dbHost=127.0.0.1&dbPassword=fuz10n!&dbName=fzm&commit=false&dbUser=fzm&command=delta-import&projectId=1&synchronous=true&dbPort=5432}
>> status=0 QTime=1
>> Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 
>> QTime=0
>> Dec 8, 2010 4:22:50 PM org.apache.solr.handler.dataimport.DataImporter
>> doDeltaImport
>> INFO: Starting Delta Import
>> Dec 8, 2010 4:22:50 PM org.apache.solr.handler.dataimport.SolrWriter
>> readIndexerProperties
>> INFO: Read dataimport.properties
>> Dec 8, 2010 4:22:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
>>
>> In (2) it seems like the commit is being fired before the delta-update 
>> finishes.
>>
>> Am I using the "synchronous" flag right?
>>
>> Thanks in advance!
>> Juan M.
>>
>> On Mon, Dec 6, 2010 at 6:46 PM, Juan Manuel Alvarez  
>> wrote:
>>> Thanks for all the help! It is really appreciated.
>>>
>>> For now, I can afford the parallel requests problem, but when I put
>>> synchronous=true in the delta import, the call still returns with
>>> outdated items.
>>> Examining the log, it seems that the commit operation is being
>>> executed after the operation returns, even when I am using
>>> commit=true.
>>> Is it possible to also execute the commit synchronously?
>>>
>>> Cheers!
>>> Juan M.
>>>
>>> On Mon, Dec 6, 2010 at 4:29 PM, Alexey Serba  wrote:
> When you say "two parallel requests from two users to single DIH
> request handler", what do you mean by "request handler"?
 I mean DIH.

> Are you
> refering to the HTTP request? Would that mean that if I make the
> request from different HTTP sessions it would work?
 No.

 It means that when you have two users that simultaneously changed two
 objects in the UI then you have two HTTP requests to DIH to pull
 changes from the db into Solr index. If the second request comes when
 the first is not fully processed then the second request will be
 rejected. As a result your index would be outdated (w/o the latest
 update) until the next update.

>>>
>>
>


Re: about groups of random results + alphabetical result

2010-12-20 Thread Paula C. Laun : Dataprisma
There's another problem; I'm not sure I was clear: I need these records 
randomized, each level randomized on its own (one level cannot be randomized 
together with another level).

Is it possible in the same request?

Best regards,

Paula C. Laun : Dataprisma
pa...@dataprisma.com.br
(47) 3035.1868
www.dataprisma.com.br
- Original Message - 
From: "Walter Underwood" 
To: 
Sent: Monday, December 20, 2010 2:31 PM
Subject: Re: about groups of random results + alphabetical result


The problem happens with any common word, not just short words. What happens 
with "Brasil"?

If this was a good way to do search, Solr would already implement it. It is 
not that hard to build. But it is not a good way to do search. I have been 
working on search for almost 15 years, and I hear this idea every year or 
two. Don't do it. Use the QueryElevationComponent for step 1, boosts in 
DisMax for steps 2-4, and don't do step 5. People will never scroll down 
that far; besides, phonetic search will match a lot of the documents.

wunder

On Dec 20, 2010, at 8:09 AM, Paula C. Laun : Dataprisma wrote:

> thank you for your help... this search will be published in Portuguese, 
> and in this language we can strip words shorter than 3 characters from 
> the sentence.
>
> Paula C. Laun : Dataprisma
> pa...@dataprisma.com.br
> (47) 3035.1868
> www.dataprisma.com.br
> - Original Message - 
> From: "Walter Underwood" 
> To: 
> Sent: Monday, December 20, 2010 2:02 PM
> Subject: Re: about groups of random results + alphabetical result
>
>
> You probably do not want this ranking, because any query with a common 
> word,
> like "the", will match most of the corpus in step two.
>
> Instead, use Solr to weight better quality matches more heavily, maybe 4X
> for exact matches, 2X for stemmed matches, and 1X for phonetic matches.
>
> wunder
>
> On Dec 20, 2010, at 4:01 AM, Paula C. Laun : Dataprisma wrote:
>
>> Hi. I'm looking for a technology that could give high performance when
>> searching a large amount of data (nearly 10 million rows in a
>> conventional database like SQL Server), and I think PHP running with
>> Apache Solr is a good choice. I have only one doubt about its
>> possibilities.
>>
>> I need to show in first place: promoted records that have all the terms
>> searched by the user (ordered randomly).
>> In second place: I need to show promoted records that have any term
>> searched by the user (ordered randomly).
>> In third place: I need the promoted records found by the stemming search
>> (ordered randomly).
>> In fourth place: I need the promoted records found by the phonetic search
>> (randomly).
>> In fifth place: the free records, ordered alphabetically.
>>
>> These results need to be paginated.
>>
>> Is it possible to do all that in the same task?
>>
>> Thanks,
>>
>> Paula
>
>
>
>
>

--
Walter Underwood
Venture ASM, Troop 14, Palo Alto





Reg blank values (<str/>) tags in SOLR XML

2010-12-20 Thread bbarani

Hi,

In SOLR XML the blank spaces are displayed with just <str/> tags

Is there a way I can make SOLR XML display the blank values as

<str></str>

instead of just

<str/>

Also has anyone parsed the blank value tags using SOLRNET before?

If anyone can help me with my question or provide pointers it would be of
great help!!!

Thanks,
Barani


Re: about groups of random results + alphabetical result

2010-12-20 Thread Paula C. Laun : Dataprisma
"brasil" will return companies with this word in any part of its name. this 
search (randomic in 4 different levels) is only for promoted records (1 
records to be searched at all). free records (10 milion) are the fifth level 
and will respect the common search mode.

Best regards,

Paula C. Laun : Dataprisma
pa...@dataprisma.com.br
(47) 3035.1868
www.dataprisma.com.br
- Original Message - 
From: "Walter Underwood" 
To: 
Sent: Monday, December 20, 2010 2:31 PM
Subject: Re: about groups of random results + alphabetical result


The problem happens with any common word, not just short words. What happens 
with "Brasil"?

If this was a good way to do search, Solr would already implement it. It is 
not that hard to build. But it is not a good way to do search. I have been 
working on search for almost 15 years, and I hear this idea every year or 
two. Don't do it. Use the QueryElevationComponent for step 1, boosts in 
DisMax for steps 2-4, and don't do step 5. People will never scroll down 
that far; besides, phonetic search will match a lot of the documents.

wunder

On Dec 20, 2010, at 8:09 AM, Paula C. Laun : Dataprisma wrote:

> thank you for your help... this search will be published in Portuguese, 
> and in this language we can strip words shorter than 3 characters from 
> the sentence.
>
> Paula C. Laun : Dataprisma
> pa...@dataprisma.com.br
> (47) 3035.1868
> www.dataprisma.com.br
> - Original Message - 
> From: "Walter Underwood" 
> To: 
> Sent: Monday, December 20, 2010 2:02 PM
> Subject: Re: about groups of random results + alphabetical result
>
>
> You probably do not want this ranking, because any query with a common 
> word,
> like "the", will match most of the corpus in step two.
>
> Instead, use Solr to weight better quality matches more heavily, maybe 4X
> for exact matches, 2X for stemmed matches, and 1X for phonetic matches.
>
> wunder
>
> On Dec 20, 2010, at 4:01 AM, Paula C. Laun : Dataprisma wrote:
>
>> Hi. I'm looking for a technology that could give high performance when
>> searching a large amount of data (nearly 10 million rows in a
>> conventional database like SQL Server), and I think PHP running with
>> Apache Solr is a good choice. I have only one doubt about its
>> possibilities.
>>
>> I need to show in first place: promoted records that have all the terms
>> searched by the user (ordered randomly).
>> In second place: I need to show promoted records that have any term
>> searched by the user (ordered randomly).
>> In third place: I need the promoted records found by the stemming search
>> (ordered randomly).
>> In fourth place: I need the promoted records found by the phonetic search
>> (randomly).
>> In fifth place: the free records, ordered alphabetically.
>>
>> These results need to be paginated.
>>
>> Is it possible to do all that in the same task?
>>
>> Thanks,
>>
>> Paula
>
>
>
>
>

--
Walter Underwood
Venture ASM, Troop 14, Palo Alto





Re: about groups of random results + alphabetical result

2010-12-20 Thread Walter Underwood
The problem happens with any common word, not just short words. What happens 
with "Brasil"?

If this was a good way to do search, Solr would already implement it. It is not 
that hard to build. But it is not a good way to do search. I have been working 
on search for almost 15 years, and I hear this idea every year or two. Don't do 
it. Use the QueryElevationComponent for step 1, boosts in DisMax for steps 2-4, 
and don't do step 5. People will never scroll down that far; besides, phonetic 
search will match a lot of the documents.

wunder

On Dec 20, 2010, at 8:09 AM, Paula C. Laun : Dataprisma wrote:

> thank you for your help... this search will be published in Portuguese, and 
> in this language we can strip words shorter than 3 characters from the 
> sentence.
> 
> Paula C. Laun : Dataprisma
> pa...@dataprisma.com.br
> (47) 3035.1868
> www.dataprisma.com.br
> - Original Message - 
> From: "Walter Underwood" 
> To: 
> Sent: Monday, December 20, 2010 2:02 PM
> Subject: Re: about groups of random results + alphabetical result
> 
> 
> You probably do not want this ranking, because any query with a common word, 
> like "the", will match most of the corpus in step two.
> 
> Instead, use Solr to weight better quality matches more heavily, maybe 4X 
> for exact matches, 2X for stemmed matches, and 1X for phonetic matches.
> 
> wunder
> 
> On Dec 20, 2010, at 4:01 AM, Paula C. Laun : Dataprisma wrote:
> 
>> Hi. I'm looking for a technology that could give high performance when
>> searching a large amount of data (nearly 10 million rows in a
>> conventional database like SQL Server), and I think PHP running with
>> Apache Solr is a good choice. I have only one doubt about its
>> possibilities.
>>
>> I need to show in first place: promoted records that have all the terms
>> searched by the user (ordered randomly).
>> In second place: I need to show promoted records that have any term
>> searched by the user (ordered randomly).
>> In third place: I need the promoted records found by the stemming search
>> (ordered randomly).
>> In fourth place: I need the promoted records found by the phonetic search
>> (randomly).
>> In fifth place: the free records, ordered alphabetically.
>>
>> These results need to be paginated.
>>
>> Is it possible to do all that in the same task?
>>
>> Thanks,
>>
>> Paula
> 
> 
> 
> 
> 

--
Walter Underwood
Venture ASM, Troop 14, Palo Alto





Re: about groups of random results + alphabetical result

2010-12-20 Thread Paula C. Laun : Dataprisma
thank you for your help... this search will be published in Portuguese, and 
in this language we can strip words shorter than 3 characters from the 
sentence.

Paula C. Laun : Dataprisma
pa...@dataprisma.com.br
(47) 3035.1868
www.dataprisma.com.br
- Original Message - 
From: "Walter Underwood" 
To: 
Sent: Monday, December 20, 2010 2:02 PM
Subject: Re: about groups of random results + alphabetical result


You probably do not want this ranking, because any query with a common word, 
like "the", will match most of the corpus in step two.

Instead, use Solr to weight better quality matches more heavily, maybe 4X 
for exact matches, 2X for stemmed matches, and 1X for phonetic matches.

wunder

On Dec 20, 2010, at 4:01 AM, Paula C. Laun : Dataprisma wrote:

> Hi. I'm looking for a technology that could give high performance when
> searching a large amount of data (nearly 10 million rows in a
> conventional database like SQL Server), and I think PHP running with
> Apache Solr is a good choice. I have only one doubt about its
> possibilities.
>
> I need to show in first place: promoted records that have all the terms
> searched by the user (ordered randomly).
> In second place: I need to show promoted records that have any term
> searched by the user (ordered randomly).
> In third place: I need the promoted records found by the stemming search
> (ordered randomly).
> In fourth place: I need the promoted records found by the phonetic search
> (randomly).
> In fifth place: the free records, ordered alphabetically.
>
> These results need to be paginated.
>
> Is it possible to do all that in the same task?
>
> Thanks,
>
> Paula







Re: about groups of random results + alphabetical result

2010-12-20 Thread Walter Underwood
You probably do not want this ranking, because any query with a common word, 
like "the", will match most of the corpus in step two.

Instead, use Solr to weight better quality matches more heavily, maybe 4X for 
exact matches, 2X for stemmed matches, and 1X for phonetic matches.

wunder
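
A sketch of that weighting with dismax (the three field names are
assumptions: the same source text indexed with exact, stemmed, and phonetic
analysis, wired up as copyFields):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class WeightedMatchQuery {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("brasil");
            q.set("defType", "dismax");
            // Exact matches count 4x, stemmed 2x, phonetic 1x.
            q.set("qf", "name_exact^4 name_stemmed^2 name_phonetic^1");
            System.out.println(solr.query(q).getResults().getNumFound());
        }
    }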

On Dec 20, 2010, at 4:01 AM, Paula C. Laun : Dataprisma wrote:

> Hi. I'm looking for a technology that could give high performance when
> searching a large amount of data (nearly 10 million rows in a
> conventional database like SQL Server), and I think PHP running with
> Apache Solr is a good choice. I have only one doubt about its
> possibilities.
>
> I need to show in first place: promoted records that have all the terms
> searched by the user (ordered randomly).
> In second place: I need to show promoted records that have any term
> searched by the user (ordered randomly).
> In third place: I need the promoted records found by the stemming search
> (ordered randomly).
> In fourth place: I need the promoted records found by the phonetic search
> (randomly).
> In fifth place: the free records, ordered alphabetically.
>
> These results need to be paginated.
>
> Is it possible to do all that in the same task?
>
> Thanks,
>
> Paula







Re: [Reload-Config] not working

2010-12-20 Thread Adam Estrada
This is the response I get...Does it matter that the configuration file is
called something other than data-config.xml? After I get this I still have
to restart the service. I wonder...do I need to commit the change?


 
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">520</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">./solr/conf/dataimporthandler/rss.xml</str>
    </lst>
  </lst>
  <str name="command">reload-config</str>
  <str name="status">idle</str>
  <str name="importResponse">Configuration Re-loaded sucessfully</str>
  <str name="WARNING">This response format is experimental. It is likely
to change in the future.</str>
</response>


On Sun, Dec 19, 2010 at 11:12 PM, Ahmet Arslan  wrote:

> > Full Import:
> > http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import
> > Reload Configuration:
> > http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=reload-config
> >
> > All,
> >
> > The links above are meant for me to reload the
> > configuration file after a
> > change is made and the other is to perform the full import.
> > My problem is
> > that The reload-config option does not seem to be working.
> > Am I doing
> > anything wrong? Your expertise is greatly appreciated!
>
> I am sorry, I hit the reply button accidentally.
>
> Are you receiving/checking the message
> Configuration Re-loaded sucessfully
> after the reload?
>
> And are checking that data-config.xml is a valid xml after editing it
> programatically?
>
> And instead of editing the data-config.xml file, can't you use a variable
> resolver? http://search-lucene.com/m/qYzPk2n86iI&subj
>
>
>
>


RE: A schema inside a Solr Schema (Schema in a can)

2010-12-20 Thread Dyer, James
Dennis,

If you need to search a key/value pair, you'll have to put them both in the 
same field, somehow.  One way is to re-index them using the key in the 
fieldname.  For instance, suppose you have:

contributor:  dyer, james
contributor:  smith, sam
role:  author
role:  editor

...but you want to search only for authors, you could index these again with 
fieldnames like:

contrib_author:  dyer, james
contrib_editor:  smith, sam

Then you would query "q=contributor:smith" to search all contributors and 
q=contrib_editor:smith just to get editors.

Another way to do it is to use some type of marker character sequence to define 
the "key" and index it like this:

contributor:  dyer, james __author
contributor:  smith, sam  __editor

then you can query like this: q=contributor:"smith __editor"~50 ... to search 
only for editors named Smith.
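
A small SolrJ sketch of both approaches (the field names and marker tokens
come from the examples above; the URL and uniqueKey are assumptions):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ContributorRoles {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "book-1");
            // Approach 1: encode the key (the role) in the field name.
            doc.addField("contrib_author", "dyer, james");
            doc.addField("contrib_editor", "smith, sam");
            // Approach 2: append a marker token to the value itself.
            doc.addField("contributor", "dyer, james __author");
            doc.addField("contributor", "smith, sam __editor");
            solr.add(doc);
            solr.commit();

            // Approach 1: editors only.
            solr.query(new SolrQuery("contrib_editor:smith"));
            // Approach 2: phrase-with-slop keeps name and marker inside one
            // field occurrence (use slop < positionIncrementGap).
            solr.query(new SolrQuery("contributor:\"smith __editor\"~50"));
        }
    }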

We are not yet fully developed here on SOLR but we currently use both of these 
approaches using a different search engine.  One nice thing SOLR could add to 
this second approach that is not an option with our other system is the 
possibility of writing a custom analyzer that could maybe take some of the 
complexity out of the app.  Not sure exactly how it'd work though...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Friday, December 17, 2010 6:52 PM
To: solr-user@lucene.apache.org
Subject: RE: A schema inside a Solr Schema (Schema in a can)

So this is a current usable plugin (except for the latest bug)?

And, is it possible to search within just one key:value pair in a multivalued 
field? 

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 12/17/10, Ahmet Arslan  wrote:

> From: Ahmet Arslan 
> Subject: RE: A schema inside a Solr Schema (Schema in a can)
> To: solr-user@lucene.apache.org
> Date: Friday, December 17, 2010, 12:47 PM
> > The problem with this approach
> is that Lucene doesn't
> > support wildcards in phrases.  
> 
> With https://issues.apache.org/jira/browse/SOLR-1604 you can
> do that.
> 
> 
> 
> 


Re: shard versus core

2010-12-20 Thread Tri Nguyen
Thought about it some more and after some reading.  I suppose the answer 
depends on what kind of response time is expected to be good enough.
 
I can do some stress testing and see if disk i/o is the bottleneck as the index 
grows.  I can also look into optimizing/configuring solr parameters to help 
performance.  One thing I've read is my disk should be at least 2 times the 
index.
 
 


--- On Mon, 12/20/10, Tri Nguyen  wrote:


From: Tri Nguyen 
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Monday, December 20, 2010, 4:04 AM


Hi Erick,
 
Thanks for the explanation.
 
At which point does the index get too big where sharding is appropriate where 
it affects performance?
 
Tri

--- On Sun, 12/19/10, Erick Erickson  wrote:


From: Erick Erickson 
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 7:36 AM


Well, they can be different beasts. First of all, different cores can have
different schemas, which is not true of shards. Also, shards are almost
assumed to be running on different machines as a scaling technique,
whereas multiple cores are run on a single Solr instance.

So using multiple cores is very similar to running multiple "virtual" Solr
servers on a single machine, each independent of the other. This can make
sense if, for instance, you wanted to have a bunch of small indexes all
on one machine. You could use multiple cores rather than multiple
instances of Solr. These indexes may or may not have anything to do with
each other.

Sharding, on the other hand, is almost always used to split a single logical
index up amongst multiple machines in order to improve performance. The
assumption usually is that the index is too big to give satisfactory
performance
on a single machine, so you'll split it into parts. That assumption really
implies that it makes no sense to put multiple shards on the #same# machine.

So really, the answer to your question is that you choose the right
technique
for the problem you're trying to solve. They aren't really different
solutions to
the same problem...

Hope this helps.
Erick

On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen  wrote:

> Hi,
>
> Was wondering about the pros and cons of using sharding versus cores.
>
> An index can be split up into multiple cores or multiple shards.
>
> So why one over the other?
>
> Thanks,
>
>
> tri


Re: shard versus core

2010-12-20 Thread Tri Nguyen
Hi Erick,
 
Thanks for the explanation.
 
At which point does the index get too big where sharding is appropriate where 
it affects performance?
 
Tri

--- On Sun, 12/19/10, Erick Erickson  wrote:


From: Erick Erickson 
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 7:36 AM


Well, they can be different beasts. First of all, different cores can have
different schemas, which is not true of shards. Also, shards are almost
assumed to be running on different machines as a scaling technique,
whereas multiple cores are run on a single Solr instance.

So using multiple cores is very similar to running multiple "virtual" Solr
servers on a single machine, each independent of the other. This can make
sense if, for instance, you wanted to have a bunch of small indexes all
on one machine. You could use multiple cores rather than multiple
instances of Solr. These indexes may or may not have anything to do with
each other.

Sharding, on the other hand, is almost always used to split a single logical
index up amongst multiple machines in order to improve performance. The
assumption usually is that the index is too big to give satisfactory
performance
on a single machine, so you'll split it into parts. That assumption really
implies that it makes no sense to put multiple shards on the #same# machine.

So really, the answer to your question is that you choose the right
technique
for the problem you're trying to solve. They aren't really different
solutions to
the same problem...

Hope this helps.
Erick
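
For reference, a sketch of how the two look at query time (host and core
names are placeholders): cores are addressed as separate URL paths, while
shards are combined with the shards request parameter.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class CoresVersusShards {
        public static void main(String[] args) throws Exception {
            // Cores: independent indexes, each behind its own path.
            CommonsHttpSolrServer core1 =
                new CommonsHttpSolrServer("http://host1:8983/solr/core1");
            core1.query(new SolrQuery("*:*"));

            // Shards: one logical index; any node coordinates the search.
            CommonsHttpSolrServer any =
                new CommonsHttpSolrServer("http://host1:8983/solr");
            SolrQuery q = new SolrQuery("*:*");
            q.set("shards", "host1:8983/solr,host2:8983/solr");
            any.query(q);
        }
    }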

On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen  wrote:

> Hi,
>
> Was wondering about the pros and cons of using sharding versus cores.
>
> An index can be split up into multiple cores or multiple shards.
>
> So why one over the other?
>
> Thanks,
>
>
> tri


about groups of random results + alphabetical result

2010-12-20 Thread Paula C. Laun : Dataprisma
Hi. I'm looking for a technology that could give high performance when 
searching a large amount of data (nearly 10 million rows in a conventional 
database like SQL Server), and I think PHP running with Apache Solr is a 
good choice. I have only one doubt about its possibilities.

I need to show in first place: promoted records that have all the terms 
searched by the user (ordered randomly).
In second place: I need to show promoted records that have any term searched 
by the user (ordered randomly).
In third place: I need the promoted records found by the stemming search 
(ordered randomly).
In fourth place: I need the promoted records found by the phonetic search 
(randomly).
In fifth place: the free records, ordered alphabetically.

These results need to be paginated.

Is it possible to do all that in the same task?

Thanks,

Paula 



Re: Dismax score - maximum of any one field?

2010-12-20 Thread Ahmet Arslan

> Can anyone tell me how the dismax score is computed? Is it
> the maximum score for any of the component fields that are
> searched? Thank You.

http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

In short: a document's dismax score is the maximum of its per-field scores,
plus the tie parameter times the sum of the scores of the other fields, so
with tie=0 it is a pure max.


  


Dismax score - maximum of any one field?

2010-12-20 Thread Jason Brown

Can anyone tell me how the dismax score is computed? Is it the maximum score 
for any of the component fields that are searched? Thank You.

If you wish to view the St. James's Place email disclaimer, please use the link 
below

http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer


Re: DIH for sharded database?

2010-12-20 Thread Grijesh.singh

You can put the table names in a different table and use nested entities,
something like this:

<entity name="tables" query="select tableName from tableList">
    <entity name="data" query="select * from ${tables.tableName}">
        ...
    </entity>
</entity>


-
Grijesh


Re: master master, repeaters

2010-12-20 Thread Upayavira
I've successfully made extensive use of load balancers in sharded,
replicated slave setups - see [1].

My question is how that might work with a master. You can have a load
balancer, but you'd need to configure it into a 'fail over but please
don't fail back' configuration. I'm not sure if that is possible on the
load balancers we have used. Otherwise, if your master had a five minute
blip, you could have some content going to your backup, then traffic
returning to your master, leading to master/backup out of sync and
content missing from your master index.

It seems to me, unless I am missing something, that while a load
balancer can be useful, it is only as a part of a larger scheme when it
comes to master replication. Or am I missing something?

Upayavira

[1] http://www.slideshare.net/sourcesense/sharded-solr-setup-with-master

On Sun, 19 Dec 2010 22:41 -0800, "Lance Norskog" 
wrote:
> If you have a load balancer available, that is a much cleaner solution
> than anything else. After the main indexer comes back, you have to get
> the current index state to it to start again. But otherwise
> 
> On Sun, Dec 19, 2010 at 10:39 AM, Upayavira  wrote:
> >
> >
> > On Sun, 19 Dec 2010 10:20 -0800, "Tri Nguyen" 
> > wrote:
> >> How do we tell the slaves to point to the new master without modifying
> >> the config files?  Can we do this while the slave is up, issuing a
> >> command to it?
> >
> > I believe this can be done (details are in
> > http://wiki.apache.org/solr/SolrReplication), but I've not actually done
> > it.
> >
> > Upayavira
> >
> >> --- On Sun, 12/19/10, Upayavira  wrote:
> >>
> >>
> >> From: Upayavira 
> >> Subject: Re: master master, repeaters
> >> To: solr-user@lucene.apache.org
> >> Date: Sunday, December 19, 2010, 10:13 AM
> >>
> >>
> >> We had a (short) thread on this late last week.
> >>
> >> Solr doesn't support automatic failover of the master, at least in
> >> 1.4.1. I've been discussing with my colleague (Tommaso) about ways to
> >> achieve this.
> >>
> >> There's ways we could 'fake it', scripting the following:
> >>
> >> * set up a 'backup' master, as a replica of the actual master
> >> * monitor the master for 'up-ness'
> >> * if it fails:
> >>    * tell the master to start indexing to the backup instead
> >>    * tell the slave(s) to connect to a different master (the backup)
> >> * then, when the master is back:
> >>    * wipe its index (backing up dir first?)
> >>    * configure it to be a backup of the new master
> >>    * make it pull a fresh index over
> >>
> >> But, Jan Høydahl suggested using SolrCloud. I'm going to follow up on
> >> how that might work in that thread.
> >>
> >> Upayavira
> >>
> >>
> >> On Sun, 19 Dec 2010 00:20 -0800, "Tri Nguyen" 
> >> wrote:
> >> > Hi,
> >> >
> >> > In the master-slave configuration, I'm trying to figure out how to
> >> > configure the
> >> > system setup for master failover.
> >> >
> >> > Does solr support master-master setup?  From my readings, solr does not.
> >> >
> >> > I've read about repeaters as well where the slave can act as a master.
> >> > When the
> >> > main master goes down, do the other slaves switch to the repeater?
> >> >
> >> > Barring better solutions, I'm thinking about putting 2 masters behind  a
> >> > load
> >> > balancer.
> >> >
> >> > If this is not implemented already, perhaps solr can be updated to
> >> > support a
> >> > list of masters for fault tolerance.
> >> >
> >> > Tri
> >>
> >
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
>