Re: Missing tokens

2010-08-19 Thread paul . moran
Great! Now I'm getting somewhere, this worked! The others didn't.

http://localhost/solr/select?q=contents:"OB10."

Hope this makes sense to you. I'm still somewhat confused by the output
here. I had 'highlight matches' checked, and from what I can tell, 'OB10'
wasn't found. When I entered 'OB10.' into the query, column 11 'ob10.' became
highlighted in the 'LowerCaseFilterFactory' table.

Am I using the wrong analyzer, or supplying the wrong parameters to an
analyzer?

Thanks for your help so far!
Paul

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position : 1    2        3   4             5     6      7   8       9         10    11     12    13
term text     : To   produce  a   downloadable  file  using  a   format  suitable  for   OB10.  8-26  Profiles
term type     : word (every position)
start,end     : 0,2  3,10  11,12  13,25  26,30  31,36  37,38  39,45  46,54  55,58  59,64  65,69  70,78
payload       : (empty)


org.apache.solr.analysis.StandardFilterFactory {}
term position : 1    2        3   4             5     6      7   8       9         10    11     12    13
term text     : To   produce  a   downloadable  file  using  a   format  suitable  for   OB10.  8-26  Profiles
term type     : word (every position)
start,end     : 0,2  3,10  11,12  13,25  26,30  31,36  37,38  39,45  46,54  55,58  59,64  65,69  70,78
payload       : (empty)


org.apache.solr.analysis.LowerCaseFilterFactory {}
term position : 1    2        3   4             5     6      7   8       9         10    11     12    13
term text     : to   produce  a   downloadable  file  using  a   format  suitable  for   ob10.  8-26  profiles
term type     : word (every position)
start,end     : 0,2  3,10  11,12  13,25  26,30  31,36  37,38  39,45  46,54  55,58  59,64  65,69  70,78
payload       : (empty)



Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position : 1
term text     : OB10
term type     : word
start,end     : 0,4

Re: Indexing fieldvalues with dashes and spaces

2010-08-19 Thread PeterKerk

Sorry for late reply, just back from holiday :)

I did what you mentioned:




and then in url facet.field=services_raw
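The schema markup in the lines above was stripped by the list archive; the setup being referred to is a non-analyzed string copy of the field, roughly like this sketch (the source field name "services" is an assumption):

```xml
<!-- Non-analyzed copy of the field, used only for faceting; source field name assumed -->
<field name="services_raw" type="string" indexed="true" stored="false" multiValued="true"/>
<copyField source="services" dest="services_raw"/>
```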

It works...awesome, thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1222961.html
Sent from the Solr - User mailing list archive at Nabble.com.


field collapsing on multiple fields

2010-08-19 Thread Bharat Jain
Hello,

   I was just wondering if field collapsing is available for multiple
fields, basically grouping in different ways, such as by language, country, etc.
Does anybody have any performance data that they would like to
share?

Thanks
Bharat Jain


Showing results based on facet selection

2010-08-19 Thread PeterKerk

I have indexed all data (as can be seen below).

But now I want to be able to simulate what happens when a user clicks on a facet value,
for example clicks on the value "Gemeentehuis" of the facet "themes_raw" AND has
selected the value "Strand" in the "features" facet.

I've been playing with facet.query function:
facet.query=themes_raw:Gemeentehuis&facet.query=features_raw:Strand

But without luck.

{
 "responseHeader":{
  "status":0,
  "QTime":0,
  "params":{
"facet":"true",
"fl":"id,title,city,score,themes,features,official,services",
"indent":"on",
"q":"*:*",
"facet.field":["province_raw",
 "services_raw",
 "themes_raw",
 "features_raw"],
"wt":"json"}},
 "response":{"numFound":3,"start":0,"maxScore":1.0,"docs":[
{
 "id":"1",
 "title":"Gemeentehuis Nijmegen",
 "services":[
  "Fotoreportage"],
 "features":[
  "Tuin",
  "Cafe"],
 "themes":[
  "Gemeentehuis"],
 "score":1.0},
{
 "id":"2",
 "title":"Gemeentehuis Utrecht",
 "services":[
  "Fotoreportage",
  "Exclusieve huur"],
 "features":[
  "Tuin",
  "Cafe",
  "Danszaal"],
 "themes":[
  "Gemeentehuis",
  "Strand & Zee"],
 "score":1.0},
{
 "id":"3",
 "title":"Beachclub Vroeger",
 "services":[
  "Exclusieve huur",
  "Live muziek"],
 "features":[
  "Strand",
  "Cafe",
  "Danszaal"],
 "themes":[
  "Strand & Zee"],
 "score":1.0}]
 },
 "facet_counts":{
  "facet_queries":{},
  "facet_fields":{
"province_raw":[
 "Gelderland",1,
 "Utrecht",1,
 "Zuid-Holland",1],
"services_raw":[
 "Exclusieve huur",2,
 "Fotoreportage",2,
 "Live muziek",1],
"themes_raw":[
 "Gemeentehuis",2,
 "Strand & Zee",2],
"features_raw":[
 "Cafe",3,
 "Danszaal",2,
 "Tuin",2,
 "Strand",1]},
  "facet_dates":{}}}

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Showing-results-based-on-facet-selection-tp1223362p1223362.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: edismax pf2 and ps

2010-08-19 Thread Ron Mayer
Chris Hostetter wrote:
> : Perhaps fold it into the pf/pf2 syntax?
> : 
> : pf=text^2// current syntax... makes phrases with a boost of 2
> : pf=text~1^2  // proposed syntax... makes phrases with a slop of 1 and
> : a boost of 2
> : 
> : That actually seems pretty natural given the lucene query syntax - an
> : actual boosted sloppy phrase query already looks like
> : text:"foo bar"~1^2
> 
> Big +1 to this idea ... the existing "ps" param can stick around as the 
> default for any field that doesn't specify its own slop in the pf/pf2/pf3 
> fields using the "~" syntax.

I think I have a decent first draft of a patch that implements this.

Hopefully I'm figuring out the right way to submit patches to this community.
I added a ticket here: https://issues.apache.org/jira/browse/SOLR-2058
and attached my patch to that ticket.   Any feedback, either on the patch
or on how best to submit things to this community would be appreciated.


This patch seems to happily turn a query like
  
http://localhost:8983/solr/select?defType=edismax&fl=id,text,score&q=enterprise+search+foobar&ps=5&qf=text&debugQuery=true&pf2=name~0^&pf2=name^12+name~10
into what I believe is the desired parsed query:

+((text:enterpris) (text:search) (text:foobar))
 ((name:"enterprise search"~5^12.0) (name:"search foobar"~5^12.0))
 ((name:"enterprise search"^.0) (name:"search foobar"^.0))
 ((name:"enterprise search"~10) (name:"search foobar"~10))

which looks like it should give a high boost to docs where both words
appear right next to each other, but still substantial boosts to docs
where the pairs of words are a few words apart.


I'll start testing it with real data today.


One question:

* Where might I find documentation and/or test cases for the pf2, pf3
  parameters? A quick grep of the sources from the tree I got from
  git://git.apache.org/lucene-solr.git
  didn't reveal any obvious docs or tests with those parameters.
  $ git grep pf2 | grep -v 'Binary file'
  solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java:
   U.parseFieldBoostsAndSlop(solrParams.getParams("pf2"));


Am I on the right track?


   Ron


RE: Solr for multiple websites

2010-08-19 Thread Hitendra Molleti
Thanks Girjesh.

Can you please let me know the pros and cons of this approach?

Also, how can we set up load balancing between multiple Solr instances?

Thanks

Hitendra 

-Original Message-
From: Grijesh.singh [mailto:pintu.grij...@gmail.com] 
Sent: Thursday, August 19, 2010 10:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr for multiple websites


Using multicore is the right approach 
-- 
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-for-multiple-websites-tp1173220p1219
772.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Jetty returning HTTP error code 413

2010-08-19 Thread Alexandre Rocco
Hi didier,

I have updated my etc/jetty.xml, doubling headerBufferSize to 16384.
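For context, in the Jetty 6.x that ships with Solr 1.4 this setting lives on the connector in etc/jetty.xml; the fragment would look roughly like this (connector class and port may differ in your install):

```xml
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <!-- doubled from the 8192 default to allow longer request URLs -->
      <Set name="headerBufferSize">16384</Set>
    </New>
  </Arg>
</Call>
```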

But the error persists. Do you know if there is any other config that should
be updated so this setting works?
Also, is there any way to check whether Jetty is using this config from inside
the Solr admin pages? I know that we can check the Java properties, but I haven't
found any way to locate the Jetty config there.

Thanks!
Alexandre

On Wed, Aug 18, 2010 at 4:58 PM, didier deshommes wrote:

> Hi Alexandre,
> Have you tried setting a higher headerBufferSize?  Look in
> etc/jetty.xml and search for 'headerBufferSize'; I think it controls
> the size of the url. By default it is 8192.
>
> didier
>
> On Wed, Aug 18, 2010 at 2:43 PM, Alexandre Rocco 
> wrote:
> > Guys,
> >
> > We are facing an issue executing very large query (~4000 bytes in the
> URL)
> > in Solr.
> > When we execute the query, Solr (probably Jetty) returns a HTTP 413 error
> > (FULL HEAD).
> >
> > I guess that this is related to the very big query being executed, and
> > currently we can't make it short.
> > Is there any configuration that need to be tweaked on Jetty or other
> > component to make this query work?
> >
> > Any advice is really appreciated.
> >
> > Thanks!
> > Alexandre Rocco
> >
>


Faceting by fields that contain special characters

2010-08-19 Thread Christos Constantinou
Hi all,

I am doing a faceted search on a solr field that contains URLs, for the sole 
purpose of trying to locate duplicate URLs in my documents.

However, the solr response I get looks like this:
public 'com' => int 492198
  public 'flickr' => int 492198
  public 'http' => int 492198
  public 'www' => int 253881
  public 'photo' => int 253843
  public 'n' => int 253318
  public 'httpwwwflickrcomphoto' => int 253316
  public 'farm' => int 238317
  public 'httpfarm' => int 238317
  public 'jpg' => int 238317
  public 'static' => int 238317
  public 'staticflickrcom' => int 238317
  public '5' => int 237939
  public '00' => int 61009
  public 'b' => int 59463
  public 'c' => int 59094
  public 'f' => int 59004
  public 'd' => int 58995
  public 'e' => int 58818
  public 'a' => int 58327
  public '08' => int 33797
  public '06' => int 33341
  public '04' => int 29902
  public '02' => int 29224
  public '2' => int 26671
  public '4' => int 26613
  public '6' => int 26606
  public '03' => int 26506
  public '1' => int 26389
  public '8' => int 26384
It should instead have the entire URL as the variable name, but the name is
only a part of the URL. Is this because characters like :// in http:// cannot
be used in variable names? If so, is there any workaround for the problem, or an
alternative way to detect duplicates?

Thanks

Christos



RE: Faceting by fields that contain special characters

2010-08-19 Thread Markus Jelsma
A very common issue: you need to facet on a non-analyzed field.


http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-td1023699.html#a1222961
 



RE: Showing results based on facet selection

2010-08-19 Thread Markus Jelsma
Hi,

 

A facet query serves a different purpose [1]. You need to filter your result 
set [2]. And don't forget to follow the links on caching and such.

 

[1]: 
http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting

[2]: http://wiki.apache.org/solr/CommonQueryParameters#fq
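For the concrete case from the question, a filter-query version of the request might look like this (field names and values taken from the posted response; everything else is the stock select handler):

```
http://localhost:8983/solr/select?q=*:*&fq=themes_raw:Gemeentehuis&fq=features_raw:Strand&facet=true&facet.field=themes_raw&facet.field=features_raw&wt=json
```

Each fq narrows the result set, while facet.field still reports counts over the remaining documents.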
 

Cheers, 


RE: Solr for multiple websites

2010-08-19 Thread Markus Jelsma
http://osdir.com/ml/solr-user.lucene.apache.org/2009-09/msg00630.html

http://osdir.com/ml/solr-user.lucene.apache.org/2009-03/msg00309.html

 

Load balancing is a bit out of scope here, but all you need is a simple HTTP load 
balancer and a replication mechanism, depending on your setup.
 




/update/extract

2010-08-19 Thread satya swaroop
Hi all,
   when a request hits the extract request handler, which class gets invoked? I
need to know the chain of classes involved when we send files to Solr.
Can anybody point me to the classes, or to any sources where I can find the
answer? Also, which classes get invoked when we start Solr? I would be
thankful for any help with this.

Regards,
satya
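For what it's worth, /update/extract is mapped in solrconfig.xml to Solr Cell's ExtractingRequestHandler, which delegates the actual parsing to Apache Tika; the mapping looks roughly like this sketch (the defaults section varies per config):

```xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map Tika's extracted body into the schema's text field -->
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>
```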


Re: specifying the doc id in clustering component

2010-08-19 Thread Tommy Chheng
The Solr schema has the fields id, name and desc.

I would like to get docs:["name field value here"] instead of the doc id
field, as in
"docs":["200066", "195650", ...


On Wednesday, August 18, 2010, Stanislaw Osinski
 wrote:
> Hi Tommy,
>
>  I'm using the clustering component with solr 1.4.
>>
>> The response is given by the id field in the doc array like:
>>        "labels":["Devices"],
>>        "docs":["200066",
>>         "195650",
>>         "204850",
>> Is there a way to change the doc label to be another field?
>>
>> i couldn't this option in http://wiki.apache.org/solr/ClusteringComponent
>
>
> I'm not sure if I get you right. The "labels" field is generated by the
> clustering engine, it's a description of the group (cluster) of documents.
> The description is usually a phrase or a number of phrases. The "docs" field
> lists the ids of documents that the algorithm assigned to the cluster.
>
> Can you give an example of the input and output you'd expect?
>
> Thanks!
>
> Stanislaw
>


RE: Showing results based on facet selection

2010-08-19 Thread PeterKerk

Hi Markus,

Thanks for the quick reply. it works now! :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Showing-results-based-on-facet-selection-tp1223362p1225626.html
Sent from the Solr - User mailing list archive at Nabble.com.


Using postCommit event to swap cores

2010-08-19 Thread simon
Hi there,

I have solr configured with 2 cores, "live" and "standby".  "Live" is
used to service search requests from our users.  "Standby" is used to
rebuild the index from scratch each night.  Currently I have the
postCommit hook set up to swap the two cores over as soon as the indexing
on "standby" is complete. 
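For reference, the same swap can also be triggered explicitly through the CoreAdmin API (assuming the default port and the core names above):

```
http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=standby
```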

It seems to work well on my development box, but I have not seen this
approach discussed elsewhere so I was wondering if I was missing
something here.

Feedback gratefully received!

Simon




Re: Jetty returning HTTP error code 413

2010-08-19 Thread Alexandre Rocco
Hi didier,

Nevermind.
I figured it out. There was some miscommunication between me and our IT guy.

Thanks for helping. It's fixed now.

Alexandre



Autosuggest on PART of cityname

2010-08-19 Thread PeterKerk

I want to have a Google-like autosuggest function on citynames. So when user
types some characters I want to show cities that match those characters but
ALSO the amount of locations that are in that city.

Now with Solr I now have the parameter:
"&fq=title:Bost"

But the result doesn't show the city Boston. So the fq parameter now seems to
be an exact match, where I want it to be a partial (prefix) match as well, more like
this in SQL: WHERE title LIKE 'Bost%'

How can I do this?



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1226088.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrj ContentStreamUpdateRequest Slow

2010-08-19 Thread Tod

On 8/19/2010 1:45 AM, Lance Norskog wrote:

'stream.url' is just a simple parameter. You should be able to just
add it directly.



I agree (code excluding imports):

public class CommonTest {

  public static void main(String[] args) {
System.out.println("main...");
try {
      String fileName = "http://remoteserver/test/test.pdf";

  String solrId = "1234";
  indexFilesSolrCell(fileName, solrId);

} catch (Exception ex) {
  ex.printStackTrace();
}
  }

  /**
   * Method to index all types of files into Solr.
   * @param fileName
   * @param solrId
   * @throws IOException
   * @throws SolrServerException
   */
  public static void indexFilesSolrCell(String fileName, String solrId)
throws IOException, SolrServerException {

System.out.println("indexFilesSolrCell...");

    String urlString = "http://localhost:9080/solr";

System.out.println("getting connection...");
SolrServer solr = new CommonsHttpSolrServer(urlString);

System.out.println("getting updaterequest handle...");
ContentStreamUpdateRequest req = new 
ContentStreamUpdateRequest("/update/extract");


System.out.println("setting params...");
req.setParam("stream.url", fileName);
req.setParam("literal.content_id", solrId);

System.out.println("making request...");
solr.request(req);

System.out.println("committing...");
solr.commit();

System.out.println("done...");
  }
}


At "making request" I get:

java.lang.NullPointerException
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:381)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)

at CommonTest.indexFilesSolrCell(CommonTest.java:59)
at CommonTest.main(CommonTest.java:26)

... which is pointing to the solr.request(req) line.



Thanks - Tod


RE: Autosuggest on PART of cityname

2010-08-19 Thread Markus Jelsma
You need a new analyzed field using the EdgeNGramTokenizer, or you can try 
facet.prefix for this. To retrieve the number of locations for each 
city, just use the results from the faceting engine as usual.

I'm unsure which approach is actually faster, but I'd guess the 
EdgeNGramTokenizer is, although it also takes up more disk space. Using the 
faceting engine will not take more disk space.
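A sketch of such a field type for Solr 1.4 (field type name and gram sizes are illustrative):

```xml
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <!-- keep the whole city name as one token, then index every front prefix -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" side="front" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- the query side must NOT n-gram, or short queries would match everything -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a field of this type, a query like city_prefix:bost would match "Boston", and faceting on a parallel string field gives the per-city counts.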
 


Re: improving search response time

2010-08-19 Thread Muneeb Ali

Thanks for your input guys. I will surely try these suggestions, in
particular reducing the heap size in JAVA_OPTIONS and adjusting cache sizes to
see if that makes a difference.

I am also considering upgrading the RAM on the slave nodes, and am looking into
moving from enterprise SATA HDDs to SSD flash/DRAM storage... Is anyone using
SSDs for their Solr installation?

What would be the better route: more memory, or a flash-based SSD?

Thanks,
-Muneeb


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/improving-search-response-time-tp1204491p1226372.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr data type for date faceting

2010-08-19 Thread Jan Høydahl / Cominvent
Yes, I forgot that strings support alphanumeric ranges.
However, they will potentially be very memory intensive, since you don't get the 
trie optimization and since strings take up more space than ints. The only way 
to know is to try it out.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 19. aug. 2010, at 05.20, Karthik K wrote:

> adding 
> facet.query=timestamp:[20100601+TO+201006312359]&facet.query=timestamp:[20100701+TO+201007312359]...
> in query should give the desired response without changing the schema or
> re-indexing.



RE: Autosuggest on PART of cityname

2010-08-19 Thread PeterKerk

Ok, I now tried this:
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=city&facet.field=city&facet.prefix=Bost

Then I get:
{
 "responseHeader":{
  "status":0,
  "QTime":0,
  "params":{
"fl":"city",
"indent":"on",
"q":"*:*",
"facet.prefix":"Bost",
"facet.field":"city",
"wt":"json"}},
 "response":{"numFound":4,"start":0,"docs":[
{},
{},
{},
{}]
 }}


So 4 total results, but I would have expected 1

What am I doing wrong?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1226571.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Missing tokens

2010-08-19 Thread Jan Høydahl / Cominvent
Hi,

Your bug is right there in the WhitespaceTokenizer, where you see that it does 
NOT strip away the "." as whitespace.
Try with StandardTokenizerFactory instead, as it removes punctuation.
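A minimal field type along those lines might look like this (a sketch; keep whatever other filters your current chain uses):

```xml
<fieldType name="text_std" class="solr.TextField">
  <analyzer>
    <!-- StandardTokenizer drops the trailing "." so "OB10." indexes as "OB10" -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```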

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com


RE: Autosuggest on PART of cityname

2010-08-19 Thread Markus Jelsma
Hmm, you have only four documents in your index, I guess? That would make sense 
because you query for *:*. This technique doesn't rely on the matched documents 
but on the faceting engine, so you should include rows=0 in your query; the fl 
parameter is then not required anymore. Also, add facet=true to enable the 
faceting engine.

 

http://localhost:8983/solr/db/select/?wt=json&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=bost


 


SolrIndex / LuceneIndex

2010-08-19 Thread stockii

Hello. 

in
http://lucene.apache.org/solr/api/index.html?org/apache/solr/common/SolrDocument.html
  

The javadoc talks about a "Solr index" --> "A concrete representation of a document
within a Solr index".

Does Solr create a special Solr index here, or does this mean the underlying Lucene index?

thx ;)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrIndex-LuceneIndex-tp1226714p1226714.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: improving search response time

2010-08-19 Thread Jan Høydahl / Cominvent
It is crucial to MEASURE your system to confirm your bottleneck.
I agree that you are very likely to be disk I/O bound with such little
memory left for the OS, a large index and many terms in each query.

Have your IT guys do some monitoring on your disks and log this while
under load. Then you should easily be able to see whether disk I/O
is peaking while the CPU is healthy.

You should also look into whether you can shorten down your query size:

+((tags:case^1.2 | authors:case^7.5 | title:case^65.5 | matchAll:case |
keywords:case^2.5 | meshterm:case^3.2 | abstract1:case^9.5)~0.01
(tags:studi^1.2 | authors:study^7.5 | title:study^65.5 | matchAll:study |
keywords:studi^2.5 | meshterm:studi^3.2 | abstract1:studi^9.5)~0.01
(tags:research^1.2 | authors:research^7.5 | title:research^65.5 |
matchAll:research | keywords:research^2.5 | meshterm:research^3.2 |
abstract1:research^9.5)~0.01) (tags:"case studi research"~50^1.2 |
authors:"case study research"~50^7.5 | title:"case study research"~50^65.5 |
matchAll:case study research | keywords:"case studi research"~50^2.5 |
meshterm:"case studi research"~50^3.2 | abstract1:"case studi
research"~50^9.5)~0.01 (sum(sdouble(yearScore)))^1.1
(sum(sdouble(readerScore)))^2.0

Do you need "pf" at all? Can you merge similarly weighted fields
with copyField into a new one, reducing the number of fields to look up
from 7 to perhaps 5?
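If you do merge fields, the change is a couple of schema.xml lines plus an adjusted "qf". A sketch, with a hypothetical combined field name and the field names from the query above:

```xml
<!-- schema.xml: collect similarly boosted fields into one -->
<field name="minorText" type="text" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="tags" dest="minorText"/>
<copyField source="keywords" dest="minorText"/>
<copyField source="meshterm" dest="minorText"/>
```

The dismax "qf" could then list minorText^2.5 in place of tags, keywords, and meshterm, at the cost of one shared boost for the merged fields.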

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 19. aug. 2010, at 16.58, Muneeb Ali wrote:

> 
> Thanks for your input guys. I will surely try these suggestions, in
> particular, reducing heap size JAVA_OPTION and adjusting cache sizes to see
> if that makes a difference. 
> 
> I am also considering upgrading RAM for slave nodes, and also looking into
> moving from SATA enterprise HDD to SSD flash/DRAM storage... Is anyone using
> SSDs for solr application?
> 
> What would be a better route to take? more memory or flash based SSD hard
> drive?
> 
> Thanks,
> -Muneeb
> 
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/improving-search-response-time-tp1204491p1226372.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Proper Escaping of Ampersands

2010-08-19 Thread Nikolas Tautenhahn
Hi,

I have a problem with, for example, company names like "AT&S".
A job is sending data to the Solr 1.4 index (also tested with 1.4.1)
via Python in XML, and everything is escaped properly ("&" becomes "&amp;").

When I search for "at s" (q=%22at%20s%22) using the dismax handler, I
find the dataset for this company and get all names back (the company
is still called at&s and not something like at&amp;s).

But when I search for q=at%26s (=at&s), I get nothing.
I also tried q=at%5C%26s (=at\&s) and q=at%5C%5C%26s blindly following
any clues for escaping with backslashes...
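For reference, the two escaping layers involved can be reproduced in Python; this only shows how the encodings above are produced and does not by itself explain the empty result:

```python
from urllib.parse import quote_plus
from xml.sax.saxutils import escape

# Index side: "&" must be XML-escaped in the document posted to Solr.
print(escape("AT&S"))          # AT&amp;S

# Query side: "&" must be percent-encoded so it is not treated as a
# URL parameter separator.
print(quote_plus("at&s"))      # at%26s

# Backslash-escaping for the Lucene query parser, then URL-encoded:
print(quote_plus("at\\&s"))    # at%5C%26s
```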


So, my question is: How do I search (correctly) for at&s?


When I use the "Analysis" page in the admin panel, select my
fieldname, and enter "AT&S" as both Field Value (Index) and Field Value
(Query), it shows me that the query matches - so I assume Solr
doesn't receive the correct query string...

If necessary, I can supply information from schema.xml for the fields in
use, but as the "Analysis" page showed the match, I don't think this is
very useful...

best regards,
Nikolas Tautenhahn


Re: Missing tokens

2010-08-19 Thread paul . moran
I did that and it worked.

Thanks  very much for your expert assistance, Jan!

Paul



From:   Jan Høydahl / Cominvent 
To: solr-user@lucene.apache.org
Date:   19/08/2010 16:15
Subject:Re: Missing tokens



Hi,

Your bug is right there in the WhitespaceTokenizer, where you see that it
does NOT strip away the "." as whitespace.
Try with StandardTokenizerFactory instead, as it removes punctuation.
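For example, an analyzer chain along these lines (the field type name here is hypothetical):

```xml
<!-- schema.xml sketch: StandardTokenizer strips the trailing "." -->
<fieldType name="text_std" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```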

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com


Re: trie fields and sortMissingLast

2010-08-19 Thread harish.agarwal

Just curious if there has been any progress on implementing sortMissingLast
on TrieFields?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/trie-fields-and-sortMissingLast-tp479233p1227971.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: trie fields and sortMissingLast

2010-08-19 Thread Yonik Seeley
On Thu, Aug 19, 2010 at 12:28 PM, harish.agarwal
 wrote:
> Just curious if there has been any progress on implementing sortMissingLast
> on TrieFields?

Not yet - that info is not available from the lucene FieldCache.

-Yonik
http://www.lucidimagination.com


Re: trie fields and sortMissingLast

2010-08-19 Thread harish.agarwal

Is there a good opportunity to work on this issue right now?  I'd be happy to
do it, if you could provide some initial advice on how to attack the
problem.  Moving forward, I'd like to use Trie fields, but the lack of this
option is really holding me back...
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/trie-fields-and-sortMissingLast-tp479233p1229285.html
Sent from the Solr - User mailing list archive at Nabble.com.


SpellCheckComponent question

2010-08-19 Thread fabritw

Hi,

I am having some trouble with SpellCheckComponent when using queries such as
"2galwy city".

The spellchecker seems to ignore the number and suggests "galway". This is
fine, but in the collation it adds the number back onto the suggestion
("2galway"). This causes problems for me, as I'm using it for a search
suggestion tool.

Is there a way to configure the spellchecker to provide a collation
without the number ("galway city")?

Any advice would be much appreciated. Please find the response XML below:





<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="galwy">
        <int name="numFound">5</int>
        <int name="startOffset">1</int>
        <int name="endOffset">6</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">galway</str><int name="freq">10095</int></lst>
          <lst><str name="word">galwey</str><int name="freq">46</int></lst>
          <lst><str name="word">galwaya</str><int name="freq">2</int></lst>
          <lst><str name="word">galwayi</str><int name="freq">1</int></lst>
          <lst><str name="word">galway2</str><int name="freq">1</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <str name="collation">2galway city</str>
    </lst>
  </lst>
</response>




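Until the collation behaviour is configurable, the extendedResults offsets (startOffset/endOffset in the response above) are enough to build a digit-free collation client-side; this is a sketch of a workaround, not a SpellCheckComponent feature:

```python
def apply_suggestion(query, start, end, word):
    """Splice the top suggestion over the misspelled span, also
    swallowing any digits glued onto its front."""
    while start > 0 and query[start - 1].isdigit():
        start -= 1
    return query[:start] + word + query[end:]

# Offsets and suggestion as reported for "galwy" in "2galwy city".
print(apply_suggestion("2galwy city", 1, 6, "galway"))  # galway city
```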
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellCheckComponent-question-tp1229575p1229575.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SpellCheckComponent question

2010-08-19 Thread Dyer, James
This may be a bug.  See
http://lucene.472066.n3.nabble.com/Spellcheck-help-td951059.html#a990476

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311



Re: Problems to clustering on tomcat

2010-08-19 Thread Claudio Devecchi
Thanks so much, Otis. It was very helpful.

On Tue, Aug 10, 2010 at 3:37 AM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Claudio,
>
> It sounds like the word "Cluster" there is adding confusion.
> ClusteringComponent has to do with search results clustering.  What you
> seem to
> be after is creation of a Solr cluster.
>
> You'll find good pointers here:
> http://search-lucene.com/?q=master+slave&fc_project=Solr&fc_type=wiki
>
> Perhaps this is the best place to start:
> http://wiki.apache.org/solr/SolrReplication
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Claudio Devecchi 
> > To: solr-user@lucene.apache.org
> > Sent: Mon, August 9, 2010 7:07:54 PM
> > Subject: Problems to clustering on tomcat
> >
> > Hi everybody,
> >
> > I need to do some tests of my Solr installation. Previously I configured
> > my application on a single node, and now I need to run some tests in a
> > cluster configuration.
> > I followed the steps on "http://wiki.apache.org/solr/ClusteringComponent"
> > and when I start up the example system everything is ok, but when I try
> > to run it on Tomcat I receive the error below; does somebody have an idea?
> >
> > SEVERE: Could not start SOLR.  Check solr/home property
> > org.apache.solr.common.SolrException: Error loading  class
> > 'org.apache.solr.handler.clustering.ClusteringComponent'
> >
> > --
> > Claudio Devecchi
> > flickr.com/cdevecchi
> >
>



-- 
Claudio Devecchi
flickr.com/cdevecchi


facets - id and display value

2010-08-19 Thread Satish Kumar
Hi,

Is it possible to associate properties with a facet? For example, facet on
categoryId (1, 2, 3, etc.) and get properties like display name, image, etc.?


Thanks,
Satish


Basic conceptual questions about solr

2010-08-19 Thread Shaun McArthur
I'm looking for a Google search appliance look-a-like. We have a file share 
with 1000's of documents in a hierarchy that makes it ridiculously difficult to 
locate documents.

Here are some basic questions:

Is the idea to install Solr on separate hardware and have it crawl the file 
system?
Can crawls be scheduled?
If installed on a remote server, can it be configured to insert users' local 
content in search results?
I assumed that once it's functioning, users surf to a web page for results?

Appreciate any input, and I have started to RTFJavadocs :)
Shaun


Shaun McArthur
Dir. Technical Operations
Autodata Solutions
Mobile : (226) 268-6458
Skype :shaun-mcarthur



Re: multiple values

2010-08-19 Thread Erick Erickson
The first thing I'd do is look at the document in the admin pages and
determine what you actually have in the index. If that's OK, have you
dumped your responses to see whether the returned document has multiple
entries but your parsing is off?
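When dumping responses, a multiValued field should come back as an array; a minimal check in Python (field name and sample data are hypothetical):

```python
import json

# A trimmed wt=json response doc with a multiValued "Author" field.
doc = json.loads('{"Author": ["Smith J", "Jones K", "Lee X"]}')

authors = doc["Author"]
print(type(authors).__name__)   # a list, not a single overwritten string
print("; ".join(authors))
```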

Best
Erick

On Wed, Aug 18, 2010 at 5:00 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] <
xiao...@mail.nlm.nih.gov> wrote:

> Hello,
>
> I can only display one author - the last one. It looks like the others
> get overwritten.
>
> In the XML, I have more than one name in
> <AuthorList><Author>...</Author></AuthorList>.
>
> In data_config.xml, I put <field column="Author"
> xpath="/PublishedArticles/Article/AuthorList/Author" />.
>
> In schema.xml, I put <field name="Author" type="text" indexed="true"
> stored="true" multiValued="true"/>.
>
> Please let me know if I did something wrong, or how I can display it in
> JSP.
>
> I really appreciate your help!


Re: Date sorting

2010-08-19 Thread Erick Erickson
Whew! Thanks for bringing closure to that one, it looked ugly
at the start!

Best
Erick

On Thu, Aug 19, 2010 at 2:03 AM, kirsty  wrote:

>
>
> Grijesh.singh wrote:
> >
> > provide schema.xml and solrconfig.xml to dig the problem and by which
> > version of solr u have indexed the data?
> >
> My greatest apologies, I have seen my mistake! ...looks like someone had
> added a sort into the requestHandler on another date already...which I was
> not aware of, so it seems that was causing a conflict!
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Date-sorting-tp1219372p1219574.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Basic conceptual questions about solr

2010-08-19 Thread Jan Høydahl / Cominvent
Hi,

You can place Solr wherever you want, but if your data is very large you'd 
want a dedicated box.

Have a look at DIH (http://wiki.apache.org/solr/DataImportHandler). It can both 
crawl a file share periodically, indexing only files changed since a timestamp 
(can be e.g. NOW-1HOUR) and extract resulting text using Tika.
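A data-config.xml for that kind of incremental file crawl might look roughly like this (paths, the file pattern, and the field wiring are illustrative; check the DIH wiki for the exact attributes in your Solr version):

```xml
<dataConfig>
  <document>
    <!-- pick up only files modified in the last hour -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/mnt/share" fileName=".*\.(pdf|doc|txt)"
            recursive="true" newerThan="'NOW-1HOUR'" rootEntity="false">
      <!-- extract text from each file with Tika -->
      <entity name="doc" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```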

However, if you require security, have a look at LCF 
(http://incubator.apache.org/connectors/), which adds security but may lack a 
powerful file crawler.

You choose how the results are presented back to the user, but normally it's a 
traditional web page with links which when clicked will point to that resource 
in some way.

Wrt. users' local content - what is that? It sounds like you want to hook into a 
local search on the laptop like Google does. To do that you'd have to develop a 
local service sitting in the system tray on each computer, exposing some API on 
some port. Then when a user searches your search portal, e.g. 
search.mycompany.com/?q=foo, the GUI uses some AJAX to reach out to the local 
search service and filter that into the results...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com




Re: specifying the doc id in clustering component

2010-08-19 Thread Stanislaw Osinski
> The solr schema has the fields, id,  name and desc.
>
>  I would like to get docs:["name Field here" ] instead of the doc Id
> field as in
> "docs":["200066", "195650",
>

The idea behind using the document ids was that based on them you could
access the individual documents' content, including the other fields, right
from the "response" field. Using ids limits duplication in the response text
as a whole. Is it possible to use this approach in your application?
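Joining the cluster's ids back to the documents in "response" is a few lines of client code; a sketch with a hypothetical trimmed wt=json payload:

```python
import json

payload = json.loads("""
{"response": {"docs": [
    {"id": "200066", "name": "First doc", "desc": "..."},
    {"id": "195650", "name": "Second doc", "desc": "..."}]},
 "clusters": [{"labels": ["Example"], "docs": ["200066", "195650"]}]}
""")

# Index the documents once by id, then resolve each cluster's ids to names.
docs_by_id = {d["id"]: d for d in payload["response"]["docs"]}
for cluster in payload["clusters"]:
    names = [docs_by_id[doc_id]["name"] for doc_id in cluster["docs"]]
    print(cluster["labels"], names)
```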

Staszek