RE: running SOLR on same server as your website

2011-09-07 Thread Tim Gilbert
Just make sure that outside users can't talk directly to your solr
instance.  If they can talk to Solr, they can add/delete documents which
will affect your site.

Tim
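One way to enforce that, assuming Solr runs in its own Tomcat instance (the thread doesn't specify a mechanism), is to bind that container's HTTP connector to the loopback interface in server.xml, so only local processes can reach Solr; the port and timeout values below are illustrative:

```xml
<!-- Bind the Solr container's HTTP connector to localhost only -->
<Connector port="8983" protocol="HTTP/1.1"
           address="127.0.0.1"
           connectionTimeout="20000" />
```

A firewall rule blocking the Solr port from outside achieves the same effect.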

-Original Message-
From: okayndc [mailto:bodymo...@gmail.com] 
Sent: Wednesday, September 07, 2011 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: running SOLR on same server as your website

Right now the index is relatively small in size (less than 1 MB).  I
think right now it's okay, but a couple of years down the road we may
have to transfer Solr onto a separate application server.

On Wed, Sep 7, 2011 at 10:15 AM, Jaeger, Jay - DOT
jay.jae...@dot.wi.gov wrote:

 You could host Solr inside the same Tomcat container, or in a different
 servlet container (say, a second Tomcat instance) on the same server.

 Be aware of your OS memory requirements, though: in my experience, Solr
 performs best when it has lots of OS memory to cache index files (at
 least if your index is very big).  For that reason alone, we chose to
 host our Solr instance (used internally only) in a separate virtual
 machine in its own web app server instance.

 It is all a matter of managing your memory, CPU and disk performance.
 If those are already constrained or nearly constrained on your website,
 then adding Solr into that mix is probably not such a good idea.  If
 those are not issues on your existing website, and your Solr load is
 modest, then you can probably squeeze it onto the same server.

 Like most real-world answers, it comes down to "it depends."

 JRJ

 -Original Message-
 From: okayndc [mailto:bodymo...@gmail.com]
 Sent: Wednesday, September 07, 2011 9:02 AM
 To: solr-user@lucene.apache.org
 Subject: running SOLR on same server as your website

 Hi everyone!

 Is it not good practice to run Solr on the same server where your
 website files sit?  Or is it a must to house Solr on its own
 application server?  The problem I'm facing is that my website's files
 sit in a servlet container (Tomcat), and I think it would be more
 convenient to house the Solr instance on the same server.  Is this not
 a good idea?  What is your Solr setup?

 Thanks



Fast DIH with 1:M multValue entities

2011-04-14 Thread Tim Gilbert
We are working on importing a large number of records into Solr using
DIH.  We have one schema with ~2000 fields declared which map off to
several database schemas, so that typically each document will have ~500
fields in use.  We have about 2 million rows which we are importing,
and we are seeing < 20 minutes in test across 14 different entities
which really map off to one virtual document.  Then we added our
multiValue stuff and, well, it didn't work out nearly as well. :-)

 

We have several fields which are 1:M and so in our data-config.xml we
might have something like this:

 

<document name="allfund">

    <entity name="FundId" dataSource="getFundManager"
            query="{call dbo.getFundManager_Id()}">

        <field column="FundId" name="HS04C" />

        <entity name="FundData" dataSource="getFundManager"
                query="{call dbo.getFundManager_Data('${FundId.FundId}')}">

            <field column="ManagerName" name="OF015" />

        </entity>

    </entity>

</document>

 

That is a lot of database queries for a small result set which is really
slowing things down for us.

 

My question is more to ask advice, so it's a multi-parter :-)

 

1) Is there a way to declare in DIH an in-memory lookup where we can
query for the entire "many" side of the query in one database query,
and match up on the PK?  Then we can declare that field multiValued.

2) Assuming that isn't currently available, I thought of denormalizing
the 1:M into a delimited list and then using
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
to tokenize.  That would allow us to search on individual bits, and
build something into the front-end to handle the display.  That means
we wouldn't use multiValued and we'd have to modify our db, but we'd
lose out on some of the abilities.

3) The third option was to open up DIH and try to add the first
feature into it ourselves.

 

Am I approaching this the right way?  Are there other ways I haven't
considered or don't know about?

 

Thanks in advance,

 

Tim



RE: Fast DIH with 1:M multValue entities

2011-04-14 Thread Tim Gilbert
How did I miss that?  Thanks, I will try that as it seems to be the
in-memory lookup solution I needed.

Thanks Erick,

Tim

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, April 14, 2011 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Fast DIH with 1:M multValue entities

I'm not sure this applies, but have you looked at
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

Best,
Erick
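For reference, wiring CachedSqlEntityProcessor into the data-config.xml from the original message might look roughly like this. The dbo.getFundManager_AllData procedure is invented for illustration (it would return FundId plus ManagerName for all funds in one query), and the exact attributes should be checked against the wiki page:

```xml
<!-- fetch the whole "many" side once; rows are cached and joined in memory -->
<entity name="FundData" dataSource="getFundManager"
        processor="CachedSqlEntityProcessor"
        query="{call dbo.getFundManager_AllData()}"
        where="FundId=FundId.FundId">
  <field column="ManagerName" name="OF015" />
</entity>
```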





RE: Javabin-JSon

2011-03-29 Thread Tim Gilbert
Markus is right, this isn't the list for Java questions, but you can
look into Jackson.  Jackson is a Java data-binding library that can
convert Java POJOs into JSON.

http://jackson.codehaus.org/

I use it in Spring MVC to convert my output to json.

Tim
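For illustration, a minimal Jackson round-trip might look like this; the Person POJO and its fields are invented, and the package names are Jackson 2.x:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class PojoToJson {
    // a hypothetical POJO; Jackson picks up public fields/getters by default
    public static class Person {
        public String name;
        public int age;
        public Person(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // serialize the POJO to a JSON string
        String json = mapper.writeValueAsString(new Person("Paulo", 30));
        System.out.println(json);
    }
}
```

In Spring MVC the same ObjectMapper work happens behind the scenes once Jackson is on the classpath.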

-Original Message-
From: paulohess [mailto:pauloh...@yahoo.com] 
Sent: Tuesday, March 29, 2011 3:16 PM
To: solr-user@lucene.apache.org
Subject: Javabin-JSon

Hi guys,

I have a Javabin object and I need to convert it to a JSON object.
How?  Please help.
I am using the solrj client, which doesn't support JSON, so (wt=json)
won't convert it to JSON.
thanks
Paulo

--
View this message in context:
http://lucene.472066.n3.nabble.com/Javabin-JSon-tp2750066p2750066.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: keeping data consistent between Database and Solr

2011-03-15 Thread Tim Gilbert
I use Solr + MySQL with data coming from several DIH-type loaders that
I have written to move data from many different databases into my BI
solution.  I don't use DIH itself because I am not simply replicating
the data; I am moving/merging/processing the incoming data during
loading.

For me, I have an aspect (AspectJ) which wraps my Data Access Object, and
every time a persist is called (I am using Hibernate), I update Solr
with the same data an instant later using @Around advice.  This handles
nearly every event during the day.  I have a simple retry procedure on
my SolrJ add/commit on network error, in hopes that it will eventually
succeed.
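The retry procedure itself isn't shown in the message; a generic sketch of the idea (the attempt count, backoff, and names are my own, not from the thread) could be:

```java
import java.util.concurrent.Callable;

public class Retry {
    // Retry an operation up to maxAttempts times, backing off between tries.
    static <T> T withRetry(Callable<T> op, int maxAttempts, long backoffMs) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {   // e.g. a network error from solrServer.add()/commit()
                last = e;
                if (attempt < maxAttempts) Thread.sleep(backoffMs * attempt);
            }
        }
        throw last; // give up; nightly rebuild catches whatever was missed
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky operation: fails twice, then succeeds.
        final int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("network error");
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```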

In case of error, I rebuild the Solr index from scratch each night by
recreating it from the data in MySQL.  That takes about 10 minutes.
This gives me "eventual consistency" for any issues that cropped up
during the day.

Obviously the size of my database (< 2 million records) makes this
approach manageable.  YMMV.

Tim

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Tuesday, March 15, 2011 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: keeping data consistent between Database and Solr

On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote:
 But my main question is, how do I guarantee that data between my
Cassandra
 database and Solr index are consistent and up-to-date?

Our MySQL database has two unique indexes.  One is a document ID,
implemented in MySQL as an autoincrement integer and in Solr as a long.
The other is what we call a tag id, implemented in MySQL as a varchar
and in Solr as a single lowercased token serving as Solr's uniqueKey.
We have an update trigger on the database that updates the document ID
whenever the database document is updated.

We have a homegrown build system for Solr.  In a nutshell, it keeps
track of the newest document ID in the Solr index.  If the DIH
delta-import fails, it doesn't update the stored ID, which means that
on the next run, it will try and index those documents again.  Changes
to the entries in the database are automatically picked up because the
document ID is newer, but the tag id doesn't change, so the document in
Solr is overwritten.

Things are actually more complex than I've written, because our index
is distributed.  Hopefully it can give you some ideas for yours.

Shawn



RE: Solr and Permissions

2011-03-11 Thread Tim Gilbert
What about using the BitwiseQueryParserPlugin?  

https://issues.apache.org/jira/browse/SOLR-1913

You could encode your documents with a set of permissions based on
bit flags and then OR them at query time.

Tim 
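To illustrate the bit-flag idea outside Solr: a document stored with an integer permission field matches when it shares at least one bit with the querying user's mask. The flag names and values below are invented; SOLR-1913's parser applies this kind of test inside the index:

```java
public class BitwisePermissions {
    // permission flags, one bit each (illustrative)
    static final int READ_PUBLIC   = 1 << 0;
    static final int READ_INTERNAL = 1 << 1;
    static final int READ_SECRET   = 1 << 2;

    // a document is visible if it shares at least one permission bit with the user
    static boolean canSee(int docPerms, int userPerms) {
        return (docPerms & userPerms) != 0;
    }

    public static void main(String[] args) {
        int doc = READ_INTERNAL | READ_SECRET;  // stored in an int field at index time
        System.out.println(canSee(doc, READ_PUBLIC));
        System.out.println(canSee(doc, READ_PUBLIC | READ_INTERNAL));
    }
}
```

The appeal over huge id:(foo OR bar OR ...) filters is that one integer comparison per document replaces thousands of boolean clauses.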

-Original Message-
From: r...@intelligencebank.com [mailto:r...@intelligencebank.com] On
Behalf Of Liam O'Boyle
Sent: Thursday, March 10, 2011 7:53 PM
To: solr-user@lucene.apache.org
Subject: Solr and Permissions

Morning,

We use solr to index a range of content to which, within our
application, access is restricted by a system of user groups and
permissions.  In order to ensure that search results don't reveal
information about items which the user doesn't have access to, we need
to somehow filter the results; this needs to be done within Solr
itself, rather than after retrieval, so that the facet and result
counts are correct.

Currently we do this by creating a filter query which specifies all of
the items which may be allowed to match (e.g. id:(foo OR bar OR blarg
OR ...)), but this has definite scalability issues - we're starting to
run into issues, as this can be a set of ORs of potentially unlimited
size (and practically, we're hitting the low thousands sometimes).
While we can adjust maxBooleanClauses upwards, I understand that this
has performance implications...

So, has anyone had to implement something similar in the past?  Any
suggestions for a more scalable approach?  Any advice on safe and
sensible limits on how far I can push maxBooleanClauses?

Thanks for your advice,

Liam


uniqueKey merge documents on commit

2011-03-03 Thread Tim Gilbert
Hi,

 

I have a unique key within my index, but rather than the default
behaviour of overwriting, I am wondering if there is a method to merge
the two different documents on commit of the second document.  I have a
testcase which explains what I'd like to happen:

 

@Test
public void testMerge() throws SolrServerException, IOException
{
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("secid", "testid");
    doc1.addField("value1_i", 1);

    SolrAllSec.GetSolrServer().add(doc1);
    SolrAllSec.GetSolrServer().commit();

    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField("secid", "testid");
    doc2.addField("value2_i", 2);

    SolrAllSec.GetSolrServer().add(doc2);
    SolrAllSec.GetSolrServer().commit();

    SolrQuery solrQuery = new SolrQuery();
    solrQuery = solrQuery.setQuery("secid:testid");
    QueryResponse response =
        SolrAllSec.GetSolrServer().query(solrQuery, METHOD.GET);

    List<SolrDocument> result = response.getResults();
    Assert.isTrue(result.size() == 1);
    Assert.isTrue(result.get(0).containsKey("value1_i"));  // field from doc1
    Assert.isTrue(result.get(0).containsKey("value2_i"));  // field from doc2
}

 

Other than reading doc1 and adding the fields from doc2 and
recommitting, is there another way?

 

Thanks in advance,

 

Tim

 



RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-21 Thread Tim Gilbert

 Where do you get your Lucene/Solr downloads from?

 [X] ASF Mirrors (linked in our release announcements or via the Lucene
 website)

 [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

 [X] I/we build them from source via an SVN/Git checkout.

 [] Other (someone in your company mirrors them internally or via a
 downstream project)


-Original Message-
From: Juan Grande [mailto:juan.gra...@gmail.com] 
Sent: Friday, January 21, 2011 10:25 AM
To: solr-user@lucene.apache.org
Subject: Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF
Mirrors?


 Where do you get your Lucene/Solr downloads from?

 [] ASF Mirrors (linked in our release announcements or via the Lucene
 website)

 [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

 [X] I/we build them from source via an SVN/Git checkout.

 [] Other (someone in your company mirrors them internally or via a
 downstream project)



Juan Grande


RE: Mulitple facet - fq

2010-10-20 Thread Tim Gilbert
As Prasad said:

fq=(category:corporate category:personal)

But you might want to check your schema.xml to see what you have here:

<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND" />

You can always specify your operator in your search between your facets.


fq=(category:corporate AND category:personal)

or

fq=(category:corporate OR category:personal)

I have an application where I am using searches on 10 or more facets
with AND, OR, + and - options, and it works flawlessly.

fq=(+category:corporate AND -category:personal)

meaning category is corporate and not personal.

Tim

-Original Message-
From: Pradeep Singh [mailto:pksing...@gmail.com] 
Sent: Wednesday, October 20, 2010 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Mulitple facet - fq

fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ
yvzslmyilm...@gmail.com
 wrote:

 Under the category facet, there are multiple selections, which can be
 personal, corporate or other.

 How can I get both personal and corporate ones?  I tried
 fq=category:corporate&fq=category:personal

 It looks easy, but I can't find the solution.


 --

 Yavuz Selim YILMAZ



RE: Mulitple facet - fq

2010-10-20 Thread Tim Gilbert
Sorry, what Pradeep said, not Prasad.  My apologies Pradeep.




RE: Schema required?

2010-10-18 Thread Tim Gilbert
Hi Frank,

Check out the Dynamic Fields option from here
http://wiki.apache.org/solr/SchemaXml

Tim
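For reference, a couple of dynamic-field declarations in the style of the example schema.xml; the suffix-to-type convention is something you choose yourself:

```xml
<!-- any field name ending in _s is accepted at index time as a stored string -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
```

This lets documents add new fields without editing the schema, as long as the names follow the declared patterns.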

-Original Message-
From: Frank Calfo [mailto:fca...@aravo.com] 
Sent: Monday, October 18, 2010 5:25 PM
To: solr-user@lucene.apache.org
Subject: Schema required?

We need to index documents where the fields in the document can change
frequently.

It appears that we would need to update our Solr schema definition
before we can reindex using new fields.

Is there any way to make the Solr schema optional?



--frank



RE: Advice requested. How to map 1:M or M:M relationships with support for facets

2010-09-08 Thread Tim Gilbert
Thank you for your advice.

Tim
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Tuesday, September 07, 2010 11:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Advice requested. How to map 1:M or M:M relationships with
support for facets

These days the best practice for a 'drill-down' facet in a UI is to
encode both the unique value of the facet and the displayable string
into one facet value.  In the UI, you unpack and show the display
string, and search with the full facet string.

If you want to also do date ranges, make a separate matching 'date'
field.  This will store the date twice.  Solr schema design is all
about denormalizing.
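The pack/unpack step described above can be sketched like this; the '|' separator is an arbitrary choice for illustration:

```java
public class FacetValue {
    // pack the stable id and the display label into one facet token
    static String encode(String id, String display) {
        return id + "|" + display;
    }

    // unpack for the UI: show the label, but search with the full token
    static String displayPart(String facetValue) {
        return facetValue.substring(facetValue.indexOf('|') + 1);
    }

    public static void main(String[] args) {
        String token = encode("17", "Attended Conference");
        System.out.println(token);
        System.out.println(displayPart(token));
    }
}
```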




Advice requested. How to map 1:M or M:M relationships with support for facets

2010-09-07 Thread Tim Gilbert
Hi guys,

 

Question:

 

What is the best way to create a solr schema which supports a
'multivalue' where the value is a two item array of event category and a
date. I want to have faceted searches, counts and Date Range ability on
both the category and the dates.

 

Details:

 

This is a person database where a Person can have details (like an
address) and a Person can have many Events.  Events have a category
(type of event) and a Date for when that event occurred.  At the bottom
you will see a simple diagram showing the relationship.  Briefly, a
Person has many Events, and an Event has a single category and a single
person.

 

What I would like to be able to do is:

 

Have a facet which shows all of the event categories, with a 'sub-facet'
that shows Category + date.  For example, if a Category was "Attended
Conference" and the date was 2008-09-08, I'd be able to show a count of
all "Attended Conference" events, then have a tree-type control and
show the years (for example):

 

Eg.

 

+ Attended Conference (1038)

|

+ 2010 (100)

+--- 2009 (134)

+--- 2008 (234) 

|

+ Another Event Category (23432)

|

+-2010 (234)

+2009 (245)

 

Etc.

 

For scale, I expect to have < 100 Event Categories and > a million
person_event records on > 250,000 persons.  I don't care very much about
disk space, so if it's 1 GB or 100 GB due to indexing, that's okay if
the solution works (and it's fast! :-))

 

 

Solutions I looked at:

 

*   I looked at poly fields but they seem to be a fixed length and
appeared to be the same type.  The typical use case was latitude &
longitude.  I don't think this will work because there are a variable
number of events attached to a person.
*   I looked at multiValued but it didn't seem to permit two fields
having a relationship, i.e. Event Category & Event Date.  It seemed to
me that they need to be broken out.  That's not necessarily a bad
thing, but it didn't seem ideal.
*   I thought about concatenating category & date to create fake
fields strictly for faceting purposes, but I believe that will break
date ranges.  E.g. EventCategoryId + "|" + Date = "1|2009" as a facet
would allow me to show counts for that event type.  Seems a bit
unwieldy to me...

 

What's the groups advice for handling this situation in the best way?

 

Thanks in advance, as always sorry if this question has been asked and
answered a few times already.  I googled for a few hours before writing
this... but things change so fast with Solr that any article older than
a year was suspect to me, also there are so many patches that provide
additional functionality... 

 

Tim

 

 

 

 

Schema:

 



RE: date boosting and dismax

2010-07-14 Thread Tim Gilbert
I used this before my search term and it works well:

{!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}

It's enough that when I search for *:* the articles appear in
chronological order.

Tim
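For intuition (my arithmetic, not from the message): recip(x,m,a,b) computes a/(m*x+b), and 3.16e-11 is roughly 1 over the number of milliseconds in a year, so this boost is 1.0 for a document published now and about 0.5 for one published a year ago:

```java
public class RecipBoost {
    // Solr's recip(x, m, a, b) = a / (m*x + b)
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    static double round3(double v) { return Math.round(v * 1000) / 1000.0; }

    public static void main(String[] args) {
        double msPerYear = 365.25 * 24 * 60 * 60 * 1000; // ~3.156e10 ms
        System.out.println(round3(recip(0,               3.16e-11, 1, 1))); // published now
        System.out.println(round3(recip(msPerYear,       3.16e-11, 1, 1))); // one year old
        System.out.println(round3(recip(5 * msPerYear,   3.16e-11, 1, 1))); // five years old
    }
}
```

So the boost decays smoothly with age, halving roughly every year of document age relative to NOW.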

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, July 14, 2010 11:47 AM
To: solr-user@lucene.apache.org
Subject: date boosting and dismax


  I've started a couple of previous threads on this topic, but I did not

have a good date field in my index to use at the time.  I now have a 
schema with the document's post_date in tdate format, so I would like to

actually do some implementation.  Right now, we are not doing relevancy 
ranking at all - we sort by descending post_date.  We have been working 
on our application code so we can switch to dismax and use relevancy, 
but it's still important to have a small bias towards newer content.

The idea is nothing this list hasn't heard before - to give newer 
documents a slight relevancy boost.  An important sub-goal is to ensure 
that the adjustment doesn't render Solr's caches useless.  I'm thinking 
that this means that at a minimum, I need to round dates to a resolution

of 1 day, but if it's doable, 1 week might be even better.  I do like 
the idea of having different boosts for different time ranges.

Can anyone give me a starting point on how to do this?  I will need 
actual URL examples and dismax configuration snippets.

Thanks,
Shawn



RE: date boosting and dismax

2010-07-14 Thread Tim Gilbert
Re: flexibility.

This boost decays over time; the further a document gets from NOW, the
less of a boost it receives.  You are right though, it doesn't allow a
fine degree of control, particularly if you don't want the boost to
decay smoothly.  I hadn't considered your suggestion, so I'll keep it
in mind if the need arises.

Re:  Adding boost to query:

I am no expert, but I did this and it worked:

SolrJ:  solrQuery.setQuery("{!boost
b=recip(ms(NOW,publishdate),3.16e-11,1,1)} " + queryparam);

Where queryparam is what you are searching for.  You quite literally
just prepend it.


Via http://localhost:8080/apache-solr-1.4.0/select, just prepend it to
your q= like this: 
q={!boost+b%3Drecip(ms(NOW,publishdate),3.16e-11,1,1)}+findthis

Tim

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, July 14, 2010 1:16 PM
To: solr-user@lucene.apache.org
Subject: Re: date boosting and dismax

One of the replies I got on a previous thread mentioned range queries, 
with this example:

[NOW-6MONTHS TO NOW]^5.0 ,
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Something like this seems more flexible, and into it I read an
implication that the performance would be better than the boost
function you've shown, but I don't know how to actually put it into a
URL or handler config.

I also seem to remember seeing something about how to do "less than" in
range queries as well as the "less than or equal to" implied by the
above, but I cannot find it now.

Thanks,
Shawn


On 7/14/2010 10:26 AM, Tim Gilbert wrote:
 I used this before my search term and it works well:

 {!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}

 Its enough that when I search for *:* the articles appear in
 chronological order.

 Tim



RE: Foreign characters question

2010-07-13 Thread Tim Gilbert
I had the same problem, the correction differs by which application server you 
are using.  

If it's Tomcat, try here:  http://wiki.apache.org/solr/SolrTomcat near uri 
charset. 

I use glassfish, and I added this entry to the wiki after getting help from 
this group:  http://wiki.apache.org/solr/SolrGlassfish 

I hope this helps.

Tim

-Original Message-
From: Blargy [mailto:zman...@hotmail.com] 
Sent: Tuesday, July 13, 2010 12:55 PM
To: solr-user@lucene.apache.org
Subject: Foreign characters question


I am trying to add the following synonym while indexing/searching

swimsuit, bañadores, bañador

I tried searching for bañadores, however it didn't return any results.
After further inspection I noticed in the field analysis admin that swimsuit
gets expanded to ba�adores.  Not sure if it will show up, but the n is a
black diamond with a white question mark in it.

So basically, how can I add support for foreign characters?  Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p964078.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: TikaEntityProcessor on Solr 1.4?

2010-06-08 Thread Tim Gilbert
When I wanted to add some content to the solrj wiki for glassfish, I had a 
problem in that their anti-spam measures broke the ability to create a new 
account.  Someone here (Chris I think) was kind enough to create a ticket in 
the correct place:

https://issues.apache.org/jira/browse/INFRA-2726

You can see it was very quickly solved.  I am not suggesting that the problem 
is the same, only that this may be the correct place to create a new ticket 
with the problem of getting a file from the wiki and perhaps someone can help 
you there.

Tim

-Original Message-
From: Sixten Otto [mailto:six...@sfko.com] 
Sent: Tuesday, June 08, 2010 3:53 PM
To: solr-user@lucene.apache.org
Subject: Re: TikaEntityProcessor on Solr 1.4?

2010/5/22 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 just copy the dih-extras jar file from the nightly should be fine

Now that I've finally got a server on which to attempt to set these
things up... this turns out not to be a viable solution. The extras
jar does contain the TikaEntityProcessor class, but NOT the
BinFileDataSource and BinURLDataSource on which it depends. I tried
both replacing the 1.4 DIH jar with the one from the trunk, and adding
those two specific classes to the extras jar, neither of which worked.
(And I apologize, but I didn't copy down the exceptions involved; if I
can find some free time, I might go back and make the attempt again, a
bit more methodically.)

Sixten


RE: solrj Unicode queries don't return results

2010-06-07 Thread Tim Gilbert
I had the same problem a while back.  You didn't mention which
application server you are using (if any), but some application servers
have problems with UTF-8 queries and GET.

Tomcat has a well-documented solution
(http://wiki.apache.org/solr/SolrTomcat, near the bottom).  I recently
experienced problems with Glassfish and switched to POST to solve it
(http://wiki.apache.org/solr/SolrGlassfish).

Tim

-Original Message-
From: jlist9 [mailto:jli...@gmail.com] 
Sent: Monday, June 07, 2010 2:33 PM
To: solr-user@lucene.apache.org
Cc: dioxide.softw...@gmail.com
Subject: solrj Unicode queries don't return results

Hi, I'm having a problem with Unicode queries using solrj.
I have an index with unicode strings. From /solr/admin web interface,
I can find results using the Java unicode format, such as \u751f\u6d3b.
(If I just type in a UTF-8 string, I can't find any result though. Not
sure why.)

But in solrj, I tried having the string in UTF-8 in UTF-8 encoded Java
source
file, and I also tried using the Java unicode format in query.setQuery(
),
but none of these approaches return any results.

When I searched online, I found a similar question here w/o no answers.
http://www.mail-archive.com/solr-user@lucene.apache.org/msg21380.html

So what's the right way of doing unicode queries with solrj?

Thank you,
Jack


RE: Auto-suggest internal terms

2010-06-02 Thread Tim Gilbert
I was interested in the same thing and stumbled upon this article:

http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent
-and-jquery/

I haven't followed through, but it looked promising to me.

Tim

-Original Message-
From: Jay Hill [mailto:jayallenh...@gmail.com] 
Sent: Wednesday, June 02, 2010 4:02 PM
To: solr-user@lucene.apache.org
Subject: Auto-suggest internal terms

I've got a situation where I'm looking to build an auto-suggest where
any
term entered will lead to suggestions. For example, if I type wine I
want
to see suggestions like this:

french *wine* classes
*wine* book discounts
burgundy *wine*

etc.

I've tried some tricks with shingles, but the only solution that worked
was
pre-processing my queries into a core in all variations.

Anyone know any tricks to accomplish this in Solr without doing any
custom
work?

-Jay


RE: SolrJ Unicode problem

2010-05-28 Thread Tim Gilbert
I had a similar problem a few days ago and I found that the documents were not 
being loaded correctly as UTF-8 into Solr.  In my case, the loader program was 
a Java jar I was executing from a cron job.  There I added this:

java -Dfile.encoding=UTF-8 -jar /home/tim/solr/bin/loadSiteSearch.jar

Then, within that program, I wrote a function to take the strings I was loading 
and explicitly declare them as UTF-8, like this:

private String toUTF8(String value) throws UnsupportedEncodingException
{
    return new String(value.getBytes(), "UTF-8");
}

and that solved the problem for me.
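Note that value.getBytes() with no argument uses the platform default charset, which is why the -Dfile.encoding=UTF-8 flag matters in the recipe above. The mechanism behind the garbling can be shown in plain Java by decoding the same UTF-8 bytes with the right and the wrong charset (a stand-alone illustration, not from the original mail):

```java
public class CharsetDemo {
    public static void main(String[] args) throws Exception {
        String s = "numéro";
        byte[] utf8 = s.getBytes("UTF-8");  // é becomes two bytes: C3 A9

        // Decoding those bytes with the matching charset round-trips:
        System.out.println(new String(utf8, "UTF-8"));      // prints numéro

        // Decoding them as ISO-8859-1 yields the classic mojibake,
        // one character per byte:
        System.out.println(new String(utf8, "ISO-8859-1")); // prints numÃ©ro
    }
}
```

Being explicit on both the encode and decode side removes any dependence on the JVM's default charset.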

Tim

-Original Message-
From: Hugh Cayless [mailto:philomou...@gmail.com] 
Sent: Friday, May 28, 2010 12:51 PM
To: solr-user@lucene.apache.org
Subject: SolrJ Unicode problem

Hi, I'm a solr newbie, and I'm hoping someone can point me in the right 
direction.

I'm trying to index a bunch of documents with Greek text in them.  I can 
successfully index documents by generating add xml and using curl to send them 
to my server, but when I use solrj to create and send documents, the encoding 
gets thoroughly messed up.


Instead of the result (from an add doc posted with curl):

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">c.etiq.mom;;2077</str>
    <str name="transcription">Της Βησο ς Χρη εις Πανοπολίτης</str>
  </doc>
</result>

I get (from a SolrInputDocument loaded with solrj):

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">c.etiq.mom;;2077</str>
    <str name="transcription">???  ? ??? ??? �??</str>
  </doc>
</result>

I can confirm that the SolrInputDocument's transcription field contains Greek 
text before I call .add(documents) on the StreamingUpdateSolrServer (i.e., I 
can get Greek back out of it).  So I don't know what to do next.  Any ideas?

Thanks,
Hugh


Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Tim Gilbert
Hi guys/gals,

 

I am using apache-solr-1.4.0.war deployed to glassfishv3 on my development 
machine which is Ubuntu 9.10 64-bit.  I am using Solrj 1.4 using the 
CommonsHttpSolrServer connection to that Solr instance 
(http://localhost:8080/apache-solr-1.4.0) during my development.  To simplify 
things, however, I have found that I can duplicate my issue directly from the 
Solr example admin page, so for ease of confirmation I will use the Solr 
Example Admin page for this example:

 

I deployed the apache-solr-1.4.0/dist/apache-solr-1.4.0.war file to my 
glassfishv3 application server.  It deploys successfully.  I access 
http://localhost:8080/apache-solr-1.4.0/admin/form.jsp and enter into 
Solr/Lucene Statement textarea this word:

 

numéro  (Note the é)

 

When I check the server.log file, I see this:

 

INFO: [] webapp=/apache-solr-1.4.0 path=/select 
params={indent=on&version=2.2&q=numéro&fq=&start=0&rows=10&fl=*,score&qt=standard&wt=standard&explainOther=&hl.fl=}
 hits=0 status=0 QTime=16 

 

As well, the output from the Admin system shows the same incorrect decoding.


In my SolrJ-using application, I have a test case which queries for numéro 
and succeeds if I use Embedded and fails if I use CommonsHttpSolrServer... I 
don't want to use embedded for a number of reasons, including that it's not 
recommended (http://wiki.apache.org/solr/EmbeddedSolr)

 

I am sorry if you've dealt with this issue in the past.  I've spent a few 
hours googling for "solr utf-8 query" and "glassfishv3 utf-8 uri" plus other 
permutations/combinations, but there were seemingly endless amounts of chaff 
and I couldn't find anything useful after scouring it for a few hours.  I 
can't decide whether it's a glassfish issue or not, so I am not sure where to 
direct my energy.  Any tips or advice are appreciated! 

 

Thanks in advance,

 

Tim Gilbert



RE: Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Tim Gilbert
Chris,

You are the best.  Switching to POST solved the problem.  I hadn't noticed 
that option earlier, but after finding 
https://issues.apache.org/jira/browse/SOLR-612 I found the option in the code.

Thank you, you just made my day.

Secondly, in an effort to narrow down whether this was a glassfish issue or 
not, here is what I found.

Starting with glassfishv3 (I think) UTF-8 is the default for URI.  You can see 
this by going to the admin site, clicking on Network Config | Network Listeners 
| then select the listener.  Select the tab HTTP and about half way down, you 
will see URI Encoding: UTF-8.

HOWEVER, that doesn't appear to be correct because, following Abdelhamid 
Abid's advice, I deployed Solr to Tomcat, then followed the directions here:
http://wiki.apache.org/solr/SolrTomcat to force Tomcat to UTF-8 for URIs.  
Then I deployed Solr to Tomcat and, using CommonsHttpSolrServer, connected to 
that Tomcat-served instance.  It worked the first time.

So, it appears that there is a problem with glassfishv3 and UTF-8 URIs, at 
least for the apache-solr-1.4.0.war.  I wonder whether adding that sun-web.xml 
file into the war to force UTF-8 might make it work... not sure.  However, the 
workaround is to change the method to POST, as Chris suggested.  You can do 
that in Solrj here:

server.query(solrQuery, METHOD.POST);

and it works as you'd expect.
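The GET-vs-POST difference comes down to how the container decodes the percent-encoded query string. The mismatch can be reproduced in plain Java with URLEncoder/URLDecoder, with no Solr or glassfish involved (illustration only, not from the original mails):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UriEncodingDemo {
    public static void main(String[] args) throws Exception {
        // The client percent-encodes the query term as UTF-8 bytes...
        String encoded = URLEncoder.encode("numéro", "UTF-8");
        System.out.println(encoded); // prints num%C3%A9ro

        // ...but a container that assumes ISO-8859-1 for query args
        // decodes those bytes into mojibake, so the term never matches:
        System.out.println(URLDecoder.decode(encoded, "ISO-8859-1")); // prints numÃ©ro

        // Decoded with the charset the client actually used, it round-trips:
        System.out.println(URLDecoder.decode(encoded, "UTF-8"));      // prints numéro
    }
}
```

A POST body carries its own Content-Type charset, which sidesteps the URI-decoding question entirely.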

Thanks for the advice/tips,

Tim

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, May 20, 2010 2:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Non-English query via Solr Example Admin corrupts text


: I am using apache-solr-1.4.0.war deployed to glassfishv3 on my 
...
: INFO: [] webapp=/apache-solr-1.4.0 path=/select 
: params={indent=on&version=2.2&q=numéro&fq=&start=0&rows=10&fl=*,score&qt=standard&wt=standard&explainOther=&hl.fl=}
: hits=0 status=0 QTime=16
...
: In my SolrJ using application, I have a test case which queries for 
: numéro and succeeds if I use Embedded and fails if I use 
: CommonsHttpSolrServer... I don't want to use embedded for a number of 
...
: I am sorry if you'd dealt with this issue in the past, I've spent a few 
: hours googling for solr utf-8 query and glassfishv3 utf-8 uri plus other 
: permutations/combinations but there were seemingly endless amounts of 
: chaff that I couldn't find anything useful after scouring it for a few 
: hours.  I can't decide whether it's a glassfish issue or not so I am not 
: sure where to direct my energy.  Any tips or advice are appreciated!

I suspect if you switched to using POST instead of GET your problem would 
go away -- this stems from ambiguity in the way HTTP servers/browsers deal 
with encoding UTF8 in URLs.  A quick search for "glassfish url encoding" 
turns up this thread...

  http://forums.java.net/jive/thread.jspa?threadID=38020

which references...

http://wiki.glassfish.java.net/Wiki.jsp?page=FaqHttpRequestParameterEncoding

...it looks like you want to modify the default-charset attribute of the 
parameter-encoding


-Hoss


RE: Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Tim Gilbert
I wanted to improve the documentation in the solr wiki by adding in my
findings.  However, when I try to log in and create a new account, I
receive this error message:

You are not allowed to do newaccount on this page. Login and try again.

Does anyone know how I can get permission to add a page to the
documentation?

Tim


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, May 20, 2010 3:21 PM
To: solr-user@lucene.apache.org
Subject: RE: Non-English query via Solr Example Admin corrupts text


: Starting with glassfishv3 (I think) UTF-8 is the default for URI.  You
: can see this by going to the admin site, clicking on Network Config |
: Network Listeners | then select the listener.  Select the tab HTTP and
: about half way down, you will see URI Encoding: UTF-8.
: 
: HOWEVER, that doesn't appear to be correct because following Abdelhamid
...

I know nothing about glassfish, but according to that forum URL I 
mentioned before, the URI Encoding option in glassfish explicitly (and 
evidently contentiously) does not apply to the query args -- only the path, 
hence the two different config options mentioned in the FAQ...


:   http://forums.java.net/jive/thread.jspa?threadID=38020
...
:
http://wiki.glassfish.java.net/Wiki.jsp?page=FaqHttpRequestParameterEncoding



-Hoss