Re: Solr CMS Integration

2009-08-07 Thread Tim Archambault
I would second that and add that you may want to consider acquia.com as they
provide a solid infrastructure to support the Solr instance.

On Fri, Aug 7, 2009 at 11:20 AM, Andre Hagenbruch
andre.hagenbr...@rub.de wrote:


 wojtekpia wrote:

 Hi Wojtek,

  I've been asked to suggest a framework for managing a website's content
  and making all that content searchable. I'm comfortable using Solr for
  search, but I don't know where to start with the content management
  system. Is anyone using a CMS (open source or commercial) that you've
  integrated with Solr for search and are happy with? This will be a
  consumer-facing website with a combination of articles, blogs, white
  papers, etc.

 if you're comfortable with PHP you might want to look at Drupal
 (http://drupal.org/project/apachesolr) which sounds like a good match
 for your requirements...

 Regards,

 Andre





Re: SOLR developer

2007-08-31 Thread Tim Archambault
Thanks. I didn't mean to send that to the list-serv :}

On 8/31/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote:

 On 8/31/07, Tim Archambault [EMAIL PROTECTED] wrote:
  ...I'm thinking of sending a similar
  list-serv item out, but I noticed this is a solr-user list, not
 necessarily
  a developers list so I thought I'd ask

 Note that there's also [EMAIL PROTECTED] for such purposes, see
 http://www.apachenews.org/archives/000465.html

 But AFAIK, project-related job offers are ok on ASF lists, preferably
 with a [JOB] marker in the subject line.

 -Bertrand (*not* available for consulting ATM, and currently inactive
 on Solr anyway)



Re: SOLR developer

2007-08-30 Thread Tim Archambault
Mark,

Did you get any responses to your inquiry? I'm thinking of sending a similar
list-serv item out, but I noticed this is a solr-user list, not necessarily
a developers list so I thought I'd ask.

I'm looking for someone to integrate with Drupal.

Tim Archambault
Online Manager
Bangordailynews.com, Bangor, ME

On 8/26/07, Mark Jarecki [EMAIL PROTECTED] wrote:

 Hi all

 We are looking for a developer to set up a SOLR search server for a
 Django-based website.

 We are based in Melbourne, Australia.

 If you're interested or have any questions, send me an email:
 [EMAIL PROTECTED]

 Cheers

 Mark Jarecki



Re: PriceJunkie.com using solr!

2007-05-17 Thread Tim Archambault

I did a search and noticed pages were executed through aspx. Are you using
.net to parse the xml results from SOLR? Nice site, just trying to figure
out where SOLR fits into this.

On 5/16/07, Mike Austin [EMAIL PROTECTED] wrote:



I just wanted to say thanks to everyone for the creation of solr.  I've been
using it for a while now and I have recently brought one of my side projects
online.  I have several other projects that will be using solr for its
search and facets.

Please check out www.pricejunkie.com and let us know what you think. You
can give feedback and/or sign up on the mailing list for future updates.
The site is very basic right now and many new and useful features plus
merchants and product categories will be coming soon!  I thought it would be
a good idea to at least have a few people use it to get some feedback early
and often.

Some of the nice things behind the scenes that we did with solr:
- created custom request handlers that have category to facet to attribute
  caching built in
- category to facet management
- ability to manage facet groups (attributes within a set facet) and assign
  them to categories
- ability to create any category structure and share facet groups
- facet inheritance for any category (a facet group can be defined on a
  parent category and pushed down to all children)
- ability to create sub-categories as facets instead of normal
  sub-categories
- simple xml configuration for the final outputted category configuration
  file


I'm sure there are more cool things but that is all for now.  Join the
mailing list to see more improvements in the future.

Also.. how do I get added to the Using Solr wiki page?


Thanks,
Mike Austin




Re: SOLR hosting

2007-03-23 Thread Tim Archambault

Is your question inherently asking if someone out there provides a service
that manages the indexes, etc for you and pre-installs and configures the
software?

If NOT, I can tell you that I bought a Linux VPS at Hostmysite.com cheaply
and dedicated 1 virtual domain to my SOLR instance and it worked fairly
easily. I'm no tech expert and got it to run.

Hope that helps.

Tim

On 3/21/07, Michael Kimsal [EMAIL PROTECTED] wrote:


Are there any companies that offer hosted SOLR services?

If not, is there any interest in the community in a service like this?


--
Michael Kimsal
http://webdevradio.com



Re: Editing wiki-page Powerd by Solr

2007-03-23 Thread Tim Archambault

fabio,

Off topic, but thanks for the link to your newspaper classifieds. I manage a
newspaper website here in Maine, USA and am VERY INTERESTED in using solr
to power our jobs, etc.

Looking to integrate SOLR with DRUPAL right now.

I'd like to collaborate with you in the future if possible.

Thank you kindly.

Tim

On 3/23/07, Fabio Confalonieri [EMAIL PROTECTED] wrote:



I have a problem posting an update to the Powered By Solr wiki page.

I would like to add the line:
* [http://annunci.repubblica.it La Repubblica Newspaper Classifieds] (in
Italian) uses Solr for faceted browsing/filtering through classifieds of one
of the main Italian Newspapers

But I receive this error:
Sorry, can not save page because annunci.repubblica.it is not allowed in
this wiki.

I understand annunci.repubblica.it is somehow blacklisted, but I cannot
figure out why.

Sorry for posting here, I could not find a reference on wiki
posting/editing.

Thank You

Fabio Confalonieri








shared server

2006-10-26 Thread Tim Archambault

Signed up for hosting at performancehosting.net which has shared Tomcat
services. I'm using it to play with Solr.

I've installed the Solr war file from the downloadable Solr Nightly
Update zip through the Tomcat interface.

Obviously nothing works. I know nothing about Java so can anyone give me a
hint as to what variables need to be adjusted to work in this scenario?

http://strategic-points.com:8180/solr/admin/stats.jsp: Unable to compile
class for JSP
http://strategic-points.com:8180/solr/admin/index.jsp HTTP Status 500

Any help is greatly appreciated.

Tim


Re: shared server

2006-10-26 Thread Tim Archambault

I figured it had something to do with making changes and recompiling the
war. I've got it working fine locally with Jetty. It's great.

Thanks.


On 10/26/06, Yonik Seeley [EMAIL PROTECTED] wrote:


One potential problem is the old tomcat version (5.5.9)... I haven't
tried that version myself.
You might want to verify locally that it works.

More likely, it's the configuration of their shared tomcat services.
The solr.war alone won't work w/o config like a schema and solrconfig.xml.
If you can, put the solr home (containing conf, data, etc) in the
directory that tomcat starts in.

Barring that, you may have to make a custom war by exploding the Solr
war and editing the web.xml, pointing solr.home to the correct place.

-Yonik




Re: shared server

2006-10-26 Thread Tim Archambault

Yonik,

*put the solr home (containing conf, data, etc) in the directory that
tomcat starts in*

Would this be related to the instance of Tomcat I'm using?

I suppose this could be available under my root. If it's where Tomcat
starts on the server, I doubt I can place any files in that directory.

Thanks for the help.








Re: shared server

2006-10-26 Thread Tim Archambault

Interesting, these files that you speak of are not even in my ftp site. They
must not be part of the war file. This might actually make sense because
they are actually the config files that run the Solr model if I'm not
mistaken (excuse my simpleness here).

On 10/26/06, Yonik Seeley [EMAIL PROTECTED] wrote:


On 10/26/06, Tim Archambault [EMAIL PROTECTED] wrote:
 Yonik,

 *put the solr home (containing conf, data, etc) in the directory that
 tomcat starts in*

 Would this be related to the instance of Tomcat I'm using?

Probably, yes.

 I suppose this could be available under my root. If it's where Tomcat
 starts on the server, I doubt I can place any files in that directory.

The best thing then would be to edit the web.xml and bake in the
location of where you can put the solr home (containing ./conf ./data)

The system property you would want to set is solr.solr.home
http://wiki.apache.org/solr/ConfiguringSolr

Or if you have access to the Tomcat config directories, you could use
the method described under Multiple Solr Webapps here:
http://wiki.apache.org/solr/SolrTomcat

-Yonik



Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault

I have a couple of questions from some online newspaper folks who are
interested in Solr and are trying to understand how and why it came to be. I
think inherent in these questions is the underlying theme I hear all the
time and that is Solr is not a content management system. It's a search
engine.

What I really wonder about CNet is how they manage their content and how
Solr fits into their overall architecture -- is it an add-on? a
purpose-built hammer to handle a specific problem they were having? was it
something they wanted ... or instead something they needed to do, despite
preferring something else?

Another question asked of me was: "Will Solr ever connect with datasources
directly?"

Thanks in advance for any feedback I can supply the folks.

Tim


On 9/10/06, Chris Hostetter [EMAIL PROTECTED] wrote:



:   What is faceted browsing? Maybe an example of a site interface

Whoops! ... sorry about that, i tend to get ahead of myself.

The examples Erik pointed out are very representative, but there are more
subtle ways faceted searching can come into play -- for example, if you
look at these two search results...

  http://shopper-search.cnet.com/search?q=gta
  http://shopper-search.cnet.com/search?q=ipod

...the categories in the left nav change based on what you search on,
because we treat category as a facet, and the individual categories as
possible constraints ... we don't show the user the exact count of how
many products match in each category but we use that information to
determine the order of the categories (or whether we should include a
category in the list at all)
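
A minimal sketch of the idea above, assuming each matching product carries a single "category" value. The function name, the min_count threshold, and the sample data are made up for illustration and are not CNET's implementation:

```python
from collections import Counter


def category_facet(matching_docs, min_count=1):
    """Order categories by how many matching docs they contain.

    Categories with fewer than min_count matches are left out of the
    navigation entirely. The counts themselves are not returned,
    mirroring the "use the count but don't show it" approach.
    """
    counts = Counter(doc["category"] for doc in matching_docs)
    return [cat for cat, n in counts.most_common() if n >= min_count]


# Hypothetical result set for one user query.
hits = [
    {"id": 1, "category": "MP3 Players"},
    {"id": 2, "category": "Accessories"},
    {"id": 3, "category": "MP3 Players"},
    {"id": 4, "category": "Headphones"},
    {"id": 5, "category": "MP3 Players"},
    {"id": 6, "category": "Accessories"},
]

left_nav = category_facet(hits, min_count=2)
print(left_nav)
```

A different query produces a different set of hits, so the same function yields a different left nav without any per-query configuration.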

: website and this would be a great way to break out content. Kind of greys
: the lines between what is search and what is browsing categories, which is
: a great thing actually. Thanks for the help.

Even without facets, browsing a set of documents is just a search for
all documents (or depending on who you talk to: searching is just
browsing with a special user entered constraint on the text facet)




-Hoss




Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault

Obvious datasources: MSSQL, MySQL, etc. I'm under the impression that I have
to send an XML request to SOLR for every add, update, delete, etc. in my
database.

I believe there's a way to access MSSQL, MySQL etc. directly with Lucene,
but not sure how to do this with SOLR.

Thanks for all your feedback. While I started out way over my head, Solr is
actually fun to play around with, even for non-programmers or marginal
programmers like myself.

On 9/22/06, Yonik Seeley [EMAIL PROTECTED] wrote:


On 9/22/06, Tim Archambault [EMAIL PROTECTED] wrote:
 I have a couple of questions from some online newspaper folks who are
 interested in Solr and are trying to understand how and why it came to be.
 I think inherent in these questions is the underlying theme I hear all the
 time and that is Solr is not a content management system. It's a search
 engine.

 What I really wonder about CNet is how they manage their content and how
 Solr fits into their overall architecture -- is it an add-on? a
 purpose-built hammer to handle a specific problem they were having? was it
 something they wanted ... or instead something they needed to do, despite
 preferring something else?

Putting on my CNET hat for a little history:

We had a search server... a very thin layer built around a proprietary
search engine, used in a ton of places, for search-box type
functionality and direct generation of dynamic content.

That search engine was being discontinued by the vendor, so a
replacement was needed.  RFPs were put out, and all the commercial
alternatives were examined, but licensing costs for the number of
servers we were talking about were exorbitant.

So we decided to build our own...

The replacement: ATOMICS- a MySQL/Apache hybrid.
http://conferences.oreillynet.com/cs/mysqluc2005/view/e_sess/7066
It works well for many of the search collections we have that don't
need much in the way of full-text search (MySQL does have full-text
capabilities, but nothing like Lucene).

Backup plan: something based on Lucene.
SOLAR really started out as a pure backup plan... just in case ATOMICS
had problems in some areas.  I had joined CNET a week earlier, and the
task of building something lucene-based was luckily handed to me as
I didn't have any other responsibilities yet.  Pretty much no
requirements except for the preference of something that spoke
HTTP/XML that could be put behind a load-balancer and scaled.

ATOMICS was pretty much done by the time I started on SOLAR, and was
rapidly deployed across CNET.  SOLAR had a tough time gaining traction
until someone crossed a problem that ATOMICS couldn't easily handle:
faceted browsing.  There was finally something concrete to aim for,
and filter caching, docsets, autowarming, custom query handlers, etc,
were rapidly added to allow the ability to write custom plugins that
could actually do the faceting logic.

The result:
http://www.mail-archive.com/java-user@lucene.apache.org/msg02645.html

It sounds like Hoss might go into some more details in his ApacheCon
session:
http://www.us.apachecon.com/html/sessions.html#FR26

 Another question asked of me was Will Solr ever connect with
 datasources directly?

As far as where Solr fits into our architecture, it's a back-end
component in the generation of dynamic content... sort of the same
place that a database would occupy.

I don't know much about content generation in CNET, and specific
content management systems, but a lot of it ends up in databases.
An indexer piece normally pulls stuff from one or more databases,
and puts them into a solr master, which is replicated out to solr
searchers (or slaves) that the app-servers generating dynamic content
hit through a load-balancer.

There is a diagram of that from my ApacheCon presentation:
http://people.apache.org/~yonik/ApacheConEU2006/

As far as connecting to datasources directly... I think that being
able to pull content from a database is a good idea, and it's on the
todo list.  What specific other data sources did you have in mind?

-Yonik



Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault

Okay, I'll use an example.

A recruitment (jobs) customer goes onto our website and posts an online job
posting to our newspaper website. Upon insert into the database, I need to
generate an xml file to be sent to SOLR to ADD as  a record to the search
engine. Same  goes for an edit, my database updates the record and then I
have to send an ADD statement to Solr again to commit my change. 2x the
work.
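
For illustration, a small sketch of the "generate an XML file on every insert or edit" step, assuming the job record is available as a dictionary. The field names and the build_add_message helper are hypothetical, and actually POSTing the message to Solr's update URL (and sending a commit afterwards) is left out:

```python
import xml.etree.ElementTree as ET


def build_add_message(record):
    """Build a Solr <add><doc>...</doc></add> message from one record.

    ElementTree escapes characters such as & and < in the field text,
    so the result is well-formed XML.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in record.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical job posting row, as it might come back from the database.
job = {"id": 42, "title": "Nurse & Caregiver", "city": "Bangor"}

message = build_add_message(job)
print(message)
```

The same function serves both the insert and the edit case, since an edit is sent to Solr as another ADD for the same record.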

I've been talking with other papers about Solr and I think what bothers many
is that there is a deposit of information in a structured database here
[named A], then we have another set of basically the same data over here
[named B] and they don't understand why they have to manage two different
sets of data [A & B] that are virtually the same thing.  Many foresee a
maintenance nightmare. I've come to the conclusion that there's somewhat of
a disconnect between what a database does and what a search engine does. I
accept that the redundancy is necessary given the very different tasks that
each performs [keep in mind I'm still naive to the programming details here,
I understand conceptually].

In writing this to you another thought came to mind. Maybe there are
alternative ways to inject records into Solr outside the bounds of the
cygwin and CURL examples I've been using. Maybe that is the question we need
to be asking. What are some alternative ways to populate Solr?

Enough said, it's Friday afternoon.

Have a great weekend.

Tim

On 9/22/06, Erik Hatcher [EMAIL PROTECTED] wrote:



On Sep 22, 2006, at 2:45 PM, Tim Archambault wrote:
 I believe there's a way to access MSSQL, MySQL etc. directly with
 Lucene, but not sure how to do this with SOLR.

Nope.  Lucene is a pure search engine, with no hooks to databases, or
document parsers, etc.  Lots of folks have built these kinds of
things on top of Lucene, but the Lucene core is purely the text engine.

How would you envision communicating with Solr with a database in the
picture?   How would the entire database be initially indexed?  How
would changes to the database trigger Solr updates?   I'm not quite
clear on what it would mean for Solr to work with a database directly
so I'm curious.

Erik




Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault

I'm really confused. I don't mean "store" the data figuratively as in a
lucene/solr command. Storing an ID number in a solr index isn't going to
help a user find "nurse". I think part of this is that some people feel that
databases like MSSQL, MYSQL should be able to provide quality search
experience, but they just flat out don't. It's a separate utility.

Thanks Walter.

On 9/22/06, Walter Underwood [EMAIL PROTECTED] wrote:


On 9/22/06 12:25 PM, Tim Archambault [EMAIL PROTECTED] wrote:

 A recruitment (jobs) customer goes onto our website and posts an online
 job posting to our newspaper website. Upon insert into the database, I
 need to generate an xml file to be sent to SOLR to ADD as a record to the
 search engine. Same goes for an edit, my database updates the record and
 then I have to send an ADD statement to Solr again to commit my change.
 2x the work.

 I've been talking with other papers about Solr and I think what bothers
 many is that there is a deposit of information in a structured database
 here [named A], then we have another set of basically the same data over
 here [named B] and they don't understand why they have to manage two
 different sets of data [A & B] that are virtually the same thing.

The work isn't duplicated. Two servers are building two kinds of index,
a transactional record index and a text index. That is two kinds of
work, not a duplication.

Storing the data is the small part of a database or a search engine.
The indexes are the real benefit.

In fact, the data does not have to be stored in Solr. You can return a
database key as the only field, then get the details from the database.
That is  how our current search works -- the search result is a list
of keys in relevance order. Period.
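
A rough sketch of that pattern, assuming the search engine has already returned a list of database keys in relevance order. The jobs table, its columns, and the sample rows are invented for the example, with sqlite3 standing in for whatever database is in use:

```python
import sqlite3


def fetch_in_relevance_order(conn, keys):
    """Fetch rows for the given keys, preserving the order of `keys`.

    A SQL IN (...) query returns rows in no guaranteed order, so the
    rows are re-sorted to match the search engine's ranking.
    """
    if not keys:
        return []
    placeholders = ",".join("?" for _ in keys)
    rows = conn.execute(
        f"SELECT id, title, city FROM jobs WHERE id IN ({placeholders})",
        keys,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Keys missing from the database (e.g. deleted since indexing) are skipped.
    return [by_id[k] for k in keys if k in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(1, "Nurse", "Bangor"), (2, "Editor", "Portland"), (3, "Nurse Manager", "Orono")],
)

# Pretend the search engine ranked job 3 above job 1 for the query "nurse".
search_keys = [3, 1]
results = fetch_in_relevance_order(conn, search_keys)
print(results)
```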

wunder
--
Walter Underwood
Search Guru, Netflix




Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault

Okay. We are all on the same page. I just don't express myself as well in
programming speak yet.

I'm going to read up on Otis' Lucene in Action tonight. I'd swear he had
an example of how to inject records into a lucene index using java and sql.
Maybe I'm wrong though.



On 9/22/06, Walter Underwood [EMAIL PROTECTED] wrote:


Sorry, I was not being exact with "store". Lucene has separate
control over whether the value of a field is stored and whether
it is indexed. The term "nurse" might be searchable, but the
only value that is stored in the index for retrieval is the
database key for each matching job.
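
To make the stored/indexed distinction concrete, here is a toy inverted index. It is not Lucene's API; the TinyIndex class, the schema tuples, and the sample documents are assumptions made up for this sketch:

```python
from collections import defaultdict


class TinyIndex:
    """Toy index where each field is independently indexed and/or stored."""

    def __init__(self, schema):
        # schema maps field name -> (indexed, stored)
        self.schema = schema
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.stored = {}                  # doc id -> stored field values

    def add(self, doc_id, fields):
        kept = {}
        for name, value in fields.items():
            indexed, stored = self.schema[name]
            if indexed:
                for term in str(value).lower().split():
                    self.postings[term].add(doc_id)
            if stored:
                kept[name] = value
        self.stored[doc_id] = kept

    def search(self, term):
        doc_ids = sorted(self.postings.get(term.lower(), ()))
        return [self.stored[d] for d in doc_ids]


# The description is searchable but not retrievable;
# the database key is retrievable but not searchable.
schema = {"job_id": (False, True), "description": (True, False)}
index = TinyIndex(schema)
index.add(0, {"job_id": 1017, "description": "Registered nurse night shift"})
index.add(1, {"job_id": 1018, "description": "Sports editor"})

hits = index.search("nurse")
print(hits)
```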

It seems like text search should be easy to add to a transactional
database, but lots of smart people have tried to make that work
and failed. Maybe it is possible, but neither Oracle nor Microsoft
nor the open source community have been able to make it happen.
The text search in RDBMSs seems to always be slow and lame.

There is one product that does transactional query and text
search: MarkLogic. It does a good job of both, but it is very
XML-centric. It might be a good match, if you are into commercial
software. It is a rather different style of programming than
SQL or Lucene. You write XQuery to define the result XML with
the contents fetched from the database.

wunder (not affiliated with MarkLogic)







Re: Simple Faceted Searching out of the box

2006-09-22 Thread Tim Archambault

Amen Hoss. I appreciated you explaining in terms of what I can understand,
jobs. Makes it easier for me to learn.

What you are saying is right-on with what I'm trying to understand. Right
now I have simple Lucene Indexes that are basically re-created once daily and
that simply isn't doing the job for about 30% of my content.

I'm learning a framework called Model-Glue Unity that uses Reactor which is
an ORM. I'll have to think of how I might be able to make that work.  But as
you say, not all relationships are equal.

For indexing news articles for instance, I want the article, all reader
comments, photos, links, multimedia files associated with the article to be
indexed together as one entity so that if Chris Hostetter commented on the
"high cost of heating oil in Maine" article, I can find the article by
searching on your name, etc.

Have a great weekend and thanks for all the help.

Tim



On 9/22/06, Chris Hostetter [EMAIL PROTECTED] wrote:



: I've been talking with other papers about Solr and I think what bothers
: many is that there is a deposit of information in a structured database
: here [named A], then we have another set of basically the same data over
: here [named B] and they don't understand why they have to manage two
: different sets of data [A & B] that are virtually the same thing.  Many
: foresee a

The big issue is that while SQL Schemas may be fairly consistent, uses
of those schemas can be very different ... there is no clear cut way to
look at an arbitrary schema and know how far down a chain of foreign key
relationships you should go and still consider the data you find relevant
to the item you started with (from a search perspective) ... ORM tools
tend to get around this by Lazy-Loading ... if your front end application
starts with a single jobPostId and then asks for the name of the city it's
mapped to, or the name of the company it's mapped to it will dynamically
fetch the Company object from the company table, or maybe it will only
fetch the single companyName field ... but when building a search index
you can't get that lazy evaluation -- you have to proactively fetch that
data in advance, which means you have to know in advance how far down the
rabbit hole you want to go.

not all relationships are equal either: you might have a Skills table
and a many-to-many relationship between JobPosting and skills, with a
mappingType on the mapping indicating which skills are required and
which are just desirable -- those should probably go in separate fields of
your index, but some code somewhere needs to know that.

once you've solved that problem, once you've got a function that you can
point at your DB, give it a primary key and get back a flattened view of
the data that can represent your Solr/Lucene Document you're 80% done
... the problem is that 80% isn't a generically solvable problem ... there
aren't simple rules you can apply to any DB schema to drive that function.
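
As a sketch of such a "primary key in, flattened document out" function, assuming the tables have already been loaded into plain dictionaries and lists. All table, column, and field names here are hypothetical; the point is only that the knowledge of which relationships to follow lives in hand-written code:

```python
def flatten_job_posting(job_id, jobs, companies, job_skills):
    """Return a flat dict for one job posting, suitable for indexing.

    Follows exactly one foreign key (company) and one many-to-many
    mapping (skills), splitting skills by mapping type into two fields.
    """
    job = jobs[job_id]
    company = companies[job["company_id"]]
    doc = {
        "id": job_id,
        "title": job["title"],
        "company_name": company["name"],
        "required_skills": [],
        "desired_skills": [],
    }
    for link in job_skills:
        if link["job_id"] != job_id:
            continue
        if link["mapping_type"] == "required":
            doc["required_skills"].append(link["skill"])
        else:
            doc["desired_skills"].append(link["skill"])
    return doc


# Made-up sample data standing in for three database tables.
jobs = {7: {"title": "Staff Nurse", "company_id": 1}}
companies = {1: {"name": "Example Medical Center"}}
job_skills = [
    {"job_id": 7, "skill": "RN license", "mapping_type": "required"},
    {"job_id": 7, "skill": "Spanish", "mapping_type": "desired"},
    {"job_id": 8, "skill": "Editing", "mapping_type": "required"},
]

doc = flatten_job_posting(7, jobs, companies, job_skills)
print(doc)
```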

Even the last 20% isn't really generic; knowing when to re-index a
particular document ... the needs of a system where individual people
update JobPostings one at a time are very different from a system where
JobPostings are bulk imported thousands at a time ... it's hard to write a
useful indexer that can function efficiently in both cases.  Even in
the first case, dealing with individual document updates where the primary
JobPosting data changes is only the common problem, there are still the
less-common situations where a Company name changes and *all* of the
associated Job Postings need to be reindexed ... for small indexes it might
be worthwhile to just rebuild the index from scratch, for bigger indexes you
might need a more complex solution for dealing with this situation.

The advice i give people at CNET when they need to build a Solr index is:

1) start by deciding what the minimum freshness is for your data ... ie:
what is the absolute longest you can live with needing to wait for data to
be added/deleted/updated in your Solr index once it's been
added/deleted/modified in your DB.

2) write a function that can generate a Solr Document from an instance of
your data (be it a bean, a DB row, whatever you've got)

3) write a simple wrapper program that iterates over all of your data, and
calls the function from #2


If #3 takes less time to run than #1 - cron it to rebuild the index from
scratch over and over again and use snapshooter and snappuller to expose
it to the world ... if #3 takes longer than #1, then look at ways to more
systematically decide which docs should be updated, and how.
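
The three steps and the final rule of thumb can be sketched roughly as follows. The function names and the strategy labels are invented for the example, and the timing numbers would come from measuring a real rebuild:

```python
def rebuild(records, to_document):
    """Step 3: iterate over all the data and call the step 2 function."""
    return [to_document(record) for record in records]


def choose_strategy(rebuild_seconds, max_staleness_seconds):
    """Compare the full rebuild time (step 3) with the freshness limit (step 1)."""
    if rebuild_seconds < max_staleness_seconds:
        return "full rebuild on a schedule"
    return "incremental updates"


def to_document(row):
    """Step 2 (simplified): map one database row to one flat document."""
    return {"id": row[0], "title": row[1]}


rows = [(1, "Nurse"), (2, "Editor")]
documents = rebuild(rows, to_document)

# Hypothetical numbers: a 10 minute rebuild against a 1 hour freshness
# limit, then a 3 hour rebuild against the same limit.
small_site = choose_strategy(rebuild_seconds=600, max_staleness_seconds=3600)
large_site = choose_strategy(rebuild_seconds=10800, max_staleness_seconds=3600)
print(documents, small_site, large_site)
```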



-Hoss




Re: Re: Default XML Output Schema

2006-09-21 Thread Tim Archambault

This structure was inhibiting to me at first too using ColdFusion.
However, I was able to create a function that dynamically creates a
query recordset for both facets and search results and will accommodate
new/additional fields at any time. If I can do it, any reasonable
programmer can handle it.

On 9/21/06, sangraal aiken [EMAIL PROTECTED] wrote:

Thanks for the great explanation Yonik, I passed it on to my colleagues for
reference... I knew there was a good reason.

-Sangraal

On 9/21/06, Yonik Seeley [EMAIL PROTECTED] wrote:

 On 9/21/06, sangraal aiken [EMAIL PROTECTED] wrote:
  Perhaps a silly question, but I'm wondering if anyone can tell me why
  solr outputs XML like this:

 During the initial development of Solr (2004), I remember throwing up
 both options, and most developers preferred to have a limited number
 of well defined tags.

 It allows you to have rather arbitrary field names, which you couldn't
 have if you used the field name as the tag.

 It also allows consistency with custom data.  For example, here is the
 representation of an array of integers:
 <arr><int>1</int><int>2</int></arr>
 If field names were used as tags, we would have to either make up a
 dummy-name, or we wouldn't be able to use the same style.


  <doc>
  <int name="id">201038</int>
  <int name="siteId">31</int>
  <date name="modified">2006-09-15T21:36:39.000Z</date>
  </doc>

  rather than like this:

  <doc>
  <id type="int">201038</id>
  <siteId type="int">31</siteId>
  <modified type="date">2006-09-15T21:36:39.000Z</modified>
  </doc>
 
  A front-end PHP developer I know is having trouble parsing the default
  Solr output because of that format and mentioned it would be much easier
  in the latter format... so I was curious if there was a reason it is the
  way it is.

 There are a number of options for you.
 You could write your own QueryResponseWriter to output XML just as you
 like it, or use an XSLT stylesheet in conjunction with
 http://issues.apache.org/jira/browse/SOLR-49
 or use another format such as JSON.

 -Yonik
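
For readers parsing the default format on the client side, a minimal sketch of turning one doc element into a dictionary. The converter table covers only a few type tags, dates are left as strings, and the "tags" array field is an assumption added to show the arr case:

```python
import xml.etree.ElementTree as ET

# Map a handful of type tags to Python converters; unknown tags fall back to str.
CONVERTERS = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "bool": lambda text: text == "true",
    "str": str,
    "date": str,
}


def parse_value(element):
    """Convert one typed element; <arr> becomes a list of its children."""
    if element.tag == "arr":
        return [parse_value(child) for child in element]
    return CONVERTERS.get(element.tag, str)(element.text or "")


def parse_doc(doc_element):
    """Build {field name: value} from the name attribute of each child."""
    return {child.get("name"): parse_value(child) for child in doc_element}


xml_text = """<doc>
  <int name="id">201038</int>
  <int name="siteId">31</int>
  <date name="modified">2006-09-15T21:36:39.000Z</date>
  <arr name="tags"><int>1</int><int>2</int></arr>
</doc>"""

record = parse_doc(ET.fromstring(xml_text))
print(record)
```

Because the tag set is small and fixed, the parser needs no knowledge of the field names, which is the advantage described above.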





Re: Fixed first hits - custom RequestHandler?

2006-09-21 Thread Tim Archambault

Otis,

I'm curious as to what you find out here. I'm looking at setting up a
second Solr instance to handle keyword advertising and the first
instance to handle the site search for our newspaper website. Never
thought of your question.

Thanks,

Tim

On 9/21/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:

Hello,

I have a situation where I want certain documents to appear at the top of the 
hit list for certain searches, regardless of their score.  One can think of it 
as the ads right on top of Google's search results (but I'm not dealing with 
ads).

Example:
If I'm searching books in a bookstore, and a person is searching for "lucene",
the owner of the bookstore may want to promote the recently published
"Lucene in Action" instead of some other book about Lucene, so he wants any
search for "lucene" or "java search" to put the link to "Lucene in Action" on
top.

Is there a good way to accomplish this in Solr?
My initial thoughts are that it would be best to have an external store, maybe 
even a Lucene index.  This store would host the data to display on top of hits, 
as well as keywords/phrases that would have to match user's search terms.  A 
custom RequestHandler would then perform a regular search (a la any of the 
existing RequestHandlers), plus pull the data from this side store, and stick 
those in the response.
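
The merge step of that idea can be sketched independently of Solr. This is
an illustrative Python outline only, not a RequestHandler: the PROMOTIONS
mapping stands in for the external side store, and the document ids are
hypothetical.

```python
# Hypothetical side store: search phrase -> document ids to pin on top.
PROMOTIONS = {
    "lucene": ["lucene-in-action"],
    "java search": ["lucene-in-action"],
}

def with_promotions(query, regular_hits, promotions=PROMOTIONS):
    """Return promoted ids first, then the regular hits without repeats."""
    pinned = promotions.get(query.strip().lower(), [])
    rest = [hit for hit in regular_hits if hit not in pinned]
    return pinned + rest

# The regular hits would come from an ordinary scored search.
hits = with_promotions("Lucene", ["some-other-lucene-book", "lucene-in-action"])
print(hits)
```

Matching here is an exact, case-insensitive lookup on the whole query; a real
side store would probably need looser matching on keywords and phrases.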

Is this a good candidate for a custom RequestHandler?

Thanks,
Otis






Re: How to best index user-generated content

2006-09-20 Thread Tim Archambault

Whatever programming language you are using probably has a function that
makes text XML-safe. For example, I'm using Coldfusion to integrate with
Solr and all data is escaped as follows:

#xmlformat(usergeneratedcontent)#

My guess is that PHP, ASP, etc. all have a function like this.
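
As one illustration of such a function in another language, Python's standard
library has xml.sax.saxutils.escape. The sketch below is an assumption about
how a client might use it; the field_xml helper is hypothetical and not part
of Solr.

```python
from xml.sax.saxutils import escape

def field_xml(name, text):
    """Build one <field> element with the name and text XML-escaped."""
    # escape() handles &, < and >; the extra mapping also covers double
    # quotes, which matter inside the attribute value.
    safe_name = escape(name, {'"': "&quot;"})
    return '<field name="%s">%s</field>' % (safe_name, escape(text))

print(field_xml("text", "<p>Fish & chips</p>"))
```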


On 9/20/06, Nick Snels [EMAIL PROTECTED] wrote:


Hi,

I want users to add content to my site using tinyMCE, which generates
HTML.
When I tried adding the data to Solr, Solr refused to add it (or at least
generated an error):

SEVERE: org.xmlpull.v1.XmlPullParserException: parser must be on START_TAG
or TEXT to read text (position: START_TAG seen ...<field name="text"><p>... @4:39)
   at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1071)
   at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:910)
   at org.apache.solr.core.SolrCore.update(SolrCore.java:685)
   at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
   at org.apache.catalina.valves.RequestFilterValve.process(RequestFilterValve.java:275)
   at org.apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:80)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
   at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
   at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
   at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
   at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
   at java.lang.Thread.run(Thread.java:595)

So I searched the archives to resolve this issue, since I didn't want to
strip out the HTML entirely. The solution proved to be to add
<![CDATA[ ... ]]> around the HTML text, like so:

<add><doc>
  <field name="text"><![CDATA[#{field.text}]]></field>
</doc></add>

This also drew my attention to another problem: characters like <, > and &
are all 'invalid' characters between XML tags. Does that mean I have to put
<![CDATA[ ... ]]> around all the fields I want to index? I don't know and
can't control what my users will input. Is this the only solution, or is
there a way for Solr to handle these 'invalid' characters in the indexed
text by itself, without generating errors?
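
One caveat with the CDATA approach is that user input may itself contain the
terminator "]]>". The following Python sketch shows a common workaround,
splitting the terminator across two CDATA sections. The cdata helper is
hypothetical, and the round trip through ElementTree only demonstrates that
the result is well-formed; it says nothing about what Solr does with the text.

```python
import xml.etree.ElementTree as ET

def cdata(text):
    """Wrap text in CDATA, splitting any embedded "]]>" terminator."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# User-supplied HTML that contains markup, an ampersand and a terminator.
html = "<p>a ]]> b & c</p>"
xml_doc = '<add><doc><field name="text">%s</field></doc></add>' % cdata(html)

# Parsing the document back yields the original text unchanged.
parsed = ET.fromstring(xml_doc)
print(parsed.find("doc/field").text == html)
```

Escaping the text instead of wrapping it avoids this special case entirely,
which may be the simpler choice when every field holds untrusted input.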

Kind regards,

Nick




Re: duplicating all records added to index

2006-09-14 Thread Tim Archambault

absolutely.

On 9/14/06, Chris Hostetter [EMAIL PROTECTED] wrote:



:   My index seems to be duplicating all records on insert even though I have
:  my add statements set to not allow duplicates.
: 
:  I've provided a sample xml file of add docs. Anyone experienced this?

Is your id field listed as the uniqueKey in your schema.xml?


-Hoss



duplicating all records added to index

2006-09-13 Thread Tim Archambault

My index seems to be duplicating all records on insert even though I have my
add statements set to not allow duplicates.

I've provided a sample XML file of add docs. Has anyone experienced this?


<add allowDups="false" overwriteCommitted="true" overwritePending="true">

<doc>
  <field name="id">obituaries_</field>
  <field name="link">http://www.bangordailynews.com/a/class/obituaries/obituary.cfm?id=16140</field>
  <field name="posted">20001204</field>
  <field name="adnumber"></field>
  <field name="vertical">Obituaries</field>
  <field name="author">BDN Classifieds</field>
  <field name="headline">BURGESS, Fidalis &apos;Dale&apos; J., 82</field>
  <field name="summary">HERMON - Fidalis &quot;Dale&quot; J. Burgess, 82,
husband of the late Lottie (Glidden) Burgess, passed away unexpectedly Dec.
1, 2000, at his residence. He was born Jan. 21, 1918, in Bangor, the son of
Elias and Margaret (Cheverie) Burgess.  Dale lived in Hermon   </field>
  <field name="article"></field>
</doc>


<doc>
  <field name="id">obituaries_</field>
  <field name="link">http://www.bangordailynews.com/a/class/obituaries/obituary.cfm?id=16141</field>
  <field name="posted">20001204</field>
  <field name="adnumber"></field>
  <field name="vertical">Obituaries</field>
  <field name="author">BDN Classifieds</field>
  <field name="headline">CLARKE, Paul H., 79</field>
  <field name="summary">BANGOR- Paul H. Clarke, 79, died Dec. 1, 2000, at
his residence. He was born Jan. 14, 1921, in Saco the son of Charles and
Jennie (Larson) Clarke.  Paul was a life long member of Elks Lodge 244,
Bangor.  He worked most of his life as a meat cutter for</field>
  <field name="article"></field>
</doc>


<doc>
  <field name="id">obituaries_</field>
  <field name="link">http://www.bangordailynews.com/a/class/obituaries/obituary.cfm?id=16142</field>
  <field name="posted">20001204</field>
  <field name="adnumber"></field>
  <field name="vertical">Obituaries</field>
  <field name="author">BDN Classifieds</field>
  <field name="headline">COMEAU, Janice Mary, 63</field>
  <field name="summary">WALTHAM - Janice Mary Comeau, 63, died Dec. 1,
2000, at her home in Waltham. She was born Oct. 22, 1937, in Hartford,
Conn., the daughter of Joseph Edmund and Helen (LeBel) Comeau.  Janice
served her country in the U.S. Army as a nurse. She graduated </field>
  <field name="article"></field>
</doc>


<doc>
  <field name="id">obituaries_</field>
  <field name="link">http://www.bangordailynews.com/a/class/obituaries/obituary.cfm?id=16143</field>
  <field name="posted">20001204</field>
  <field name="adnumber"></field>
  <field name="vertical">Obituaries</field>
  <field name="author">BDN Classifieds</field>
  <field name="headline">CONNERS, Lois Marie</field>
  <field name="summary">HERMON AND BANGOR - Funeral services for Lois Marie
Conners will be held 9:30 a.m. Monday at Brookings-Smith, 133 Center St.,
Bangor with the Rev. Robert T. Carlson, pastor of the East Orrington
Congregational Church, officiating.  Interment will be in   </field>
  <field name="article"></field>
</doc>


<doc>
  <field name="id">obituaries_</field>
  <field name="link">http://www.bangordailynews.com/a/class/obituaries/obituary.cfm?id=16144</field>
  <field name="posted">20001204</field>
  <field name="adnumber"></field>
  <field name="vertical">Obituaries</field>
  <field name="author">BDN Classifieds</field>
  <field name="headline">CORBETT, Linda L., 53</field>
  <field name="summary">CARIBOU - Linda L. Corbett, 53, wife of Nathan
Corbett, died Dec. 1, 2000, at Bangor. She was born at Caribou, March 7,
1947, the daughter of Jerry and Luella (Clark) Hewitt.  She was a graduate
of the Caribou High School and was a loving and devoted </field>
  <field name="article"></field>
</doc>


<doc>
  <field name="id">obituaries_</field>
  <field name="link">http://www.bangordailynews.com/a/class/obituaries/obituary.cfm?id=16145</field>
  <field name="posted">20001204</field>
  <field name="adnumber"></field>
  <field name="vertical">Obituaries</field>
  <field name="author">BDN Classifieds</field>
  <field name="headline">EDWARDS, James E.</field>
  <field name="summary">BANGOR - Mr. James E. Edwards died Dec. 2, 2000, at
his residence after a long illness. He was born in Hartford, Conn., May 27,
1929, the son of James V. and Cecelia (Fury) Edwards.  James served in the
U.S. Navy in Guam attaining the rank of fireman. He </field>
  <field name="article"></field>
</doc>


<doc>
  <field name="id">obituaries_</field>
  <field name="link">http://www.bangordailynews.com/a/class/obituaries/obituary.cfm?id=16146</field>
  <field name="posted">20001204</field>
  <field name="adnumber"></field>
  <field name="vertical">Obituaries</field>
  <field name="author">BDN Classifieds</field>
  <field name="headline">FOLSOM, Robert E., 64</field>
  <field name="summary">MILLINOCKET - Robert E. Folsom, 64, died at a local
hospital, Dec. 1, 2000, after a brief illness. He was born in Millinocket,
the son of Lee and Ada (Hall) Folsom.  Bob retired from Great Northern Paper
Co. after many years. He was a member of BPO Elks  </field>
  <field name="article"></field>
</doc>


<doc>
  <field name="id">obituaries_</field>
  <field name="link">http://www.bangordailynews.com/a/class/obituaries/obituary.cfm?id=16147</field>
  <field name="posted">20001204</field>
  <field name="adnumber"></field>
  <field name="vertical">Obituaries</field>
  

Re: duplicating all records added to index

2006-09-13 Thread Tim Archambault

In the example I sent, the id field is not unique, but I've long since
corrected that and am still getting duplication, FYI.
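
A batch can be checked for repeated ids before it is posted. The Python
sketch below is only an illustration: the sample documents are abbreviated,
the id "obituaries_16142" is hypothetical, and the check covers a single
batch, not what is already in the index.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abbreviated add message; two of the three documents share the same id.
SAMPLE = """
<add>
  <doc><field name="id">obituaries_</field><field name="headline">A</field></doc>
  <doc><field name="id">obituaries_</field><field name="headline">B</field></doc>
  <doc><field name="id">obituaries_16142</field><field name="headline">C</field></doc>
</add>
"""

def duplicate_ids(add_xml, key="id"):
    """Return the values of the key field that occur in more than one doc."""
    root = ET.fromstring(add_xml)
    ids = [f.text or "" for f in root.findall("doc/field[@name='%s']" % key)]
    return sorted(value for value, count in Counter(ids).items() if count > 1)

print(duplicate_ids(SAMPLE))
```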


Re: Simple Faceted Searching out of the box

2006-09-10 Thread Tim Archambault

For those using PHP to interface with Solr, can you explain to me how your
PHP code interacts with it? Does PHP build a query string manually and
request a URL like this:
http://localhost:8983/solr/select?q=vertical%3Ajobs+accounting&version=2.1&start=0&rows=10&fl=&qt=standard&stylesheet=&indent=on&explainOther=&hl.fl=
and then use some PHP command to read the resulting page and parse it?

I'm not much of a programmer, but I do know Coldfusion so I'm trying to
apply the PHP principles to CF.
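
The general pattern asked about here (build a URL, fetch it, parse the XML)
can be sketched in Python as a language-neutral illustration. The parameter
set is trimmed from the URL above, the response is an abbreviated stand-in
rather than real Solr output, and the "headline" field and its value are
hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SOLR_SELECT = "http://localhost:8983/solr/select"

def select_url(query, start=0, rows=10):
    """Build the query URL; urlencode takes care of ':' and spaces."""
    params = {"q": query, "version": "2.1", "start": start,
              "rows": rows, "indent": "on"}
    return SOLR_SELECT + "?" + urlencode(params)

url = select_url("vertical:jobs accounting")
print(url)

# A client would now fetch `url` (for example with urllib.request.urlopen)
# and parse the body. An abbreviated stand-in response is used here.
SAMPLE_RESPONSE = """<response><result numFound="1" start="0">
  <doc><str name="headline">Staff Accountant</str></doc>
</result></response>"""

def headlines(xml_text):
    """Pull the hypothetical headline field out of every returned doc."""
    root = ET.fromstring(xml_text)
    return [doc.findtext("str[@name='headline']") for doc in root.iter("doc")]

print(headlines(SAMPLE_RESPONSE))
```

The same three steps should carry over to ColdFusion or PHP using whatever
HTTP and XML functions those languages provide.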

Thanks for any and all help.

Tim


On 9/10/06, Erik Hatcher [EMAIL PROTECTED] wrote:



On Sep 9, 2006, at 9:09 AM, Tim Archambault wrote:
 I need to understand this then. Thanks. I want to use Solr for our
 newspaper website and this would be a great way to break out content.
 Kind of greys the lines between what is search and what is browsing
 categories, which is a great thing actually. Thanks for the help.

"greys the lines" indeed.  there isn't any difference between search
and browse in my view now.  let's just call it "findability" :)  (by
the way, "Ambient Findability" is a fantastic book)

   Erik




Re: Simple Faceted Searching out of the box

2006-09-09 Thread Tim Archambault

I need to understand this then. Thanks. I want to use Solr for our newspaper
website and this would be a great way to break out content. Kind of greys
the lines between what is search and what is browsing categories, which is a
great thing actually. Thanks for the help.

Tim


On 9/9/06, Erik Hatcher [EMAIL PROTECTED] wrote:



On Sep 9, 2006, at 8:15 AM, Tim Archambault wrote:
 What is faceted browsing? Maybe an example of a site interface that is
 using it would be good. Dumb question, I know.

Faceted browsing is like this:  http://shopper.cnet.com/ and
http://www.nines.org/collex

In Collex, the "constrain further" box holds the facets.  Clicking on
them adds them to your constraints.  The idea is to divide the
documents in the index into distinct buckets (or sets) and show the
counts of how many results are in each set.
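
The bucket-and-count idea can be shown on a toy result set. This Python
sketch is purely illustrative and is not how Solr computes facets; the
documents and the "vertical" field values are hypothetical.

```python
from collections import Counter

# Toy result set; a real one would come from the search engine.
DOCS = [
    {"headline": "Staff Accountant", "vertical": "jobs"},
    {"headline": "2004 Pickup", "vertical": "autos"},
    {"headline": "Accounting Clerk", "vertical": "jobs"},
]

def facet_counts(docs, field):
    """Count how many documents fall into each bucket of the field."""
    return Counter(doc[field] for doc in docs if field in doc)

def constrain(docs, field, value):
    """Apply one facet as a constraint, as clicking it would."""
    return [doc for doc in docs if doc.get(field) == value]

counts = facet_counts(DOCS, "vertical")
print(counts)
print(constrain(DOCS, "vertical", "jobs"))
```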

   Erik




Re: Re: SolrCore as Singleton?

2006-09-09 Thread Tim Archambault

In regard to the comment about lack of an interface, I view this as a
benefit of the tool.

Whether I'm developing with Python, PHP, Coldfusion, .NET, Java, etc.
I can create my own customizable interface. As a coldfusion programmer
with moderate programming capabilities, this tool is perfect for my
needs.



On 9/8/06, Andrew May [EMAIL PROTECTED] wrote:

Chris Hostetter wrote:
 : Nice.  Is the same doable under Jetty? (never had to deal with JNDI
 : under Jetty)

 i haven't tried it personally, but according to Yoav reading JNDI
 options is part of the Servlet Spec, and billa found a reference to
 using env-entry to do so...

 http://www.nabble.com/Re%3A-multiple-solr-webapps-p3991310.html

 ...where exactly that option goes in Jetty's configuration isn't something
 i'm clear on.


env-entry values go in web.xml, so it would mean having modified versions of
solr.war for each collection.

env-entry is an optional part of the Servlet spec for standalone servlet
implementations. The basic version of Jetty does not have any JNDI support;
you need to use JettyPlus (http://jetty.mortbay.org/jetty5/plus/index.html)
for that.

-Andrew



search terms submitted

2006-09-09 Thread Tim Archambault

Just wondering what others do with the search terms people type into your
solr search boxes?

Does CNet use this information for Popular Searches?

Just curious.

FYI, Solr is up and running on my Windows 2003 IIS machine. Thanks for
everyone's feedback.


Re: SOLR stylesheet

2006-07-17 Thread Tim Archambault

Andre,

I believe I had posted the same message several months ago and was told the
stylesheet functionality was for internal use in a previous release and is
not functional now.


On 7/17/06, Andre Basse [EMAIL PROTECTED] wrote:


Hi SOLR users,

I know this issue has been discussed before but I'm not sure if there
was a final answer.

I would like to apply a stylesheet as mentioned in the tutorial.

http://localhost:8983/solr/select/?stylesheet=

Any ideas where to place the stylesheet, any examples available?


Thanks,

Andre













Re: List of indexed terms for a field

2006-06-07 Thread Tim Archambault

Great question. Please share your answers. I'd like to use this for a
GOOGLE SUGGEST Ajax scenario.

On 6/7/06, Paul Terray [EMAIL PROTECTED] wrote:


Hello,



I am trying Solr for some projects and I am very impressed by its
simplicity and clarity of use.

I am trying to build an index: is there any way to get a list of all
indexed terms for a field (especially a string or text one)?



Thanks.






Paul Terray




Consultant Avant-Vente




SOLLAN






27, bis rue du Progrès
93100 Montreuil - France
Tel :  +33 (0)1 48 51 15 44
Fax : +33 (0)1 48 51 15 48
mailto:[EMAIL PROTECTED] [EMAIL PROTECTED]
http://www.sollan.com www.sollan.com








stylesheet issue

2006-06-02 Thread Tim Archambault

I've got Solr installed and running, with only one failure left to date.
Whenever I try to select a stylesheet for my search, I get an error message
such as this:

Error loading stylesheet: A network error occured loading an XSLT
stylesheet:http://localhost:8983/admin/tabular.xsl

Something tells me something isn't mapped correctly here, either in Jetty or
in a Solr config. My hunch is the path should be
http://localhost:8983/solr/admin/tabular.xsl.

I must say the product is great and the synonym tool is unbelievable. Can't
say enough.

Any help with this stylesheet issue is greatly appreciated.

Tim


Re: stylesheet issue

2006-06-02 Thread Tim Archambault

That'll be fine. As you can probably tell, I'm not a programmer. I am just a
dangerous end-user with expertise in marketing & online operations trying to
save a buck. I am going to try to learn XSL or, if that doesn't work, I'll
bastardize the results into a Coldfusion recordset.

I know I shouldn't ask you questions directly, but I have to ask you.

How many queries per minute can Solr handle in a high use situation? Our
website gets about 4 million page views a month and about 40,000 daily
visitors, which is about an hour for CNET probably. I am envisioning Solr
being the search engine for our jobs, autos, classifieds, and as a global
search experience that includes them all. I really want to greatly limit the
use of database connections on our site. Do you think Solr can be a global
solution for search on our site? It's one thing to test, yet another to run
in a production environment.

Which java-based web server component do you recommend for a windows
platform? Tomcat? Another? I know nothing about these tools. I am using
Jetty for testing.

Thank you for all your help.

Tim



On 6/2/06, Yonik Seeley [EMAIL PROTECTED] wrote:


On 6/2/06, Tim Archambault [EMAIL PROTECTED] wrote:
 I've got solr installed and running, with only one failure left to date.
 Whenever I try to select a stylesheet for my search, I get an error
 message such as this:

Hi Tim,

There is no stylesheet :-)

It's a hold-over from an old XML format that Solr used to support
before it was open-sourced.  That old XML format was for compatibility
with another internal product.  It turned out that it wasn't flexible
enough to add extra info like multiple result sets, or faceted
browsing info, so we came up with v2 of the XML (but no new stylesheet
to go with it).

The XML is fairly readable though, so it hasn't been much of a problem
in practice.

-Yonik



Re: stylesheet issue

2006-06-02 Thread Tim Archambault

 By global do you mean Solr as the search solution for all those
 collections, or do you mean having all those different types of
 documents (jobs, autos, classifieds) in a single Solr index?

Yes, I did. I envisioned separating them by custom fields named "vertical"
and then, within each vertical, "category".

 Unless there is a good reason to put multiple document types in the
 same index, you will get better performance by putting them in their
 own index.

So my educated guess would be that I would create additional schema XML
elements in my schema.xml separately for jobs, homes, cars, news, obits, etc.
(in the tutorial, I note the schema name "example") and my search query
strings would have to specify which schema to use in the query, but I don't
see a variable for schema.

NumDocs: It looks like I am going to have an index of about 300,000
documents initially, and it should grow by about 150 per day.


On 6/2/06, Yonik Seeley [EMAIL PROTECTED] wrote:


On 6/2/06, Tim Archambault [EMAIL PROTECTED] wrote:
 That'll be fine. As you can probably tell, I'm not a programmer. I am
 just a dangerous end-user with expertise in marketing & online operations
 trying to save a buck. I am going to try to learn XSL or if that doesn't
 work, I'll bastardize the results into a coldfusion recordset.

 I know I shouldn't ask you questions directly, but I have to ask you.

 How many queries per minute can Solr handle in a high use situation?

It depends on how many documents are in the collection, the nature of
the documents (unique terms, size of fields, etc), and heavily depends
on the nature of the queries, and the CPU and memory of your hardware.

I've seen up to 1000 queries/sec for very simple queries on a 1M doc
index.

 Our website gets about 4 million page views a month and about 40,000
 daily visitors,

That shouldn't be a problem unless the collection is just too big.
It's pretty easy to scale Solr to higher query traffic by putting more
query servers behind a load balancer, *provided* that the latency of a
single query is acceptable.  If the collection is too big (too many
documents, or documents that are too large), then you need to split up
the collection and use federated search (Solr doesn't have it yet, but
it will in the future).
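
A rough back-of-the-envelope check puts those traffic numbers in
perspective. The Python sketch below makes several loud assumptions: a
30-day month, the worst case that every page view triggers one search, and
a hypothetical peak factor of 10 over the average.

```python
PAGE_VIEWS_PER_MONTH = 4_000_000        # figure quoted in the thread
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumes a 30-day month

# Worst case: every page view issues exactly one query.
average_per_second = PAGE_VIEWS_PER_MONTH / SECONDS_PER_MONTH

PEAK_FACTOR = 10  # hypothetical ratio of peak to average traffic
peak_per_second = average_per_second * PEAK_FACTOR

print(round(average_per_second, 2))
print(round(peak_per_second, 1))
```

Even under these pessimistic assumptions the load is a small fraction of the
1000 queries/sec mentioned above for simple queries, although real numbers
depend on the queries, the index, and the hardware.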

 I am envisioning Solr being the search engine for our jobs, autos,
 classifieds, and as a global search experience that includes them all.
 I really want to greatly limit the use of database connections on our
 site. Do you think Solr can be a global solution for search on our site?

By global do you mean Solr as the search solution for all those
collections, or do you mean having all those different types of
documents (jobs, autos, classifieds) in a single Solr index?

Unless there is a good reason to put multiple document types in the
same index, you will get better performance by putting them in their
own index.

 Which java-based web server component do you recommend for a windows
 platform? Tomcat? Another? I know nothing about these tools. I am using
 Jetty for testing.

Tomcat is the most widely used I think... and therefore easier to find
docs and find help/support for it.  I started a little Tomcat
installation guide on the Wiki last night.

-Yonik



solr newbie

2006-06-01 Thread Tim Archambault

Trying to run the test tutorial to index an xml file and keep getting an
error message: curl: command not found?

Any help is greatly appreciated.


Re: solr newbie

2006-06-01 Thread Tim Archambault

Thanks Yonik. All looks good except for the statement: "curl installed from
the Web category."

Don't understand what "Web category" means. SH.

On 6/1/06, Yonik Seeley [EMAIL PROTECTED] wrote:


Hi Tim,

Curl is a little command-line networking tool.  The easiest way to get
it is cygwin if you are not on a UNIX system.

See the "Requirements" section of the tutorial:
3. On Win32, cygwin, for shell support. (If you plan to use Subversion
on Win32, be sure to select the subversion package when you install,
in the "Devel" category.) This tutorial will assume that "sh" is in
your PATH, and that you have "curl" installed from the "Web" category.
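
Where curl is not available, any HTTP client can post the same XML. The
following Python sketch is one possible stand-in, not part of the tutorial:
the update URL is an assumption based on the default example setup, and the
request is only built here, not sent.

```python
import urllib.request

# Assumed update endpoint for the default example server.
UPDATE_URL = "http://localhost:8983/solr/update"

def build_update_request(xml_text, url=UPDATE_URL):
    """Prepare an HTTP POST carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )

request = build_update_request("<commit/>")
print(request.get_method(), request.full_url)

# To actually send it against a running server:
# urllib.request.urlopen(request).read()
```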

-Yonik

On 6/1/06, Tim Archambault [EMAIL PROTECTED] wrote:
 Trying to run the test tutorial to index an xml file and keep getting an
 error message: curl: command not found?

 Any help is greatly appreciated.



Re: solr newbie

2006-06-01 Thread Tim Archambault

I'll need to install cygwin again I think. Thanks.


On 6/1/06, Yonik Seeley [EMAIL PROTECTED] wrote:


On 6/1/06, Tim Archambault [EMAIL PROTECTED] wrote:
 Don't understand what web category means. SH.

The cygwin installer has different categories of packages...
base, devel, etc.
If you are looking for the curl package, it should be filed under
"web".  It's not installed by default, so you need to select it.

-Yonik



Re: solr newbie

2006-06-01 Thread Tim Archambault

I found the web options. Thank you very much. While that is installing
incrementally, two last questions.

Are there any example stylesheets to review to see how the data flows into
the layout?
How would one go about injecting database information into the indexes
without having to create XML files for each one?

Thanks again.

Tim

On 6/1/06, Yonik Seeley [EMAIL PROTECTED] wrote:


On 6/1/06, Tim Archambault [EMAIL PROTECTED] wrote:
 I'll need to install cygwin again I think. Thanks.

Don't uninstall cygwin... just re-run the cygwin setup.exe and it will
do incremental updates, installing packages that have changed, and
allowing you to select new packages to install.

-Yonik



Re: solr newbie

2006-06-01 Thread Tim Archambault

Great, thanks. I manage a newspaper website in Maine, USA with about
400,000-500,000 documents/database records (if not more) and I am going to
try to create a Solr search engine for the site. We'll see how it goes.
I've been using a bastardized Lucene search for my site up to now, but
this looks much better.

On 6/1/06, Yonik Seeley [EMAIL PROTECTED] wrote:


On 6/1/06, Tim Archambault [EMAIL PROTECTED] wrote:
 I found the web options. Thank you very much. While that is installing
 incrementally, two last questions.

 Are there any example stylesheets to review to see how the data flows
 into the layout?
 How would one go about injecting database information into the indexes
 without having to create XML files for each one?

It's most efficient to make a builder application that reads from
the database, constructs XML documents *in memory* and sends them to
the Solr server.  Multiple threads/connections open to the Solr server
will speed up indexing and hide any request-response latency of
individual adds.
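
The in-memory part of such a builder can be sketched as follows. This is a
minimal Python illustration under stated assumptions: an in-memory SQLite
database stands in for the real one, the table and column names are
hypothetical, and sending the result to Solr (possibly from several threads)
is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_add_xml(cursor):
    """Build one <add> message in memory from the current result set."""
    columns = [col[0] for col in cursor.description]
    add = ET.Element("add")
    for row in cursor:
        doc = ET.SubElement(add, "doc")
        for name, value in zip(columns, row):
            field = ET.SubElement(doc, "field", name=name)
            # ElementTree escapes the text when the tree is serialized.
            field.text = "" if value is None else str(value)
    return ET.tostring(add, encoding="unicode")

# Stand-in database; table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE obituaries (id TEXT, headline TEXT)")
db.execute("INSERT INTO obituaries VALUES ('obituaries_16141', 'CLARKE, Paul H., 79')")

cursor = db.execute("SELECT id, headline FROM obituaries")
add_xml = rows_to_add_xml(cursor)
print(add_xml)
```

No XML file is written at any point; the string can be handed straight to
whatever code posts it to the server.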

We don't have it yet, but there really should be a simple Java client
library that creates the XML add commands and handles sending them to
the server.

Also on the todo list is indexing directly from a SQL database w/o
the user having to write any code except select statements.

-Yonik