solr indexing

2011-02-22 Thread satya swaroop
Hi all,
   Out of keen interest in Solr's indexing mechanism, I started mining the
code of Solr indexing (/update/extract). I read up on the index file formats
and the scoring procedure, and I have some queries regarding this:
1) The score is computed from dynamic and precalculated values (doc
boost, field boost, lengthNorm). If a term in the index occurs in nearly one
million docs, does Solr calculate the score for each and every doc matching
the term and then take the top docs from the index, or is there some
mechanism that limits the score calculation to only a subset of the docs?

If anybody knows about this, or of any documentation regarding it, please
let me know...
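
As far as I know, Lucene (which Solr builds on) does score every document
that matches the term, but it only keeps the current top N in a small
bounded priority queue rather than sorting all one million hits. A minimal
sketch against a recent Lucene API; the index path and field name are made
up:

import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class TopDocsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical index location; any existing Lucene/Solr index works.
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("/path/to/index")))) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Every matching doc is scored once, but only the best 10
            // survive in the collector's bounded priority queue.
            TopDocs top = searcher.search(
                    new TermQuery(new Term("text", "solr")), 10);
            System.out.println("total hits: " + top.totalHits
                    + ", returned: " + top.scoreDocs.length);
        }
    }
}

So the cost per matching doc is a score computation plus a queue check, not
a full sort of the whole posting list.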


Regards,
satya


Solr indexing

2007-07-03 Thread niraj tulachan
Hi all,
 I have successfully implemented Solr so far, but there are a couple of
questions I'd like the Solr users to shed some light on:
  1) In Solr, we create an index by POSTing an XML file to the server.  However,
is there a way to do the same process via a database (containing the metadata)?
  2) While updating a pre-existing index, the update won't be visible until we do
a "commit" on it.  However, while updating the index (before doing the 'commit'),
can we still search on that index (and see the old content)?
  Any info will be highly appreciated.
  Cheers,
  Niraj

   

Solr Indexing Performance

2011-01-29 Thread Darx Oman
Hi guys



I'm running a Solr instance (trunk) on my dev server to test my
configuration.  I'm doing a DIH full import to index 49 PDF files with their
corresponding database records.  Both the PDF files and the database are local
to the server.

*Server : *

· Windows 2008 R2

· MS SQL Server 2008 R2

· 16-core processor

· 16 GB RAM

*Tomcat (7.0.5) : *

· Set JAVA_OPTS = %JAVA_OPTS% -Xms1024M -Xmx8192M

*Solrconfig:*

· Main index configuration:
<ramBufferSizeMB>2048</ramBufferSizeMB>
<mergeFactor>50</mergeFactor>

*DIH configuration:*

· 2 data sources defined: jdbcDataSource and BinFileDataSource

· One main entity with 3 sub-entities

[entity definitions stripped by the mail archive]

· Total schema fields are 8, three of which are text type and
multivalued.

*My DIH import Status Messages:*

· Total Requests made to DataSource = 99

· Total Rows Fetched = 2124

· Total Documents Processed = 49

· Time Taken = 0:2:3:880

Is this time reasonable, or can it be improved?
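
49 documents in roughly two minutes likely puts most of the time in SQL and
PDF extraction rather than raw indexing, and timing repeat runs as you vary
the config is the quickest way to find out. A rough sketch with a modern
SolrJ client that triggers the /dataimport handler and polls until DIH
reports idle again (server URL and core name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DihBenchmark {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/core1").build();  // placeholder URL
        SolrQuery fullImport = new SolrQuery();
        fullImport.setRequestHandler("/dataimport");  // DIH handler from solrconfig.xml
        fullImport.set("command", "full-import");
        long start = System.currentTimeMillis();
        solr.query(fullImport);

        // Poll DIH status until the import finishes.
        SolrQuery status = new SolrQuery();
        status.setRequestHandler("/dataimport");
        status.set("command", "status");
        QueryResponse rsp;
        do {
            Thread.sleep(2000);
            rsp = solr.query(status);
        } while ("busy".equals(rsp.getResponse().get("status")));
        System.out.println("took " + (System.currentTimeMillis() - start) + " ms");
        solr.close();
    }
}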


Solr Indexing Patterns

2011-06-03 Thread Judioo
What is the "best practice" method to index the following in Solr:

I'm attempting to use solr for a book store site.

Each book will have a price but on occasions this will be discounted. The
discounted price exists for a defined time period but there may be many
discount periods. Each discount will have a brief synopsis, start and end
time.

A subset of the desired output would be as follows:

...
"response":{"numFound":1,"start":0,"docs":[
  {
"name":"The Book",
"price":"$9.99",
"discounts":[
{
 "price":"$3.00",
 "synopsis":"thanksgiving special",
 "starts":"11-24-2011",
 "ends":"11-25-2011",
},
{
 "price":"$4.00",
 "synopsis":"Canadian thanksgiving special",
 "starts":"10-10-2011",
 "ends":"10-11-2011",
},
 ]
  },
  .

A requirement is to be able to search for just discounted publications. I
think I could use date faceting for this (return publications that are
within a discount window). When a discount search is performed, no
publications that are not currently discounted will be returned.

My questions are:

   - Does Solr support this type of sub-document?

In the above example the discounts are the sub-documents. I know Solr is not
a relational DB, but I would like to store and index the above representation
in a single document if possible.

   - What is the best method to approach the above?

I can see in many examples that the authors tend to denormalize to solve
similar problems. This suggests that for each discount I am required to
duplicate the book data or form a document association
(http://stackoverflow.com/questions/2689399/solr-associations).
Which method would you advise?

It would be nice if solr could return a response structured as above.

Much Thanks
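
A side note on the denormalized route that comes up later in this thread: if
each discount becomes its own document carrying the book fields plus its own
start/end dates, the "currently discounted" constraint is a single range
filter. A SolrJ sketch; the discount_start/discount_end field names and the
core URL are hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class DiscountedBooks {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/books").build();  // placeholder URL
        // One document per (book, discount) pair; field names are made up.
        SolrQuery q = new SolrQuery("genre:horror AND title:elm");
        q.addFilterQuery("discount_start:[* TO NOW] AND discount_end:[NOW TO *]");
        System.out.println(solr.query(q).getResults().getNumFound()
                + " currently discounted matches");
        solr.close();
    }
}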


Solr indexing questions

2011-06-11 Thread Frank A
I currently have my site set up using SOLR for some pretty simple queries and
am looking to add some additional features; I was hoping to get some
guidance.

Here's my situation: for a given restaurant I have the following info:

rest name,
editorial,
list of features (e.g. Reservations, Good for Groups, etc)
list of cuisines (American, Italian, etc)
List of user reviews
Additional meta data

There are 2 different things I want to do:

Build a directory based on "keywords or phrases" - e.g. looking through all
the data to find the common keywords/phrases - e.g. "hot dog" or "Brazilian
steakhouse". I'm not sure how to extract these key phrases from the data
without having to input them myself.  Is this a good fit for SOLR?  If so,
what features should I look into?

Second, is an "advanced" search that basically matches user input on ANY of
the fields.  However I'd like it to have some basic handling for mispelled
words, synonyms (bbq and bar-b-q) and weight the user of the terms
differently (e.g. name of restaurant vs. in a users comments).  I'm sure
this is SOLRs sweet spot but I'm having trouble figuring out how to put it
all together.

Thanks in advance.


offline solr indexing

2009-04-27 Thread Charles Federspiel
Solr Users,
Our app servers are set up on read-only filesystems.  Is there a way
to perform indexing from the command line, then copy the index files to the
app server and use Solr to perform searches from inside the servlet container?

If the Solr implementation is bound to HTTP requests, can Solr perform
searches against an index that I create with Lucene?
thank you,
Charles Federspiel
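
One approach that should work: build the index offline with plain Lucene (or
a local Solr), then copy the finished index directory into the dataDir the
read-only server's Solr points at. Solr can search an index built directly
with Lucene as long as the field names and analyzers match the Solr schema
exactly. A minimal Lucene sketch of the offline build; paths and fields are
made up:

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class OfflineIndexer {
    public static void main(String[] args) throws Exception {
        // The analyzer must match what the target Solr schema declares.
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/build/index")), cfg)) {
            Document doc = new Document();
            doc.add(new StringField("id", "doc-1", Store.YES));
            doc.add(new TextField("text", "content indexed offline", Store.NO));
            writer.addDocument(doc);
            writer.commit();  // directory is now ready to copy to the app server
        }
    }
}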


Dynamic Solr indexing

2010-03-01 Thread Peter S

Hi,

 

I wonder if anyone could shed some insight on a dynamic indexing question...?

 

The basic requirement is this:

 

Indexing:

A process writes to an index, and when it reaches a certain size (say, 1GB), a 
new index (core) is 'automatically' created/deployed (i.e. the process doesn't 
know about it) and further indexing goes into the new core. When that one 
reaches its threshold size, a new index is deployed, and so on.

The process that is writing to the indices doesn't actually know that it is 
writing to different cores.

 

Searching:

When a search is directed at the above index, the actual search is a 
distributed shard search across all the shards that have been deployed. Again, 
the searcher process doesn't know this, but gets back the aggregated results, 
as if it had specified all the shards in the request URL; but as these are 
changing dynamically, it of course can't know what they all are at any given 
time.

 

This requirement sounds to me perhaps like a Katta thing. I've had a look at 
SOLR-1395, and there are questions on the Lucid forums that sound similar (e.g. 
http://www.lucidimagination.com/search/document/4b3d00055413536d/solr_katta_integration#4b3d00055413536d),
 so I guess (hope) I'm not the only one with this requirement.

 

I couldn't find anything in either Katta or SOLR-1395 that fits both the writing 
and searching requirements, but I could easily have missed it.

 

Is Katta/Solr-1395 the way to go to achieve this? Would such a solution be 
'production-ready'? Has anyone deployed this type of thing in a production 
environment?

 

Any insight/advice would be greatly appreciated.

 

Thanks!

Peter
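
I don't know Katta well enough to comment, but on the search side stock Solr
distributed search can already hide the core layout if a thin front tier
fills in the current shard list; the searcher only ever sees the aggregated
response. A SolrJ sketch of the underlying request (host and shard URLs are
placeholders that the front tier would maintain):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ShardedSearch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://front:8983/solr/core0").build();  // any core can aggregate
        SolrQuery q = new SolrQuery("some query");
        // The front tier would rewrite this list whenever a new core is
        // deployed; callers never see it.
        q.set("shards",
              "host1:8983/solr/core0,host1:8983/solr/core1,host2:8983/solr/core2");
        System.out.println(solr.query(q).getResults().getNumFound()
                + " aggregated hits");
        solr.close();
    }
}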

 

 
  

SOLR Indexing/Querying

2007-05-30 Thread realw5

Hey Guys,
I need some guidance regarding a problem we are having with our Solr
index. Below is a list of terms our customers search for which are failing
or not returning the complete result set. The right side of the list is the
product id/keyword we want each term to match.

Can you give me some direction on how this can be done (or let me know if it
can't) with index/query analyzers. Any help is much appreciated!

Dan

---

Keyword Typed In / We want it to find

D3555 / 3555LHP
D460160-BN / D460160
D460160BN / D460160
Dd454557 / D454557
84200ORB / 84200
84200-ORB / 84200
T13420-SCH / T13420
t14240-ss / t14240
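
Most of these look like word-delimiter territory: a filter that splits on
hyphens and letter/digit transitions and can also catenate the parts back
together, so that "D460160-BN" and "D460160BN" end up sharing tokens with the
bare "D460160". A hedged sketch using Lucene's CustomAnalyzer to preview the
tokens; in schema.xml the equivalent is a fieldType whose index and query
analyzers include a WordDelimiter filter with the same options:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ProductIdTokens {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = CustomAnalyzer.builder()
                .withTokenizer("whitespace")
                .addTokenFilter("wordDelimiterGraph",
                        "generateWordParts", "1",   // D460160-BN -> D, 460160, BN
                        "generateNumberParts", "1",
                        "catenateAll", "1",         // also emit D460160BN
                        "preserveOriginal", "1")
                .addTokenFilter("lowercase")
                .build();
        try (TokenStream ts = analyzer.tokenStream("sku", "D460160-BN")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term);  // prints each token produced
            }
            ts.end();
        }
    }
}

The "D3555 / 3555LHP" pair is trickier since both the letter prefix and the
suffix differ; that one may need a synonym entry or a custom mapping instead.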



RE: Solr indexing

2007-07-03 Thread Xuesong Luo
2) Yes.

-Original Message-
From: niraj tulachan [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 03, 2007 3:09 PM
To: solr-user@lucene.apache.org
Subject: Solr indexing

Hi all,
 I have successfully implemented Solr so far, but there are a
couple of questions I'd like the Solr users to shed some light on:
  1) In Solr, we create an index by POSTing an XML file to the server.
However, is there a way to do the same process via a database (containing
the metadata)?
  2) While updating a pre-existing index, the update won't be visible until
we do a "commit" on it.  However, while updating the index (before
doing the 'commit'), can we still search on that index (and see the old
content)?
  Any info will be highly appreciated.
  Cheers,
  Niraj

   



Re: Solr indexing

2007-07-03 Thread Mike Klaas

On 3-Jul-07, at 3:08 PM, niraj tulachan wrote:


Hi all,
 I have successfully implemented Solr so far, but there are a  
couple of questions I'd like the Solr users to shed some light on:
  1) In Solr, we create an index by POSTing an XML file to the  
server.  However, is there a way to do the same process via a database 
(containing the metadata)?


Yes, but I'm not familiar with the techniques.

  2) While updating a pre-existing index, the update won't be visible  
until we do a "commit" on it.  However, while updating the index  
(before doing the 'commit'), can we still search on that index (and  
see the old content)?


Absolutely.  This is the central tenet of Solr.

-Mike
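
A sketch of that behaviour with a modern SolrJ client (core URL is a
placeholder, and it assumes no autocommit is configured): searchers keep
serving the last committed view until the commit opens a new one.

import java.util.UUID;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitVisibility {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/core1").build();
        SolrInputDocument doc = new SolrInputDocument();
        String id = UUID.randomUUID().toString();
        doc.addField("id", id);
        solr.add(doc);                        // indexed, but not yet visible

        SolrQuery q = new SolrQuery("id:" + id);
        System.out.println("before commit: "
                + solr.query(q).getResults().getNumFound());  // 0 - old view
        solr.commit();                        // new searcher opens
        System.out.println("after commit: "
                + solr.query(q).getResults().getNumFound());  // 1
        solr.close();
    }
}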


solr indexing exception

2011-08-26 Thread abhijit bashetti
Hi,

I am using DIH to index 50K documents.

I am using a 64-bit machine with 4 GB RAM.

I got the following exception:

org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:664)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:617)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
    at java.lang.AbstractStringBuilder.append(Unknown Source)
    at java.lang.StringBuffer.append(Unknown Source)
    at java.io.StringWriter.write(Unknown Source)
    at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:115)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:261)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:132)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    ... 6 more



26-Aug-2011 08:18:35 org.apache.solr.common.SolrException log
SEVERE: Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
[the same stack trace as above is repeated here; it was truncated in the original message]
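
The trace shows the entire extracted text of one file being buffered into a
StringWriter inside Tika, so a single very large document can exhaust the
heap no matter how many documents you import. Besides raising -Xmx, it can
help to find the offending file by running Tika standalone with a write
limit; a hedged sketch (the 10 MB cap is arbitrary):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.WriteOutContentHandler;

public class BoundedExtract {
    public static void main(String[] args) throws Exception {
        // Cap extracted text at ~10 MB so one huge file cannot exhaust the heap.
        WriteOutContentHandler handler = new WriteOutContentHandler(10 * 1024 * 1024);
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            new AutoDetectParser().parse(in, handler, new Metadata(), new ParseContext());
        } catch (Exception e) {
            if (!handler.isWriteLimitReached(e)) throw e;  // a real parse failure
            System.err.println("write limit hit for " + args[0]);
        }
        System.out.println(handler.toString().length() + " chars extracted");
    }
}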

Solr Indexing Time

2011-11-10 Thread Husain, Yavar
Solr 1.4 is doing great with respect to indexing on a dedicated physical server 
(Windows Server 2008). Indexing around 1 million full-text documents 
(around 4 GB in size) takes around 20 minutes with heap size = 512M - 1G and 
4 GB RAM.



However, while using Solr on a VM with 4 GB RAM, it took 50 minutes to index 
the first time. Note that there are no network delays and no RAM issues. When 
I increased the RAM to 8 GB and increased the heap size, the indexing time 
increased to 2 hours. That was really strange. Note that except for SQL Server 
there is no other process running. There are no network delays. However, I 
have not checked file I/O; can that be a bottleneck? Does Solr have any issues 
running in a "virtualization" environment?



I read a paper today by Brian & Harry, "ON THE RESPONSE TIME OF A SOLR SEARCH 
ENGINE IN A VIRTUALIZED ENVIRONMENT", and they claim that performance 
deteriorates when RAM is increased while Solr is running on a VM, but that is 
with respect to query times and not indexing times.



I am a bit confused as to why it took longer on the VM when I repeated the same 
test a second time with increased heap size and RAM.






Re: Solr Indexing Performance

2011-01-31 Thread Tomás Fernández Löbbe
Well, I would say that the best way to be sure is to benchmark different
configurations.
As far as I know, such a big RAM buffer size is usually not recommended; the
default is 32 MB, and you probably won't get any improvement using more than
128 MB.
The same goes for the mergeFactor: I know that a larger merge factor is better
for indexing, but 50 sounds like a lot. Anyway, as I said before, the best
thing to do is benchmark different configurations and see which one works
better for you.

Have you tried assigning less memory to the JVM? That would leave more
memory available to the OS.

Tomás

On Sun, Jan 30, 2011 at 1:54 AM, Darx Oman  wrote:

> Hi guys
>
> I'm running a Solr instance (trunk) on my dev server to test my
> configuration.  I'm doing a DIH full import to index 49 PDF files with
> their corresponding database records.  Both the PDF files and the database
> are local to the server.
>
> *Server : *
>
> · Windows 2008 R2
>
> · MS SQL Server 2008 R2
>
> · 16-core processor
>
> · 16 GB RAM
>
> *Tomcat (7.0.5) : *
>
> · Set JAVA_OPTS = %JAVA_OPTS% -Xms1024M -Xmx8192M
>
> *Solrconfig:*
>
> · Main index configuration:
> <ramBufferSizeMB>2048</ramBufferSizeMB>
> <mergeFactor>50</mergeFactor>
>
> *DIH configuration:*
>
> · 2 data sources defined: jdbcDataSource and BinFileDataSource
>
> · One main entity with 3 sub-entities
>
> [entity definitions stripped by the mail archive]
>
> · Total schema fields are 8, three of which are text type and
> multivalued.
>
> *My DIH import Status Messages:*
>
> · Total Requests made to DataSource = 99
>
> · Total Rows Fetched = 2124
>
> · Total Documents Processed = 49
>
> · Time Taken = 0:2:3:880
>
> Is this time reasonable, or can it be improved?
>


Re: Solr Indexing Performance

2011-02-01 Thread Darx Oman
Thanks, Tomás.
I'll try different configurations.


Re: Solr Indexing Performance

2011-02-04 Thread Otis Gospodnetic
Hi,

2 GB for ramBufferSizeMB is probably too much and not needed, but you could 
increase it from the default 32 MB to something like 128 MB or even 512 MB, if 
you really have that much data that it would make a difference (you mention only 
49 PDF files).  I'd leave mergeFactor at 10 for now.  The slowness (if there is 
slowness - how long is it taking?) could be from:
* slow DB
* suboptimal SQL
* PDF content extraction
* indexing itself
* ...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Tomás Fernández Löbbe 
> To: solr-user@lucene.apache.org
> Sent: Mon, January 31, 2011 10:13:32 AM
> Subject: Re: Solr Indexing Performance
> 
> Well, I would say that the best way to be sure is to benchmark different
> configurations. As far as I know, such a big RAM buffer size is usually
> not recommended; the default is 32 MB, and you probably won't get any
> improvement using more than 128 MB. The same goes for the mergeFactor: I
> know that a larger merge factor is better for indexing, but 50 sounds like
> a lot. Anyway, as I said before, the best thing to do is benchmark
> different configurations and see which one works better for you.
> 
> Have you tried assigning less memory to the JVM? That would leave more
> memory available to the OS.
> 
> Tomás
> 
> On Sun, Jan 30, 2011 at 1:54 AM, Darx Oman  wrote:
> 
> > Hi guys
> >
> > I'm running a Solr instance (trunk) on my dev server to test my
> > configuration.  I'm doing a DIH full import to index 49 PDF files with
> > their corresponding database records.  Both the PDF files and the
> > database are local to the server.
> >
> > *Server : *
> >
> > · Windows 2008 R2
> >
> > · MS SQL Server 2008 R2
> >
> > · 16-core processor
> >
> > · 16 GB RAM
> >
> > *Tomcat (7.0.5) : *
> >
> > · Set JAVA_OPTS = %JAVA_OPTS% -Xms1024M -Xmx8192M
> >
> > *Solrconfig:*
> >
> > · Main index configuration:
> > <ramBufferSizeMB>2048</ramBufferSizeMB>
> > <mergeFactor>50</mergeFactor>
> >
> > *DIH configuration:*
> >
> > · 2 data sources defined: jdbcDataSource and BinFileDataSource
> >
> > · One main entity with 3 sub-entities
> >
> > [entity definitions stripped by the mail archive]
> >
> > · Total schema fields are 8, three of which are text type and
> > multivalued.
> >
> > *My DIH import Status Messages:*
> >
> > · Total Requests made to DataSource = 99
> >
> > · Total Rows Fetched = 2124
> >
> > · Total Documents Processed = 49
> >
> > · Time Taken = 0:2:3:880
> >
> > Is this time reasonable, or can it be improved?
> >
>


Re: Solr Indexing Performance

2011-02-05 Thread Darx Oman
I indexed 1000 PDF files with the same configuration; it completed in about
32 min.


Re: Solr Indexing Performance

2011-02-07 Thread Gora Mohanty
On Sat, Feb 5, 2011 at 2:06 PM, Darx Oman  wrote:
> I indexed 1000 PDF files with the same configuration; it completed in about
> 32 min.

So, it seems like your indexing scales at least linearly with the number
of PDF documents that you have.

While this might be good news in your case, it is difficult to estimate
an "expected" indexing rate when indexing from rich documents.

Regards,
Gora


Issue in Solr Indexing

2011-05-26 Thread deepak agrawal
Hi All,

When I index a record into Solr, it indexes successfully, and the commit I
issue afterwards also reports success. But when I then search for that
particular record in Solr, I do not get the record back.
I am using Solr 1.4.1.

Can anyone suggest why that particular record is not searchable? I am not
getting any errors in the Catalina log file either.

Thanks in advance.


-- 
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.


Re: Solr Indexing Patterns

2011-06-03 Thread Erick Erickson
How often are the discounts changed? Because you can simply
re-index the book information with a multiValued "discounts" field
and get something similar to your example (&wt=json)


Best
Erick

On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> What is the "best practice" method to index the following in Solr:
>
> I'm attempting to use solr for a book store site.
>
> Each book will have a price but on occasions this will be discounted. The
> discounted price exists for a defined time period but there may be many
> discount periods. Each discount will have a brief synopsis, start and end
> time.
>
> A subset of the desired output would be as follows:
>
> ...
> "response":{"numFound":1,"start":0,"docs":[
>  {
>    "name":"The Book",
>    "price":"$9.99",
>    "discounts":[
>        {
>         "price":"$3.00",
>         "synopsis":"thanksgiving special",
>         "starts":"11-24-2011",
>         "ends":"11-25-2011",
>        },
>        {
>         "price":"$4.00",
>         "synopsis":"Canadian thanksgiving special",
>         "starts":"10-10-2011",
>         "ends":"10-11-2011",
>        },
>     ]
>  },
>  .
>
> A requirement is to be able to search for just discounted publications. I
> think I could use date faceting for this ( return publications that are
> within a discount window ). When a discount search is performed no
> publications that are not currently discounted will be returned.
>
> My questions are:
>
>   - Does solr support this type of sub documents
>
> In the above example the discounts are the sub documents. I know solr is not
> a relational DB but I would like to store and index the above representation
> in a single document if possible.
>
>   - what is the best method to approach the above
>
> I can see in many examples the authors tend to denormalize to solve similar
> problems. This suggests that for each discount I am required to duplicate the
> book data or form a document
> association.
> Which method would you advise?
>
> It would be nice if solr could return a response structured as above.
>
> Much Thanks
>


Re: Solr Indexing Patterns

2011-06-03 Thread Judioo
Hi,
Discounts can change daily. Also there can be a lot of them (over time and
in a given time period).

Could you give an example of what you mean by multi-valuing the field.

Thanks

On 3 June 2011 14:29, Erick Erickson  wrote:

> How often are the discounts changed? Because you can simply
> re-index the book information with a multiValued "discounts" field
> and get something similar to your example (&wt=json)
>
>
> Best
> Erick
>
> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> > What is the "best practice" method to index the following in Solr:
> >
> > I'm attempting to use solr for a book store site.
> >
> > Each book will have a price but on occasions this will be discounted. The
> > discounted price exists for a defined time period but there may be many
> > discount periods. Each discount will have a brief synopsis, start and end
> > time.
> >
> > A subset of the desired output would be as follows:
> >
> > ...
> > "response":{"numFound":1,"start":0,"docs":[
> >  {
> >"name":"The Book",
> >"price":"$9.99",
> >"discounts":[
> >{
> > "price":"$3.00",
> > "synopsis":"thanksgiving special",
> > "starts":"11-24-2011",
> > "ends":"11-25-2011",
> >},
> >{
> > "price":"$4.00",
> > "synopsis":"Canadian thanksgiving special",
> > "starts":"10-10-2011",
> > "ends":"10-11-2011",
> >},
> > ]
> >  },
> >  .
> >
> > A requirement is to be able to search for just discounted publications. I
> > think I could use date faceting for this ( return publications that are
> > within a discount window ). When a discount search is performed no
> > publications that are not currently discounted will be returned.
> >
> > My questions are:
> >
> >   - Does solr support this type of sub documents
> >
> > In the above example the discounts are the sub documents. I know solr is
> not
> > a relational DB but I would like to store and index the above
> representation
> > in a single document if possible.
> >
> >   - what is the best method to approach the above
> >
> > I can see in many examples the authors tend to denormalize to solve
> > similar problems. This suggests that for each discount I am required to
> > duplicate the book data or form a document association
> > <http://stackoverflow.com/questions/2689399/solr-associations>.
> > Which method would you advise?
> >
> > It would be nice if solr could return a response structured as above.
> >
> > Much Thanks
> >
>


Re: Solr Indexing Patterns

2011-06-05 Thread Erick Erickson
See: http://wiki.apache.org/solr/SchemaXml

By adding ' "multiValued="true" ' to the field, you can add
the same field multiple times in a doc, something like



  value1
  value2



But there's no real ability in Solr to store "sub documents",
so you'd have to get creative in how you encoded the discounts...

But I suspect a better approach would be to store each discount as
a separate document. If you're on the trunk version, you could then
group results by, say, ISBN and get responses grouped together...
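
For reference, with one document per discount, that grouping is just two
request parameters; a hedged SolrJ fragment (the "isbn" field name is an
assumption, and result grouping needs trunk or 3.3+):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class GroupedDiscounts {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/books").build();  // placeholder URL
        SolrQuery q = new SolrQuery("title:elm");
        q.set("group", "true");         // result grouping / field collapsing
        q.set("group.field", "isbn");   // one group per book; its discount
                                        // documents collapse under it
        System.out.println(solr.query(q).getGroupResponse().getValues().size()
                + " grouped fields");
        solr.close();
    }
}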

Best
Erick

On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
> Hi,
> Discounts can change daily. Also there can be a lot of them (over time and
> in a given time period ).
>
> Could you give an example of what you mean by multi-valuing the field.
>
> Thanks
>
> On 3 June 2011 14:29, Erick Erickson  wrote:
>
>> How often are the discounts changed? Because you can simply
>> re-index the book information with a multiValued "discounts" field
>> and get something similar to your example (&wt=json)
>>
>>
>> Best
>> Erick
>>
>> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
>> > What is the "best practice" method to index the following in Solr:
>> >
>> > I'm attempting to use solr for a book store site.
>> >
>> > Each book will have a price but on occasions this will be discounted. The
>> > discounted price exists for a defined time period but there may be many
>> > discount periods. Each discount will have a brief synopsis, start and end
>> > time.
>> >
>> > A subset of the desired output would be as follows:
>> >
>> > ...
>> > "response":{"numFound":1,"start":0,"docs":[
>> >  {
>> >    "name":"The Book",
>> >    "price":"$9.99",
>> >    "discounts":[
>> >        {
>> >         "price":"$3.00",
>> >         "synopsis":"thanksgiving special",
>> >         "starts":"11-24-2011",
>> >         "ends":"11-25-2011",
>> >        },
>> >        {
>> >         "price":"$4.00",
>> >         "synopsis":"Canadian thanksgiving special",
>> >         "starts":"10-10-2011",
>> >         "ends":"10-11-2011",
>> >        },
>> >     ]
>> >  },
>> >  .
>> >
>> > A requirement is to be able to search for just discounted publications. I
>> > think I could use date faceting for this ( return publications that are
>> > within a discount window ). When a discount search is performed no
>> > publications that are not currently discounted will be returned.
>> >
>> > My questions are:
>> >
>> >   - Does solr support this type of sub documents
>> >
>> > In the above example the discounts are the sub documents. I know solr is
>> not
>> > a relational DB but I would like to store and index the above
>> representation
>> > in a single document if possible.
>> >
>> >   - what is the best method to approach the above
>> >
>> > I can see in many examples the authors tend to denormalize to solve
>> > similar problems. This suggests that for each discount I am required to
>> > duplicate the book data or form a document association
>> > <http://stackoverflow.com/questions/2689399/solr-associations>.
>> > Which method would you advise?
>> >
>> > It would be nice if solr could return a response structured as above.
>> >
>> > Much Thanks
>> >
>>
>


Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
On 5 June 2011 14:42, Erick Erickson  wrote:

> See: http://wiki.apache.org/solr/SchemaXml
>
> By adding ' "multiValued="true" ' to the field, you can add
> the same field multiple times in a doc, something like
>
> 
> 
>  value1
>  value2
> 
> 
>
I can't see how that would work, as one would need to associate the right
start / end dates and price.
As I understand it, using multiValued and thus flattening the discounts would
result in:

{
"name":"The Book",
"price":"$9.99",
"price":"$3.00",
"price":"$4.00","synopsis":"thanksgiving special",
"starts":"11-24-2011",
"starts":"10-10-2011",
"ends":"11-25-2011",
"ends":"10-11-2011",
"synopsis":"Canadian thanksgiving special",
  },

How does one differentiate the different offers?



> But there's no real ability  in Solr to store "sub documents",
> so you'd have to get creative in how you encoded the discounts...
>

This is what I'm asking :)
What are the best / recommended / known patterns for doing this?



>
> But I suspect a better approach would be to store each discount as
> a separate document. If you're in the trunk version, you could then
> group results by, say, ISBN and get responses grouped together...
>

This is an option but seems sub-optimal. So say I store the discounts in
multiple documents with ISBN as an attribute, and also store the title again
with ISBN as an attribute.

To get
"all books currently discounted"

requires 2 requests:

* get all discounts currently active
* get all books using the ISBNs retrieved from the above search

Not that bad. However, what happens when I want
"all books that are currently on discount in the 'horror' genre containing
the word 'elm' in the title"?

The only way I can see of catering for the above search is to duplicate all
searchable fields of my "book" document in my "discount" document. Coming
from an RDBMS background this seems wrong.

Is this the correct approach to take?



>
> Best
> Erick
>
> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
> > Hi,
> > Discounts can change daily. Also there can be a lot of them (over time
> and
> > in a given time period ).
> >
> > Could you give an example of what you mean buy multi-valuing the field.
> >
> > Thanks
> >
> > On 3 June 2011 14:29, Erick Erickson  wrote:
> >
> >> How often are the discounts changed? Because you can simply
> >> re-index the book information with a multiValued "discounts" field
> >> and get something similar to your example (&wt=json)
> >>
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> >> > What is the "best practice" method to index the following in Solr:
> >> >
> >> > I'm attempting to use solr for a book store site.
> >> >
> >> > Each book will have a price but on occasions this will be discounted.
> The
> >> > discounted price exists for a defined time period but there may be
> many
> >> > discount periods. Each discount will have a brief synopsis, start and
> end
> >> > time.
> >> >
> >> > A subset of the desired output would be as follows:
> >> >
> >> > ...
> >> > "response":{"numFound":1,"start":0,"docs":[
> >> >  {
> >> >"name":"The Book",
> >> >"price":"$9.99",
> >> >"discounts":[
> >> >{
> >> > "price":"$3.00",
> >> > "synopsis":"thanksgiving special",
> >> > "starts":"11-24-2011",
> >> > "ends":"11-25-2011",
> >> >},
> >> >{
> >> > "price":"$4.00",
> >> > "synopsis":"Canadian thanksgiving special",
> >> > "starts":"10-10-2011",
> >> > "ends":"10-11-2011",
> >> >},
> >> > ]
> >> >  },
> >> >  .
> >> >
> >> > A requirement is to be able to search for just discounted
> publications. I
> >> > think I could use date faceting for this ( return publications that
> are
> >> > within a discount window ). When a discount search is performed no
> >> > publications that are not currently discounted will be returned.
> >> >
> >> > My question are:
> >> >
> >> >   - Does solr support this type of sub documents
> >> >
> >> > In the above example the discounts are the sub documents. I know solr
> is
> >> not
> >> > a relational DB but I would like to store and index the above
> >> representation
> >> > in a single document if possible.
> >> >
> >> >   - what is the best method to approach the above
> >> >
> >> > I can see in many examples the authors tend to denormalize to solve
> >> similar
> >> > problems. This suggest that for each discount I am required to
> duplicate
> >> the
> >> > book data or form a document
> >> > association<
> http://stackoverflow.com/questions/2689399/solr-associations
> >> >.
> >> > Which method would you advise?
> >> >
> >> > It would be nice if solr could return a response structured as above.
> >> >
> >> > Much Thanks
> >> >
> >>
> >
>


Re: Solr Indexing Patterns

2011-06-06 Thread Erick Erickson
#Everybody# (including me) who has any RDBMS background
doesn't want to flatten data, but that's usually the way to go in
Solr.

Part of whether it's a good idea or not depends on how big the index
gets, and unfortunately the only way to figure that out is to test.

But that's the first approach I'd try.

Good luck!
Erick

On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:
> On 5 June 2011 14:42, Erick Erickson  wrote:
>
>> See: http://wiki.apache.org/solr/SchemaXml
>>
>> By adding ' "multiValued="true" ' to the field, you can add
>> the same field multiple times in a doc, something like
>>
>> 
>> 
>>  value1
>>  value2
>> 
>> 
>>
> I can't see how that would work, as one would need to associate the right
> start / end dates and price.
> As I understand it, using multiValued and thus flattening the discounts would
> result in:
>
> {
>    "name":"The Book",
>    "price":"$9.99",
>    "price":"$3.00",
>    "price":"$4.00",    "synopsis":"thanksgiving special",
>    "starts":"11-24-2011",
>    "starts":"10-10-2011",
>    "ends":"11-25-2011",
>    "ends":"10-11-2011",
>    "synopsis":"Canadian thanksgiving special",
>  },
>
> How does one differentiate the different offers?
>
>
>
>> But there's no real ability  in Solr to store "sub documents",
>> so you'd have to get creative in how you encoded the discounts...
>>
>
> This is what I'm asking :)
> What are the best / recommended / known patterns for doing this?
>
>
>
>>
>> But I suspect a better approach would be to store each discount as
>> a separate document. If you're in the trunk version, you could then
>> group results by, say, ISBN and get responses grouped together...
>>
>
> This is an option but seems sub-optimal. So say I store the discounts in
> multiple documents with ISBN as an attribute, and also store the title again
> with ISBN as an attribute.
>
> To get
> "all books currently discounted"
>
> requires 2 requests:
>
> * get all discounts currently active
> * get all books using the ISBNs retrieved from the above search
>
> Not that bad. However, what happens when I want
> "all books that are currently on discount in the 'horror' genre containing
> the word 'elm' in the title"?
>
> The only way I can see of catering for the above search is to duplicate all
> searchable fields of my "book" document in my "discount" document. Coming
> from an RDBMS background this seems wrong.
>
> Is this the correct approach to take?
>
>
>
>>
>> Best
>> Erick
>>
>> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
>> > Hi,
>> > Discounts can change daily. Also there can be a lot of them (over time
>> and
>> > in a given time period ).
>> >
> > Could you give an example of what you mean by multi-valuing the field.
>> >
>> > Thanks
>> >
>> > On 3 June 2011 14:29, Erick Erickson  wrote:
>> >
>> >> How often are the discounts changed? Because you can simply
>> >> re-index the book information with a multiValued "discounts" field
>> >> and get something similar to your example (&wt=json)
>> >>
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
>> >> > What is the "best practice" method to index the following in Solr:
>> >> >
>> >> > I'm attempting to use solr for a book store site.
>> >> >
>> >> > Each book will have a price but on occasions this will be discounted.
>> The
>> >> > discounted price exists for a defined time period but there may be
>> many
>> >> > discount periods. Each discount will have a brief synopsis, start and
>> end
>> >> > time.
>> >> >
>> >> > A subset of the desired output would be as follows:
>> >> >
>> >> > ...
>> >> > "response":{"numFound":1,"start":0,"docs":[
>> >> >  {
>> >> >    "name":"The Book",
>> >> >    "price":"$9.99",
>> >> >    "discounts":[
>> >> >        {
>> >> >         "price":"$3.00",
>> >> >         "synopsis":"thanksgiving special",
>> >> >         "starts":"11-24-2011",
>> >> >         "ends":"11-25-2011",
>> >> >        },
>> >> >        {
>> >> >         "price":"$4.00",
>> >> >         "synopsis":"Canadian thanksgiving special",
>> >> >         "starts":"10-10-2011",
>> >> >         "ends":"10-11-2011",
>> >> >        },
>> >> >     ]
>> >> >  },
>> >> >  .
>> >> >
>> >> > A requirement is to be able to search for just discounted
>> publications. I
>> >> > think I could use date faceting for this ( return publications that
>> are
>> >> > within a discount window ). When a discount search is performed no
>> >> > publications that are not currently discounted will be returned.
>> >> >
> >> > My questions are:
>> >> >
>> >> >   - Does solr support this type of sub documents
>> >> >
>> >> > In the above example the discounts are the sub documents. I know solr
>> is
>> >> not
>> >> > a relational DB but I would like to store and index the above
>> >> representation
>> >> > in a single document if possible.
>> >> >
>> >> >   - what is the best method to approach the above
>> >> >
>> >> > I can see in many examples the authors tend to denormalize to solve
>> >> similar
>> >> > problems. This s

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
Thanks

On 6 June 2011 19:32, Erick Erickson  wrote:

> #Everybody# (including me) who has any RDBMS background
> doesn't want to flatten data, but that's usually the way to go in
> Solr.
>
> Part of whether it's a good idea or not depends on how big the index
> gets, and unfortunately the only way to figure that out is to test.
>
> But that's the first approach I'd try.
>
> Good luck!
> Erick
>
> On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:
> > On 5 June 2011 14:42, Erick Erickson  wrote:
> >
> >> See: http://wiki.apache.org/solr/SchemaXml
> >>
> >> By adding multiValued="true" to the field, you can add
> >> the same field multiple times in a doc, something like
> >>
> >> <add>
> >> <doc>
> >>  <field name="discounts">value1</field>
> >>  <field name="discounts">value2</field>
> >> </doc>
> >> </add>
> >>
> > I can't see how that would work, as one would need to associate the right
> > start / end dates and price.
> > As I understand it, using multiValued and thus flattening the discounts
> > would result in:
> >
> > {
> >"name":"The Book",
> >"price":"$9.99",
> >"price":"$3.00",
> >"price":"$4.00","synopsis":"thanksgiving special",
> >"starts":"11-24-2011",
> >"starts":"10-10-2011",
> >"ends":"11-25-2011",
> >"ends":"10-11-2011",
> >"synopsis":"Canadian thanksgiving special",
> >  },
> >
> > How does one differentiate the different offers?
> >
> >
> >
> >> But there's no real ability  in Solr to store "sub documents",
> >> so you'd have to get creative in how you encoded the discounts...
> >>
> >
> > This is what I'm asking :)
> > What are the best / recommended / known patterns for doing this?
> >
> >
> >
> >>
> >> But I suspect a better approach would be to store each discount as
> >> a separate document. If you're in the trunk version, you could then
> >> group results by, say, ISBN and get responses grouped together...
> >>
> >
> > This is an option but seems sub-optimal. So say I store the discounts in
> > multiple documents with ISBN as an attribute, and also store the title
> > again with ISBN as an attribute.
> >
> > To get
> > "all books currently discounted"
> >
> > requires 2 requests:
> >
> > * get all discounts currently active
> > * get all books using the ISBNs retrieved from the above search
> >
> > Not that bad. However, what happens when I want
> > "all books that are currently on discount in the 'horror' genre
> > containing the word 'elm' in the title"?
> >
> > The only way I can see of catering for the above search is to duplicate
> > all searchable fields of my "book" document in my "discount" document.
> > Coming from an RDBMS background this seems wrong.
> >
> > Is this the correct approach to take?
> >
> >
> >
> >>
> >> Best
> >> Erick
> >>
> >> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
> >> > Hi,
> >> > Discounts can change daily. Also there can be a lot of them (over time
> >> and
> >> > in a given time period ).
> >> >
> >> > Could you give an example of what you mean by multi-valuing the
> >> > field.
> >> >
> >> > Thanks
> >> >
> >> > On 3 June 2011 14:29, Erick Erickson  wrote:
> >> >
> >> >> How often are the discounts changed? Because you can simply
> >> >> re-index the book information with a multiValued "discounts" field
> >> >> and get something similar to your example (&wt=json)
> >> >>
> >> >>
> >> >> Best
> >> >> Erick
> >> >>
> >> >> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> >> >> > What is the "best practice" method to index the following in Solr:
> >> >> >
> >> >> > I'm attempting to use solr for a book store site.
> >> >> >
> >> >> > Each book will have a price but on occasions this will be
> discounted.
> >> The
> >> >> > discounted price exists for a defined time period but there may be
> >> many
> >> >> > discount periods. Each discount will have a brief synopsis, start
> and
> >> end
> >> >> > time.
> >> >> >
> >> >> > A subset of the desired output would be as follows:
> >> >> >
> >> >> > ...
> >> >> > "response":{"numFound":1,"start":0,"docs":[
> >> >> >  {
> >> >> >"name":"The Book",
> >> >> >"price":"$9.99",
> >> >> >"discounts":[
> >> >> >{
> >> >> > "price":"$3.00",
> >> >> > "synopsis":"thanksgiving special",
> >> >> > "starts":"11-24-2011",
> >> >> > "ends":"11-25-2011",
> >> >> >},
> >> >> >{
> >> >> > "price":"$4.00",
> >> >> > "synopsis":"Canadian thanksgiving special",
> >> >> > "starts":"10-10-2011",
> >> >> > "ends":"10-11-2011",
> >> >> >},
> >> >> > ]
> >> >> >  },
> >> >> >  .
> >> >> >
> >> >> > A requirement is to be able to search for just discounted
> >> publications. I
> >> >> > think I could use date faceting for this ( return publications that
> >> are
> >> >> > within a discount window ). When a discount search is performed no
> >> >> > publications that are not currently discounted will be returned.
> >> >> >
> >> >> > My questions are:
> >> >> >
> >> >> >   - Does solr support this type of sub documents
> >> >> >
> >> >> > In the above example th

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
I do think that Solr would be better served if there was a *best practices
section* on the site.

Looking at the majority of emails to this list, they revolve around "how do I
do X?".

It seems like tutorials with real-world examples would serve Solr no end of
good.

I still do not have an example of the best method to approach my problem,
although Erick has helped me understand the limitations of Solr.

Just thought I'd say.






On 6 June 2011 20:26, Judioo  wrote:

> Thanks
>
>
> On 6 June 2011 19:32, Erick Erickson  wrote:
>
>> #Everybody# (including me) who has any RDBMS background
>> doesn't want to flatten data, but that's usually the way to go in
>> Solr.
>>
>> Part of whether it's a good idea or not depends on how big the index
>> gets, and unfortunately the only way to figure that out is to test.
>>
>> But that's the first approach I'd try.
>>
>> Good luck!
>> Erick
>>
>> On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:
>> > On 5 June 2011 14:42, Erick Erickson  wrote:
>> >
>> >> See: http://wiki.apache.org/solr/SchemaXml
>> >>
>> >> By adding multiValued="true" to the field, you can add
>> >> the same field multiple times in a doc, something like
>> >>
>> >> <add>
>> >> <doc>
>> >>  <field name="discounts">value1</field>
>> >>  <field name="discounts">value2</field>
>> >> </doc>
>> >> </add>
>> >>
>> > I can't see how that would work, as one would need to associate the
>> > right start / end dates and price.
>> > As I understand it, using multiValued and thus flattening the discounts
>> > would result in:
>> >
>> > {
>> >"name":"The Book",
>> >"price":"$9.99",
>> >"price":"$3.00",
>> >"price":"$4.00","synopsis":"thanksgiving special",
>> >"starts":"11-24-2011",
>> >"starts":"10-10-2011",
>> >"ends":"11-25-2011",
>> >"ends":"10-11-2011",
>> >"synopsis":"Canadian thanksgiving special",
>> >  },
>> >
>> > How does one differentiate the different offers?
>> >
>> >
>> >
>> >> But there's no real ability  in Solr to store "sub documents",
>> >> so you'd have to get creative in how you encoded the discounts...
>> >>
>> >
>> > This is what I'm asking :)
>> > What are the best / recommended / known patterns for doing this?
>> >
>> >
>> >
>> >>
>> >> But I suspect a better approach would be to store each discount as
>> >> a separate document. If you're in the trunk version, you could then
>> >> group results by, say, ISBN and get responses grouped together...
>> >>
>> >
>> > This is an option but seems sub-optimal. So say I store the discounts in
>> > multiple documents with ISBN as an attribute, and also store the title
>> > again with ISBN as an attribute.
>> >
>> > To get
>> > "all books currently discounted"
>> >
>> > requires 2 requests:
>> >
>> > * get all discounts currently active
>> > * get all books using the ISBNs retrieved from the above search
>> >
>> > Not that bad. However, what happens when I want
>> > "all books that are currently on discount in the 'horror' genre
>> > containing the word 'elm' in the title"?
>> >
>> > The only way I can see of catering for the above search is to duplicate
>> > all searchable fields of my "book" document in my "discount" document.
>> > Coming from an RDBMS background this seems wrong.
>> >
>> > Is this the correct approach to take?
>> >
>> >
>> >
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
>> >> > Hi,
>> >> > Discounts can change daily. Also there can be a lot of them (over
>> time
>> >> and
>> >> > in a given time period ).
>> >> >
>> >> > Could you give an example of what you mean by multi-valuing the
>> >> > field.
>> >> >
>> >> > Thanks
>> >> >
>> >> > On 3 June 2011 14:29, Erick Erickson 
>> wrote:
>> >> >
>> >> >> How often are the discounts changed? Because you can simply
>> >> >> re-index the book information with a multiValued "discounts" field
>> >> >> and get something similar to your example (&wt=json)
>> >> >>
>> >> >>
>> >> >> Best
>> >> >> Erick
>> >> >>
>> >> >> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
>> >> >> > What is the "best practice" method to index the following in Solr:
>> >> >> >
>> >> >> > I'm attempting to use solr for a book store site.
>> >> >> >
>> >> >> > Each book will have a price but on occasions this will be
>> discounted.
>> >> The
>> >> >> > discounted price exists for a defined time period but there may be
>> >> many
>> >> >> > discount periods. Each discount will have a brief synopsis, start
>> and
>> >> end
>> >> >> > time.
>> >> >> >
>> >> >> > A subset of the desired output would be as follows:
>> >> >> >
>> >> >> > ...
>> >> >> > "response":{"numFound":1,"start":0,"docs":[
>> >> >> >  {
>> >> >> >"name":"The Book",
>> >> >> >"price":"$9.99",
>> >> >> >"discounts":[
>> >> >> >{
>> >> >> > "price":"$3.00",
>> >> >> > "synopsis":"thanksgiving special",
>> >> >> > "starts":"11-24-2011",
>> >> >> > "ends":"11-25-2011",
>> >> >> >},
>> >> >> >{
>> >> >> > "price":"$4.00",
>> >> >> > "synopsis":"Canadian thanksgiving special",
>>

Re: Solr Indexing Patterns

2011-06-06 Thread Jonathan Rochkind

This is a start, for many common best practices:

http://wiki.apache.org/solr/SolrRelevancyFAQ

Many of the questions in there have an answer that involves 
de-normalizing, as an example. It may be that even if your specific 
problem isn't in there, reading through it will give you (as it gave me) 
a general sense of common patterns in Solr.


( It's certainly true that some things are hard to do in Solr.  It turns 
out that an RDBMS is a remarkably flexible thing -- but when it doesn't 
do something you need well, and you turn to a specialized tool instead 
like Solr, you certainly give up some things.


One of the biggest areas of limitation involves hierarchical or 
relational data, definitely. There are a variety of features, some 
more fully baked than others, some not yet in a Solr release, meant to 
provide tools to get at different aspects of this, including "pivot 
faceting", "join" (https://issues.apache.org/jira/browse/SOLR-2272), 
and field collapsing.  Each, IMO, is trying to deal with different 
aspects of hierarchical or multi-class data, or data that is entities 
with relationships. )


On 6/6/2011 3:43 PM, Judioo wrote:

I do think that Solr would be better served if there was a *best practices
section* on the site.

Looking at the majority of emails to this list, they revolve around "how do I
do X?".

It seems like tutorials with real-world examples would serve Solr no end of
good.

I still do not have an example of the best method to approach my problem,
although Erick has helped me understand the limitations of Solr.

Just thought I'd say.






On 6 June 2011 20:26, Judioo  wrote:


Thanks


On 6 June 2011 19:32, Erick Erickson  wrote:


#Everybody# (including me) who has any RDBMS background
doesn't want to flatten data, but that's usually the way to go in
Solr.

Part of whether it's a good idea or not depends on how big the index
gets, and unfortunately the only way to figure that out is to test.

But that's the first approach I'd try.

Good luck!
Erick

On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:

On 5 June 2011 14:42, Erick Erickson  wrote:


See: http://wiki.apache.org/solr/SchemaXml

By adding ' "multiValued="true" ' to the field, you can add
the same field multiple times in a doc, something like



  value1
  value2



I can't see how that would work, as one would need to associate the right
start / end dates and price.
As I understand it, using multiValued and thus flattening the discounts would
result in:

{
"name":"The Book",
"price":"$9.99",
"price":"$3.00",
"price":"$4.00","synopsis":"thanksgiving special",
"starts":"11-24-2011",
"starts":"10-10-2011",
"ends":"11-25-2011",
"ends":"10-11-2011",
"synopsis":"Canadian thanksgiving special",
  },

How does one differentiate the different offers?




But there's no real ability  in Solr to store "sub documents",
so you'd have to get creative in how you encoded the discounts...


This is what I'm asking :)
What are the best / recommended / known patterns for doing this?




But I suspect a better approach would be to store each discount as
a separate document. If you're in the trunk version, you could then
group results by, say, ISBN and get responses grouped together...


This is an option but seems sub-optimal. So say I store the discounts in
multiple documents with ISBN as an attribute, and also store the title again
with ISBN as an attribute.

To get
"all books currently discounted"

requires 2 requests:

* get all discounts currently active
* get all books using the ISBNs retrieved from the above search

Not that bad. However, what happens when I want
"all books that are currently on discount in the 'horror' genre containing
the word 'elm' in the title"?

The only way I can see of catering for the above search is to duplicate all
searchable fields of my "book" document in my "discount" document. Coming
from an RDBMS background this seems wrong.

Is this the correct approach to take?




Best
Erick

On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:

Hi,
Discounts can change daily. Also there can be a lot of them (over time and
in a given time period).

Could you give an example of what you mean by multi-valuing the field.

Thanks

On 3 June 2011 14:29, Erick Erickson wrote:

How often are the discounts changed? Because you can simply
re-index the book information with a multiValued "discounts" field
and get something similar to your example (&wt=json)


Best
Erick

On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:

What is the "best practice" method to index the following in Solr:

I'm attempting to use solr for a book store site.

Each book will have a price but on occasions this will be discounted. The
discounted price exists for a defined time period but there may be many
discount periods. Each discount will have a brief synopsis, start and end
time.

A subset of the desired output would be as follows:

...
"response":{"numFound":1,"start":0,"docs":[
  

Available Solr Indexing strategies

2011-06-07 Thread zarni aung
Hi,

I am very new to Solr, and my client is trying to add full-text search
capabilities to their product using Solr.  They will also have a master
store that serves as the authoritative data store and that will also provide
metadata searches.  Can you please point me in the right direction for some
indexing strategies that people are using, for further research?

Thank you,

Zarni


Re: Solr Indexing Patterns

2011-06-09 Thread Judioo
Very informative links and statement Jonathan. thank you.



On 6 June 2011 20:55, Jonathan Rochkind  wrote:

> This is a start, for many common best practices:
>
> http://wiki.apache.org/solr/SolrRelevancyFAQ
>
> Many of the questions in there have an answer that involves de-normalizing,
> as an example. It may be that even if your specific problem isn't in there,
> I myself anyway found reading through there gave me a general sense of
> common patterns in Solr.
>
> ( It's certainly true that some things are hard to do in Solr.  It turns
> out that an RDBMS is a remarkably flexible thing -- but when it doesn't do
> something you need well, and you turn to a specialized tool instead like
> Solr, you certainly give up some things.
>
> One of the biggest areas of limitation involves hierarchical or
> relational data, definitely. There are a variety of features, some more
> fully baked than others, some not yet in a Solr release, meant to provide
> tools to get at different aspects of this, including "pivot faceting",
> "join" (https://issues.apache.org/jira/browse/SOLR-2272), and
> field collapsing.  Each, IMO, is trying to deal with different aspects of
> hierarchical or multi-class data, or data that is entities with
> relationships. )
>
>
> On 6/6/2011 3:43 PM, Judioo wrote:
>
>> I do think that Solr would be better served if there was a *best practices
>> section* on the site.
>>
>> Looking at the majority of emails to this list, they revolve around "how
>> do I do X?".
>>
>> It seems like tutorials with real-world examples would serve Solr no end
>> of good.
>>
>> I still do not have an example of the best method to approach my problem,
>> although Erick has helped me understand the limitations of Solr.
>>
>> Just thought I'd say.
>>
>>
>>
>>
>>
>>
>> On 6 June 2011 20:26, Judioo  wrote:
>>
>> Thanks
>>>
>>> On 6 June 2011 19:32, Erick Erickson  wrote:
>>>
>>> #Everybody# (including me) who has any RDBMS background
>>> doesn't want to flatten data, but that's usually the way to go in
>>> Solr.
>>>
>>> Part of whether it's a good idea or not depends on how big the index
>>> gets, and unfortunately the only way to figure that out is to test.
>>>
>>> But that's the first approach I'd try.
>>>
>>> Good luck!
>>> Erick
>>>
>>> On Mon, Jun 6, 2011 at 11:42 AM, Judioo  wrote:

> On 5 June 2011 14:42, Erick Erickson  wrote:
>
>> See: http://wiki.apache.org/solr/SchemaXml
>>
>> By adding ' "multiValued="true" ' to the field, you can add
>> the same field multiple times in a doc, something like
>>
>> <doc>
>>   <field name="discounts">value1</field>
>>   <field name="discounts">value2</field>
>> </doc>
>>
> I can't see how that would work, as one would need to associate the
> right start / end dates and price.
> As I understand it, using multiValued and thus flattening the discounts
> would result in:
>
> {
>"name":"The Book",
>"price":"$9.99",
>"price":"$3.00",
>"price":"$4.00","synopsis":"thanksgiving special",
>"starts":"11-24-2011",
>"starts":"10-10-2011",
>"ends":"11-25-2011",
>"ends":"10-11-2011",
>"synopsis":"Canadian thanksgiving special",
>  },
>
> How does one differentiate the different offers?
>
>
>
>> But there's no real ability in Solr to store "sub documents",
>> so you'd have to get creative in how you encoded the discounts...
>>
> This is what I'm asking :)
> What are the best / recommended / known patterns for doing this?
>
>
>
>> But I suspect a better approach would be to store each discount as
>> a separate document. If you're in the trunk version, you could then
>> group results by, say, ISBN and get responses grouped together...
>>
> This is an option but seems sub-optimal. So say I store the discounts in
> multiple documents with ISBN as an attribute, and also store the title
> again with ISBN as an attribute.
>
> To get
> "all books currently discounted"
>
> requires 2 requests:
>
> * get all discounts currently active
> * get all books using the ISBNs retrieved from the above search
>
> Not that bad. However, what happens when I want
> "all books that are currently on discount in the "horror" genre containing
> the word 'elm' in the title."
>
> The only way I can see of catering for the above search is to duplicate all
> searchable fields from my "book" document in my "discount" document. Coming
> from an RDBMS background this seems wrong.
>
> Is this the correct approach to take?
>
>> Best
>> Erick
>>
>> On Sat, Jun 4, 2011 at 1:42 AM, Judioo  wrote:
>>
>>> Hi,
>>> Discounts can change daily. Also there can be a lot of them (over time and
>>> [...]
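
To make the denormalized approach discussed above concrete, a separate
"discount" document might look like the following -- the field names and
values are illustrative only, not taken from the original thread:

<add>
  <doc>
    <field name="doc_type">discount</field>
    <field name="isbn">9780000000001</field>
    <field name="title">The Book</field>
    <field name="genre">horror</field>
    <field name="price">3.00</field>
    <field name="synopsis">thanksgiving special</field>
    <field name="starts">2011-11-24T00:00:00Z</field>
    <field name="ends">2011-11-25T00:00:00Z</field>
  </doc>
</add>

Assuming starts/ends are date fields, "all books currently discounted" then
becomes a single range query over these documents:

  q=doc_type:discount AND starts:[* TO NOW] AND ends:[NOW TO *]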

Re: Solr indexing questions

2011-06-11 Thread Jamie Johnson
I'm not sure about your first question, but your second question (searching
across fields using a single keyword) I believe is exactly what dismax was
created for; check out
http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/ for some
information.  Regarding spelling errors, you could add a phonetic field
which would be included in the weighted result; a quick google gave me
http://search.lucidimagination.com/search/document/CDRG_ch05_5.6.12.
Synonyms are also pretty straightforward and are included in the sample
that ships with Solr.
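
As a sketch, a dismax request that spreads one keyword across several
weighted fields might look like this (field names and boosts are invented
for illustration; the spaces in qf would be URL-encoded in practice):

  http://localhost:8983/solr/select?defType=dismax&q=steakhouse
      &qf=name^5 cuisines^3 features^2 reviews
      &pf=name^10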

On Sat, Jun 11, 2011 at 10:35 AM, Frank A  wrote:

> I currently have my site setup using SOLR for some pretty simple queries
> and
> am looking to add some additional features and was hoping to get some
> guidance.
>
> Here's my situation: for a given restaurant I have the following info:
>
> rest name,
> editorial,
> list of features (e.g. Reservations, Good for Groups, etc)
> list of cuisines (American, Italian, etc)
> List of user reviews
> Additional meta data
>
> There are 2 different things I want to do:
>
> Build a directory based on "keywords or phrases" - e.g. looking through all
> the data to find the common keywords/phrases - e.g. "hot dog" or "Brazilian
> steakhouse". I'm not sure how to extract these keyphrases from the data
> without having to input them myself.  Is this a good fit for SOLR?  If so,
> what features should I look into?
>
> Second is an "advanced" search that basically matches user input on ANY of
> the fields.  However, I'd like it to have some basic handling for misspelled
> words and synonyms (bbq and bar-b-q), and to weight the use of the terms
> differently (e.g. name of restaurant vs. in a user's comments).  I'm sure
> this is SOLR's sweet spot but I'm having trouble figuring out how to put it
> all together.
>
> Thanks in advance.
>


Re: offline solr indexing

2009-04-27 Thread Amit Nithian
Not sure if this helps but could you make this a solr server that is not
accessible by any other means (except internal), perform your index build
using the dataimporthandler and use Solr's replication mechanisms to move
the indices across?
You can issue the HTTP request to rebuild the index from the command line
(i.e. GET ..)
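
For example, assuming a DIH handler registered at /dataimport (host and core
names are placeholders for the internal build box):

  curl 'http://internal-master:8983/solr/dataimport?command=full-import&clean=true'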

On Mon, Apr 27, 2009 at 12:08 PM, Charles Federspiel <
charles.federsp...@gmail.com> wrote:

> Solr Users,
> Our app servers are setup on read-only filesystems.  Is there a way
> to perform indexing from the command line, then copy the index files to the
> app-server and use Solr to perform search from inside the servlet
> container?
>
> If the Solr implementation is bound to http requests, can Solr perform
> searches against an index that I create with Lucene?
> thank you,
> Charles Federspiel
>


Re: offline solr indexing

2009-04-27 Thread Shalin Shekhar Mangar
On Tue, Apr 28, 2009 at 12:38 AM, Charles Federspiel <
charles.federsp...@gmail.com> wrote:

> Solr Users,
> Our app servers are setup on read-only filesystems.  Is there a way
> to perform indexing from the command line, then copy the index files to the
> app-server and use Solr to perform search from inside the servlet
> container?


If the filesystem is read-only, then how can you index at all?

But what I think you are describing is the regular master-slave setup that
we use. A dedicated master on which writes are performed. Multiple slaves on
which searches are performed. The index is replicated to slaves through
script or the new java based replication.
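
As a rough sketch of the Java-based replication (Solr 1.4+), both sides
declare a ReplicationHandler in solrconfig.xml; the master host name below
is a placeholder:

  <!-- on the master -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
  </requestHandler>

  <!-- on each slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:15:00</str>
    </lst>
  </requestHandler>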


> If the Solr implementation is bound to http requests, can Solr perform
> searches against an index that I create with Lucene?
> thank you,


It can but it is a little tricky to get the schema and analysis correct
between your Lucene writer and Solr searcher.

-- 
Regards,
Shalin Shekhar Mangar.


Re: offline solr indexing

2009-05-02 Thread Charles Federspiel
Thanks. I imagine replication to a slave would require a filesystem writable
by that slave.

I think it helps to realize that indexing is really a function of Content
Management.  After some discussion with coworkers, I've learned that our
internal CMS server runs within tomcat and shares a filesystem with our
public app-servers.  So I'm hoping to deploy solr both to the tomcat
instance (for indexing) and our other instances (for searching) simply
sharing a Solr home between them.  How bad is this? Does updating and
commiting the index interrupt search?  It would only affect our internal
instance, but I still need to know all the effects of this setup.

On Mon, Apr 27, 2009 at 6:32 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Apr 28, 2009 at 12:38 AM, Charles Federspiel <
> charles.federsp...@gmail.com> wrote:
>
> > Solr Users,
> > Our app servers are setup on read-only filesystems.  Is there a way
> > to perform indexing from the command line, then copy the index files to
> the
> > app-server and use Solr to perform search from inside the servlet
> > container?
>
>
> If the filesystem is read-only, then how can you index at all?
>
> But what I think you are describing is the regular master-slave setup that
> we use. A dedicated master on which writes are performed. Multiple slaves
> on
> which searches are performed. The index is replicated to slaves through
> script or the new java based replication.
>
>
> > If the Solr implementation is bound to http requests, can Solr perform
> > searches against an index that I create with Lucene?
> > thank you,
>
>
> It can but it is a little tricky to get the schema and analysis correct
> between your Lucene writer and Solr searcher.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: offline solr indexing

2009-05-04 Thread Otis Gospodnetic

This should be fine.  You won't have to replicate your index, just reopen the 
searcher when commit is done, that's all.  Index updates and searches can be 
happening at the same time.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Charles Federspiel 
> To: solr-user@lucene.apache.org
> Sent: Saturday, May 2, 2009 12:40:13 PM
> Subject: Re: offline solr indexing
> 
> Thanks. I imagine replication to a slave would require a filesystem writable
> by that slave.
> [...]



Solr Indexing slows down

2010-07-29 Thread Peter Karich
Hi,

I am indexing a solr 1.4.0 core and committing gets slower and slower,
starting from 3-5 seconds for ~200 documents and ending with over 60
seconds after 800 commits. Then, if I reload the index, it is as fast
as before! And today I have read a similar thread [1] and indeed: if I
set autowarming for the caches to 0 the slowdown disappears.

BUT at the same time I would like to offer searching on that core, which
would be dramatically slowed down (due to no autowarming).

Does someone know a better solution to avoid index-slow-down?

Regards,
Peter.

[1] http://www.mail-archive.com/solr-user@lucene.apache.org/msg20785.html


Speeding up solr indexing

2010-10-08 Thread sivaprasad

Hi,
I am indexing the data using DIH. The data comes from MySQL. Each document
contains 30 fields. Some of the fields are multi-valued. When I try to
index 10 million records it takes a long time.

Does anybody have suggestions to speed up the indexing process? Any
suggestions on solr admin-level configurations?


Thanks,
JS
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1667054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Plurals in solr indexing

2010-01-27 Thread murali k

Hi, 
I am having trouble with indexing plurals, 

I have the schema with following fields
gender (field) - string (field type) (eg. data Boys)
all (field) - text (field type)  - solr.WhitespaceTokenizerFactory,
solr.SynonymFilterFactory, solr.WordDelimiterFilterFactory,
solr.LowerCaseFilterFactory, SnowballPorterFilterFactory

i am using copyField from gender to all

and searching on "all" field

When I search for Boy, I get results; if I search for Boys I don't get
results.
I have tried things like "boys bikes" - no results
"boy bikes" - works

kid and kids are synonyms for boy and boys, so I tried adding
kid,kids,boy,boys in synonyms hoping it would work; it doesn't work that way.

I also have other content fields which are copied to "all", and it contains
words like "kids, boys" etc...
Any idea?
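
For reference, the "all" field chain described above corresponds roughly to
this schema.xml fieldType -- a reconstruction from the filter list, not the
poster's actual config:

  <fieldType name="text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.WordDelimiterFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>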





-- 
View this message in context: 
http://old.nabble.com/Plurals-in-solr-indexing-tp27335639p27335639.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Dynamic Solr indexing

2010-03-01 Thread Jan Høydahl / Cominvent
Hi,

In current version you need to handle the cluster layout yourself, both on 
indexing and search side, i.e. route documents to shards as you please, and 
know what shards to search.

We try to address how to make this easier in 
http://wiki.apache.org/solr/SolrCloud - have a look at it. The idea is that 
there is a component that knows about the layout of the search cluster, and we 
can then use this to know what shards to index to and search. If we build a 
component which automatically routes documents to shards, your use case could 
be implemented as one particular routing strategy, i.e. move to next shard when 
the current is "full" - ideal for news type of indexes.
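
Until then, the manual version is: the indexing process posts each document
to whichever shard it has chosen, and the search side enumerates all current
shards in the request, e.g. (host names are placeholders):

  http://shard1:8983/solr/select?q=foo
      &shards=shard1:8983/solr,shard2:8983/solr,shard3:8983/solr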

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 1. mars 2010, at 18.58, Peter S wrote:

> 
> Hi,
> 
> 
> 
> I wonder if anyone could shed some insight on a dynamic indexing question...?
> 
> 
> 
> The basic requirement is this:
> 
> 
> 
> Indexing:
> 
> A process writes to an index, and when it reaches a certain size (say, 1GB), 
> a new index (core) is 'automatically' created/deployed (i.e. the process 
> doesn't know about it) and further indexing now goes into the new core. When 
> that one reaches its threshold size, a new index is deployed, and so on.
> 
> The process that is writing to the indices doesn't actually know that it is 
> writing to different cores.
> 
> 
> 
> Searching:
> 
> When a search is directed at the above index, the actual search is a 
> distributed shard search across all the shards that have been deployed. 
> Again, the searcher process doesn't know this, but gets back the aggregated 
> results, as if it had specified all the shards in the request URL, but as 
> these are changing dynamically, it of course can't know what they all are at 
> any given time.
> 
> 
> 
> This requirement sounds to me perhaps like a Katta thing. I've had a look at 
> Solr-1395, and there's questions in Lucid that sound similar (e.g. 
> http://www.lucidimagination.com/search/document/4b3d00055413536d/solr_katta_integration#4b3d00055413536d),
>  so I guess (hope) I'm not the only one with this requirement.
> 
> 
> 
> I couldn't find anything in either Katta or SOLR-1395 that fit both the 
> writing and searching requirement, but I could easily have missed it.
> 
> 
> 
> Is Katta/Solr-1395 the way to go to achieve this? Would such a solution be 
> 'production-ready'? Has anyone deployed this type of thing in a production 
> environment?
> 
> 
> 
> Any insight/advice would be greatly appreciated.
> 
> 
> 
> Thanks!
> 
> Peter
> 
> 
> 
> 
> 



RE: Dynamic Solr indexing

2010-03-01 Thread Peter S

Hi Jan,

Thanks very much for your message. SolrCloud sounds very cool indeed...

So, from the Wiki, am I right in understanding that the only 'external'
component is ZooKeeper, and everything else is pure Solr (i.e. replication,
distributed queries et al. are all Solr HTTP, as opposed to something like
Hadoop IPC)? If so, this makes it a nice tight package, keeping external
dependencies to a minimum. Is SolrCloud 'ready for primetime' production at
present?

Apologies for all the questions - Is SolrCloud marked for inclusion in 1.5?

Many thanks!

Peter

> Subject: Re: Dynamic Solr indexing
> From: jan@cominvent.com
> Date: Tue, 2 Mar 2010 00:48:50 +0100
> To: solr-user@lucene.apache.org
> 
> Hi,
> 
> In current version you need to handle the cluster layout yourself, both on 
> indexing and search side, i.e. route documents to shards as you please, and 
> know what shards to search.
> [...]

Re: Dynamic Solr indexing

2010-03-10 Thread Jan Høydahl / Cominvent
Hi,

Yes, it will be a really nice package. I think the aim is to keep the ZK stuff 
optional, which can be nice for small installs or upgrading without embracing 
the ZK parts. All of this is still in the beginning of development.

Much of the cloud stuff is aimed at 1.5 but there are as usual no dates...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 2. mars 2010, at 01.18, Peter S wrote:

> 
> Hi Jan,
> 
> 
> 
> Thanks very much for your message. SolrCloud sounds very cool indeed...
> [...]



Solr indexing configuration help

2008-05-28 Thread gaku113

Hi all Solr users/developers/experts,

I have the following scenario and I appreciate any advice for tuning my solr
master server.  

I have a field in my schema that would index (but not store) about ~1
ids for each document.  This field is expected to govern the size of the
document.  Each id can contain up to 6 characters.  I figure that there are
two alternatives for this field: one is to use a string multi-valued field,
and the other would be to pass a white-space-delimited string to solr and
have solr tokenize such a string on whitespace (the text_ws fieldType).
The master server is expected to receive constant stream of updates.

The expected/estimated document size can range from 50k to 100k for a single
document.  (I know this is quite large). The number of documents is expected
to be around 200,000 on each master server, and there can be multiple master
servers (sharding).  I wish the master could handle more docs too if I can
figure a way out.

Currently, I’m performing some basic stress tests to simulate the indexing
side on the master server.  This stress test would continuously add new
documents at the rate of about 10 documents every 30 seconds.  Autocommit is
being used (50 docs and 180 seconds constraints), but I have no idea if this
is the preferred way.  The goal is to keep adding new documents until we can
get at least 200,000 documents (or about 20GB of index) on the master (or
even more if the server can handle it)

What I experienced from the indexing stress test is that the master server
failed to respond after a while, becoming non-pingable when there were about
30k documents.  When looking at the log, the errors are mostly:
java.lang.OutOfMemoryError: Java heap space
OR
Ping query caused exception: null (this is probably caused by the OOM
problem)

There were also a few cases in which the java process even went away.

Questions:
1)  Is it better to use the multi-valued string field or the text_ws field
for this large field?
2)  Is it better to have more outstanding docs per commit or more frequent
commits, in terms of maximizing server resources?  What is the preferred way
to commit documents, assuming that the solr master receives updates
frequently?  How many updated docs should there be before issuing a commit?
3)  How to avoid the OOM problem in my case? I’m already doing (-Xms1536M
-Xmx1536M) on a 2-GB machine. Is that not enough?  I’m concerned that adding
more RAM would just delay the OOM problem.  Any additional JVM option to
consider?
4)  Any recommendation for the master server configuration, in a sense that
I can maximize the number of indexed docs?
5)  How can I disable caching on the master altogether, as queries won’t hit
the master?
6)  For an average doc size of 50k-100k, is that too large for solr, or is
solr even the right tool? If not, any alternative?  If we are able to reduce
the size of docs, can we expect to index more documents?

The followings are info related to software/hardware/configuration:

Solr version (solr nightly build on 5/23/2008)
Solr Specification Version: 1.2.2008.05.23.08.06.59
Solr Implementation Version: nightly
Lucene Specification Version: 2.3.2
Lucene Implementation Version: 2.3.2 652650
Jetty: 6.1.3

Schema.xml (the section that I think is relevant to the master server; the
XML element names were stripped when this message was archived, so only the
values survive):

  [field definitions lost]
  uniqueKey: id

Solrconfig.xml (element names likewise stripped; surviving values, grouped
as they appeared):

  index defaults: false, 10, 500, 50, 5000, 2, 1000, 1,
    org.apache.lucene.index.LogByteSizeMergePolicy,
    org.apache.lucene.index.ConcurrentMergeScheduler, single
  main index: false, 50, 10, 500, 5000, 2, false
  autoCommit: 50 docs, 18 (time value truncated)
  postCommit listener: solr/bin/snapshooter, ".", true
  query section: 50, true, 1, 1
  newSearcher warming query: "user_id 0 1"
    (static newSearcher warming query from solrconfig.xml)
  firstSearcher warming query: "fast_warm 0 10"
    (static firstSearcher warming query from solrconfig.xml)
  trailing values: false, 4

Replication:
The snappuller is scheduled to run every 15 mins for now. 

Hardware:
AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive

OS:
Fedora 8 (64-bit)

JVM version:
java version "1.7.0"
IcedTea Runtime Environment (build 1.7.0-b21)
IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)

Java options:
java  -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
-XX:+UseParallelGC -jar start.jar 


-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR Indexing/Querying

2007-05-31 Thread Frans Flippo

I think if you add a field that has an analyzer that creates tokens on
alpha/digit/punctuation boundaries, that should go a long way. Use that both
at index and search time.

For example:
* 3555LHP  becomes "3555" "LHP"
 Searching for D3555 becomes "D" OR "3555", so it matches on token "3555"
from 3555LHP.

* t14240 becomes "t" "14240"
 Searching for t14240-ss  becomes "t" OR "14240" OR "ss", matching "14240"
from "t14240".

Similarly for your other examples.

If this proves to be too broad, you may need to define some stricter rules,
but you could use this for starters.

I think you will have to write your own analyzer, as it doesn't look like
any of the analyzers available in Solr/Lucene do exactly what you need. But
that's relatively straightforward. Just start with the code from one of the
existing Analyzers (e.g. KeywordAnalyzer).

Good luck,
Frans

On 5/31/07, realw5 <[EMAIL PROTECTED]> wrote:



Hey Guys,
I need some guidance in regards to a problem we are having with our solr
index. Below is a list of terms our customers search for, which are
failing
or not returning the complete set. The second side of the list is the
product id/keyword we want it to match.

Can you give me some direction on how this can be done (or let me know if it
can't) with index/query analyzers. Any help is much appreciated!

Dan

---

Keyword Typed In / We want it to find

D3555 / 3555LHP
D460160-BN / D460160
D460160BN / D460160
Dd454557 / D454557
84200ORB / 84200
84200-ORB / 84200
T13420-SCH / T13420
t14240-ss / t14240
--
View this message in context:
http://www.nabble.com/SOLR-Indexing-Querying-tf3843221.html#a10883456
Sent from the Solr - User mailing list archive at Nabble.com.




AW: SOLR Indexing/Querying

2007-05-31 Thread Burkamp, Christian
Hi there,

It looks a lot like using Solr's standard "WordDelimiterFilter" (see the sample 
schema.xml) does what you need.
It splits on alphabetical-to-numeric boundaries and on the various kinds of 
intra-word delimiters like "-", "_" or ".". You can decide whether the parts 
are put together again in addition to the split-up tokens. Control this with 
the parameters "catenateWords", "catenateNumbers" and "catenateAll".
Good documentation on this topic is found on the wiki

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089
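
As a sketch of such a field type (the parameter values shown are common
choices for this use case, not necessarily what this particular schema
needs):

  <fieldType name="text_part" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- 3555LHP -> "3555", "LHP" (plus "3555lhp" via catenateAll);
           D460160-BN -> "D460160", "BN" (plus "d460160bn") -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1" catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>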

-- Christian


-----Original Message-----
From: Frans Flippo [mailto:[EMAIL PROTECTED]
Sent: Thursday, 31 May 2007 11:27
To: solr-user@lucene.apache.org
Subject: Re: SOLR Indexing/Querying


I think if you add a field that has an analyzer that creates tokens on 
alpha/digit/punctuation boundaries, that should go a long way. Use that both at 
index and search time.

[...]



Re: solr indexing exception

2011-08-26 Thread Gora Mohanty
On Fri, Aug 26, 2011 at 1:47 PM, abhijit bashetti
 wrote:
> Hi,
>
> I am using DIH for indexing 50K documents .
>
> I am using 64-bit machine with 4GB RAM

How much memory is allocated to Solr? What is the approximate size
of the data being indexed into Solr?

Regards,
Gora


Re: Solr Indexing Time

2011-11-10 Thread Steve Fatula
From: "Husain, Yavar" 
>To: "solr-user@lucene.apache.org" 
>Sent: Thursday, November 10, 2011 3:43 AM
>Subject: Solr Indexing Time
>
>However while using Solr on a VM, with 4 GB RAM it took 50 minutes to index at 
>the first time. Note that there is no Network delays and no RAM issues. Now 
>when I increased the RAM to 8GB and increased the heap size, the indexing time 
>increased to 2 hrs. That was really strange. Note that except for SQL Server 
>there is no other process running. There are no network delays. However I have 
>not checked for File I/O. Can that be a bottleneck? Does Solr have any issues 
>running in a "Virtualization" Environment?

I think you said it all in your statement "However I have not checked for File
I/O". In many VM environments, that's a huge bottleneck; it depends on the
environment, how many VMs, etc. What sort of VM? How many on the same machine?

Re: Issue in Solr Indexing

2011-05-26 Thread Gora Mohanty
On Thu, May 26, 2011 at 7:06 PM, deepak agrawal  wrote:
> Hi All,
>
> When I index a record into Solr, it indexes successfully, and the commit I
> issue afterwards also reports success.
> But when I then search for that particular record in Solr, I do not get
> that record back.
> I am using Solr 1.4.1.

Please provide us with more details, as there is not much to go on here:
* How are you indexing? How are you telling that the indexing was
  successful?
* How is the field defined in the Solr schema?
* What is the commit response?

> Can anyone please suggest why that particular record is not being indexed?
> I am not getting any error in the Catalina log file either.
[...]

What does this mean? You say above that the indexing is successful,
but seem to be saying here that it was not successful after all.

Regards,
Gora


Re: Solr Indexing slows down

2010-07-30 Thread Erick Erickson
See the subject about 1500 threads. The first place I'd look is how
often you're committing. If you're committing before the warmup queries
from the previous commit have done their magic, you might be getting
into a death spiral.

HTH
Erick

On Thu, Jul 29, 2010 at 7:02 AM, Peter Karich  wrote:

> Hi,
>
> I am indexing a solr 1.4.0 core and commiting gets slower and slower.
> Starting from 3-5 seconds for ~200 documents and ending with over 60
> seconds after 800 commits. Then, if I reloaded the index, it is as fast
> as before! And today I have read a similar thread [1] and indeed: if I
> set autowarming for the caches to 0 the slowdown disappears.
>
> BUT at the same time I would like to offer searching on that core, which
> would be dramatically slowed down (due to no autowarming).
>
> Does someone know a better solution to avoid index-slow-down?
>
> Regards,
> Peter.
>
> [1] http://www.mail-archive.com/solr-user@lucene.apache.org/msg20785.html
>


Re: Solr Indexing slows down

2010-07-30 Thread Peter Karich
Hi Erick!

thanks for the response!
I will answer your questions ;-)

> How often are you making changes to your index?

Every 30-60 seconds. Too heavy?


> Do you have autocommit on?

No.


> Do you commit when updating each document?

No. I commit after a batch update of 200 documents


> Committing too often and consequently firing off warmup queries is the first 
> place I'd look.

Why is committing firing warmup queries? Is there any documentation about
this subject?
How can I be sure that the previous commit has done its magic?

> there are several config values that influence the commit frequency


I now know the autowarm and the mergeFactor config. What else? Is this
documentation complete:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed ?

Regards,
Peter.

> See the subject about 1500 threads. The first place I'd look is how
> often you're committing. If you're committing before the warmup queries
> from the previous commit have done their magic, you might be getting
> into a death spiral.
>
> HTH
> Erick
>
> On Thu, Jul 29, 2010 at 7:02 AM, Peter Karich  wrote:
>
>> [...]


Re: Solr Indexing slows down

2010-07-30 Thread Otis Gospodnetic
Peter, there are events in solrconfig where you define warm up queries when a 
new searcher is opened.

There are also cache settings that play a role here.

30-60 seconds is pretty frequent for Solr.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
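
For reference, both knobs live in the <query> section of solrconfig.xml; a
minimal sketch (the query text and autowarm count are invented for
illustration):

  <query>
    <filterCache class="solr.FastLRUCache" size="512"
                 initialSize="512" autowarmCount="128"/>
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">popular query</str><str name="rows">10</str></lst>
      </arr>
    </listener>
  </query>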



- Original Message 
> From: Peter Karich 
> To: solr-user@lucene.apache.org
> Sent: Fri, July 30, 2010 4:06:48 PM
> Subject: Re: Solr Indexing slows down
> 
> Hi Erick!
> 
> thanks for the response!
> I will answer your questions ;-)
> [...]


Re: Solr Indexing slows down

2010-07-30 Thread Peter Karich
Hi Otis,

does it mean that a new searcher is opened after I commit?
I thought only on startup...(?)

Regards,
Peter.

> Peter, there are events in solrconfig where you define warm up queries when a 
> new searcher is opened.
>
> There are also cache settings that play a role here.
>
> 30-60 seconds is pretty frequent for Solr.
>
> Otis
>
> [...]


-- 
http://karussell.wordpress.com/



Re: Solr Indexing slows down

2010-07-30 Thread Otis Gospodnetic
As you make changes to your index, you probably want to see the new/modified 
documents in your search results.  In order to do that, the new searcher needs 
to be reopened, and this happens on commit.
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Peter Karich 
> To: solr-user@lucene.apache.org
> Sent: Fri, July 30, 2010 6:19:03 PM
> Subject: Re: Solr Indexing slows down
> 
> Hi Otis,
> 
> does it mean that a new searcher is opened after I commit?
> I thought only on startup...(?)
> [...]
> 
> -- 
> http://karussell.wordpress.com/
> 
> 


Re: Solr Indexing slows down

2010-08-02 Thread Peter Karich
Thanks Otis, for this clarification!

That means I will have to decrease the commit frequency to speed up
indexing.
How could I do this if I don't want to introduce an artificial delay
time? ... via increasing the batch size?

Today I have read in another thread [1] that one should uninvert the
field? What is that and how can I do it?

Regards,
Peter.

[1]
http://www.mail-archive.com/solr-user@lucene.apache.org/msg36113.html
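
One way to lower the commit frequency without an artificial delay is to stop
committing per batch and let Solr's autoCommit do it instead; a sketch for
solrconfig.xml, with placeholder thresholds to tune:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>   <!-- commit after this many added docs -->
      <maxTime>300000</maxTime>  <!-- ...or after 5 minutes, whichever comes first -->
    </autoCommit>
  </updateHandler>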


> As you make changes to your index, you probably want to see the new/modified
> documents in your search results.  In order to do that, the new searcher
> needs to be reopened, and this happens on commit.
> Otis
> [...]



Re: Speeding up solr indexing

2010-10-08 Thread Erick Erickson
Well, 10 million rows is a bunch of rows, it'll take some time. But you haven't
given us any clue what that means. Is it taking 5 minutes? 5 hours? 5 days?
Without some dimension on the problem it's really hard to provide any
suggestions; you might be seeing entirely reasonable times, we just don't know.

MySql (depending on version) likes to load the entire results set into
memory, which
may be related. See:
http://search.lucidimagination.com/search/out?u=http://wiki.apache.org/solr/DataImportHandlerFaq%23head-149779b72761ab071c841879545256bdbbdc15d2

HTH
Erick

On Fri, Oct 8, 2010 at 2:59 PM, sivaprasad wrote:

>
> Hi,
> I am indexing the data using DIH.Data coming from mysql.Each document
> contains 30 fields.Some of the fields are multi valued.When i am trying to
> index 10 million records it taking more time to index.
>
> Any body has suggestions to speed up indexing process?Any suggestions on
> solr admin level configurations?
>
>
> Thanks,
> JS
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1667054.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Speeding up solr indexing

2010-10-08 Thread Otis Gospodnetic
Hi,

Assuming your DB/network/something else is not the bottleneck, increase your 
ramBufferSizeMB (in solrconfig).
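
For example (the element lives in the indexDefaults section of
solrconfig.xml; 256 is just an arbitrary starting value to experiment with):

  <indexDefaults>
    <ramBufferSizeMB>256</ramBufferSizeMB>
  </indexDefaults>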

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: sivaprasad 
> To: solr-user@lucene.apache.org
> Sent: Fri, October 8, 2010 2:59:45 PM
> Subject: Speeding up solr indexing
> 
> 
> Hi,
> I am indexing the data using DIH.Data coming from mysql.Each  document
> contains 30 fields.Some of the fields are multi valued.When i am  trying to
> index 10 million records it taking more time to index.
> 
> Any  body has suggestions to speed up indexing process?Any suggestions on
> solr  admin level configurations?
> 
> 
> Thanks,
> JS
> -- 
> View this message  in context: 
>http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1667054.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 


Re: Speeding up solr indexing

2010-10-08 Thread Dennis Gearon
How does that have to work with Java's memory? 

In lockstep, a certain percentage, not related, what, or at all?


Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Fri, 10/8/10, Otis Gospodnetic  wrote:

> From: Otis Gospodnetic 
> Subject: Re: Speeding up solr indexing
> To: solr-user@lucene.apache.org
> Date: Friday, October 8, 2010, 9:13 PM
> Hi,
> 
> Assuming your DB/network/something else is not the
> bottleneck, increase your 
> ramBufferSizeMB (in solrconfig).
> 
> Otis
> [...]


Re: Speeding up solr indexing

2010-10-09 Thread Otis Gospodnetic
Related.  Can't be larger than -Xmx. :)  Or even equal to -Xmx, because other 
things need to live in the heap.  There is no exact function, so be more on the 
conservative side in order to avoid OOME.
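
As an illustrative (not prescriptive) pairing: with a 1 GB heap, keeping the
index buffer at a quarter of that leaves headroom for everything else:

  java -Xmx1024M -jar start.jar            # whole JVM heap
  <ramBufferSizeMB>256</ramBufferSizeMB>   <!-- in solrconfig, well under -Xmx -->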

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Dennis Gearon 
> To: solr-user@lucene.apache.org
> Sent: Sat, October 9, 2010 12:58:18 AM
> Subject: Re: Speeding up solr indexing
> 
> How does that have to work with Java's memory? 
> 
> In lockstep, a certain percentage, not related, what, or at all?
> [...]


Re: Speeding up solr indexing

2010-10-09 Thread sivaprasad

Hi,
Please find the configurations below.

Machine configurations(Solr running here):

RAM - 4 GB
HardDisk - 180GB
Os - Red Hat linux version 5
Processor-2x Intel Core 2 Duo CPU @2.66GHz



Machine configurations(Mysql server is running here):
RAM - 4 GB
HardDisk - 180GB
Os - Red Hat linux version 5
Processor-2x Intel Core 2 Duo CPU @2.66GHz

MySQL server details:
MySQL version - MySQL 5.0.22

Solr configuration details:

 
  
false

20
   

100
2147483647
1
1000
1

   
   


   

single
  

  

false
100
20
   

2147483647
1
false
  

  
  
10
 
  1 
  6




  

Solr document details:

21 fields are indexed and stored.
3 fields are indexed only.
3 fields are stored only.
3 fields are indexed, stored, and multi-valued.
2 fields are indexed and multi-valued.

And I am copying some of the indexed fields. Two of these fields are
multi-valued and have thousands of values.

In the db-config file, the main table contains 0.6 million records.

When I tested with the same records, indexing took 1 hr 30 min. In that case,
the table behind one of the multi-valued fields had no records. After putting
data into this table, each main-table record has thousands of rows there, and
this field is indexed and stored. Indexing now takes more than 24 hrs.

Solr is running on tomcat 6.0.26, jdk1.6.0_17 and solr 1.4.1

I am using JVM's default settings.

Why is this taking so much time? Does anybody have suggestions on where I am
going wrong?

Thanks,
JS
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1670737.html
Sent from the Solr - User mailing list archive at Nabble.com.
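
Since the JVM is on default settings, one easy first experiment (sizes are
illustrative assumptions for a 4 GB box, not recommendations) is to give
Tomcat an explicit heap, e.g. in catalina.sh or a setenv.sh:

  export JAVA_OPTS="$JAVA_OPTS -server -Xms1024M -Xmx2048M"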


Re: Speeding up solr indexing

2010-10-09 Thread Dennis Gearon
Looking at it, and not knowing how much memory your other processes on your box 
use (nor how much memory you have set aside for Java), I would start with 
DOUBLING your RAM. Make sure that you have enough Java memory.

You will know if it has some effect by using the 2:1 size ratio. 100MB for all 
that data is pretty small, I think.


Use the scientific method; change only one parameter at a time and check 
results.

It's always one of four things:
(in different order depending on task, but listed alphabetically here)
--
Memory (process assigned and/or actual physical memory)
Processor
Network Bandwidth
Hard Drive Bandwidth
(sometimes you can add motherboard I/O paths also;
 as of this date, AMD has more I/O paths in its
 consumer line of processors.)

In order of ease of experimenting (easiest to hardest):
---
Application/process-assigned memory
Physical memory
Network Bandwidth
Hard Drive Bandwidth
  Screaming fast SCSI 15K rpm drives
  RAID arrays, casual
  RAID arrays, professional
  External DRAM drive, 64 gig max/RAID them for more
Processor(s) 
  Put in the maximum speed/cache size the motherboard will take.
  Otherwise, USUALLY requires changing motherboard/HOSTING setup
I/O channels
  USUALLY requires changing motherboard/HOSTING setup





Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.




RE: Speeding up solr indexing

2010-10-10 Thread Ephraim Ofir
Try running the query you're using in DIH from the command line on the DB host and 
on the Solr host to see what kind of times you get from the DB itself and from 
the network; your bottleneck might be there.  If you find that's not it, take 
a look at this post regarding high-performance DIH imports; you can get serious 
improvement in performance by not using sub-entities: 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e
 

Ephraim Ofir
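
A minimal sketch of the flattening idea (table and column names are made-up
assumptions for illustration): one query with GROUP_CONCAT replaces a
sub-entity per multi-valued field, so DIH runs one statement instead of one
per parent row:

  <entity name="item" transformer="RegexTransformer"
          query="SELECT i.id, i.name,
                        GROUP_CONCAT(t.tag SEPARATOR ',') AS tags
                 FROM item i LEFT JOIN item_tag t ON t.item_id = i.id
                 GROUP BY i.id">
    <field column="tags" splitBy=","/>
  </entity>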



Solr indexing socket timeout errors

2011-01-07 Thread Burton-West, Tom
Hello all,

We are getting intermittent socket timeout errors (see below).  Out of about 
600,000 indexing requests, 30 returned these socket timeout errors.  We haven't 
been able to correlate these with large merges, which tend to slow down the 
indexing response rate.

Does anyone know where we might look to determine the cause?

Tom

Tom Burton-West

Jan 7, 2011 2:31:07 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: [was class java.net.SocketTimeoutException] 
Read timed out
at 
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1354)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at 
org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:777)
at 
org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:807)
at 
org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:116)
at 
org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:742)
at org.apache.coyote.Request.doRead(Request.java:419)
at 
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:270)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:403)
  at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:293)
at 
org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at 
com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
at 
com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
at 
com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at 
com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
... 24 more
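
One knob worth ruling out (an assumption about where to look, not a confirmed
diagnosis) is the servlet container's socket read timeout. For Tomcat that is
set on the HTTP connector in server.xml, e.g.:

  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="60000"
             disableUploadTimeout="false"/>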




Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi All,

I have a problem using SOLR indexing. I am trying to index a 96-page PDF file
(using PDFBox to extract the file contents into a String). But
surprisingly, SOLR indexing is not done for the full document. I can't
get all the tokens, even though the field contains the full text of the PDF,
since I am storing the field as well as indexing it.

Are there any such limitations with SOLR indexing? Please let me know at the
earliest.

Thanks in advance!

Best Regards,
Kranti K K Parisa
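
One limitation worth checking (an assumption, not a confirmed diagnosis): in
Solr 1.x, solrconfig.xml caps how many tokens get indexed per field, so a long
PDF can be fully stored yet only partially indexed. A sketch of raising the
cap:

  <!-- solrconfig.xml: the default cap is 10000 tokens per field -->
  <maxFieldLength>2147483647</maxFieldLength>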


Re: Plurals in solr indexing

2010-01-27 Thread murali k

I have found that my synonyms.txt file had

kids,boys,girls,childrens,children,boys & girls,kid,boy,girl

I ran the analyzer; somehow it was matching on "girl". I am not sure what is
happening yet, so I removed the ampersand entry:
Kids,boys,girls,childrens,children,boy,girl,kid

I guessed that when I add them comma-separated they act as one group, and when
any one of the words is queried, matches will be returned.

It is working now... after I made that change in the synonyms.txt file.
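
For reference, the filter that consumes this file is configured in the field
type's analyzer in schema.xml; a sketch (assuming a typical setup, not
necessarily the poster's actual schema):

  <!-- expand="true" makes each comma-separated line behave as a
       full equivalence group -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>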






murali k wrote:
> 
> Hi, 
> I am having trouble with indexing plurals, 
> 
> I have the schema with following fields
> gender (field) - string (field type) (eg. data Boys)
> all (field) - text (field type)  - solr.WhitespaceTokenizerFactory,
> solr.SynonymFilterFactory, solr.WordDelimiterFilterFactory,
> solr.LowerCaseFilterFactory, SnowballPorterFilterFactory
> 
> i am using copyField from gender to all
> 
> and searching on "all" field
> 
> When i search for Boy, I get the results, If i search for Boys i dont get
> results, 
> I have tried things like "boys bikes" - no results
> "boy bikes" - works
> 
> kid and kids are synonyms for boy and boys, so i tried adding 
> kid,kids,boy,boys in synonyms hoping it will work, it doesn't work that way
> 
> I also have other content fields which are copied to "all" , and it
> contains words like "kids, boys" etc...
> any idea?
> 
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Plurals-in-solr-indexing-tp27335639p27336508.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Plurals in solr indexing

2010-01-27 Thread Erick Erickson
It would be more informative for you to actually post your
schema definitions for the fields in question, along
with your copyfield. The summary in your first
post leaves a lot of questions unanswered...

But a couple of things.
1> beware the SOLR string type. It does NOT tokenize
 the input. Text type is usually what people want
 unless they are doing something special
 purpose.
2> WordDelimiterFilterFactory is often a source of
 misunderstanding, take a close look at
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
3> I'd strongly advise either really getting to know the admin
 page in SOLR and/or getting a copy of Luke to examine
 your index and see if what you *think* is in there actually is.
4> Try running your queries with debugQuery=on and see
 what that shows.
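
For example (host, core, and field names here are illustrative):

  http://localhost:8983/solr/select?q=all:boys&debugQuery=on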


HTH
Erick
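
A schema sketch of point 1> (field names borrowed from the thread; the rest is
an assumption about the setup, not the actual schema): "string" is kept as one
untokenized term, while a "text" type runs through the analysis chain:

  <field name="gender" type="string" indexed="true" stored="true"/>
  <field name="all" type="text" indexed="true" stored="false"
         multiValued="true"/>
  <copyField source="gender" dest="all"/>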



Re: Plurals in solr indexing

2010-01-27 Thread Tom Hill
I recommend getting familiar with the analysis tool included with Solr. From
Solr's main admin screen, click on "analysis", check "verbose", and enter your
text, and you can see the changes that happen during analysis.

It's really helpful, especially when getting started.

Tom




Re: Solr indexing configuration help

2008-05-28 Thread Yonik Seeley
>  size="0"
>  initialSize="0"
>  autowarmCount="0"/>
>  class="solr.LRUCache"
>  size="0"
>  initialSize="0"
>  autowarmCount="0"/>
>true
>
>1
>1
>
>
>  
> user_id 0  name="rows">1 
>static newSearcher warming query from
> solrconfig.xml
>  
>
>
>  
> fast_warm 0  name="rows">10 
>static firstSearcher warming query from
> solrconfig.xml
>  
>
>false
>4
>  
>
> Replication:
>The snappuller is scheduled to run every 15 mins for now.
>
> Hardware:
>AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive
>
> OS:
>Fedora 8 (64-bit)
>
> JVM version:
>java version "1.7.0"
> IcedTea Runtime Environment (build 1.7.0-b21)
> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>
> Java options:
>java  -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
> -XX:+UseParallelGC -jar start.jar
>
>
> --
> View this message in context: 
> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Solr indexing configuration help

2008-05-28 Thread Gaku Mak
orms="true"/>
>>
>> id
>>
>> Solrconfig.xml
>>  
>>false
>>10
>>500
>>50
>>5000
>>2
>>1000
>>1
>>
>> org.apache.lucene.index.LogByteSizeMergePolicy
>> org.apache.lucene.index.ConcurrentMergeScheduler
>>single
>>  
>>
>>  
>>false
>>50
>>10
>>
>>500
>>5000
>>2
>>false
>>  
>>  
>>
>>
>>  50
>>  18
>>
>>
>>  solr/bin/snapshooter
>>  .
>>  true
>>
>>  
>>
>>  
>>50
>>>  class="solr.LRUCache"
>>  size="0"
>>  initialSize="0"
>>  autowarmCount="0"/>
>>>  class="solr.LRUCache"
>>  size="0"
>>  initialSize="0"
>>  autowarmCount="0"/>
>>>  class="solr.LRUCache"
>>  size="0"
>>  initialSize="0"
>>  autowarmCount="0"/>
>>true
>>
>>1
>>1
>>
>>
>>  
>> user_id 0 > name="rows">1 
>>static newSearcher warming query from
>> solrconfig.xml
>>  
>>
>>
>>  
>> fast_warm 0 > name="rows">10 
>>static firstSearcher warming query from
>> solrconfig.xml
>>  
>>
>>false
>>4
>>  
>>
>> Replication:
>>The snappuller is scheduled to run every 15 mins for now.
>>
>> Hardware:
>>AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive
>>
>> OS:
>>Fedora 8 (64-bit)
>>
>> JVM version:
>>java version "1.7.0"
>> IcedTea Runtime Environment (build 1.7.0-b21)
>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>>
>> Java options:
>>java  -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M
>> -XX:+UseParallelGC -jar start.jar
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17526135.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr indexing configuration help

2008-05-28 Thread Yonik Seeley


Re: Solr indexing configuration help

2008-05-28 Thread Otis Gospodnetic
Gaku,

But what's this then:

>> JVM version:
>>java version "1.7.0"
>> IcedTea Runtime Environment (build 1.7.0-b21)
>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)


Get the JVM from Sun.  Also, why do you have autoCommit on if all you are 
testing is indexing?  I'd turn that off.  The Java process going away sounds 
bad and smells like a Java/JVM problem more than a Solr problem.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
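
A sketch of what that looks like in solrconfig.xml (the commented-out values
are illustrative):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- leave autoCommit out entirely for a pure indexing test and issue
         one explicit commit at the end of the bulk load
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>
    -->
  </updateHandler>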


- Original Message 
> From: Gaku Mak <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 28, 2008 10:30:39 PM
> Subject: Re: Solr indexing configuration help
> 
> 
> I used the admin GUI to get the java info.
> java.vm.specification.vendor = Sun Microsystems Inc.
> 
> Any suggestion?  Thanks a lot for your help!!
> 
> -Gaku
> 
> 
> Yonik Seeley wrote:
> > 
> > Not sure why you would be getting an OOM from just indexing, and with
> > the 1.5G heap you've given the JVM.
> > Have you tried Sun's JVM?
> > 
> > -Yonik
> > 
> > On Wed, May 28, 2008 at 7:35 PM, gaku113 wrote:
> >>
> >> Hi all Solr users/developers/experts,
> >>
> >> I have the following scenario and I appreciate any advice for tuning my
> >> solr
> >> master server.
> >>
> >> I have a field in my schema that would index (but not store) about
> >> ~1
> >> ids for each document.  This field is expected to govern the size of the
> >> document.  Each id can contain up to 6 characters.  I figure that there
> >> are
> >> two alternatives for this field, one is to use a string multi-valued
> >> field,
> >> and the other would be to pass a white-space-delimited string to solr and
> >> have solr tokenize such string based on whitespace (the text_ws
> >> fieldType).
> >> The master server is expected to receive a constant stream of updates.
> >>
> >> The expected/estimated document size can range from 50k to 100k for a
> >> single
> >> document.  (I know this is quite large). The number of documents is
> >> expected
> >> to be around 200,000 on each master server, and there can be multiple
> >> master
> >> servers (sharding).  I wish the master can handle more docs too if I can
> >> figure a way out.
> >>
> >> Currently, I'm performing some basic stress tests to simulate the
> >> indexing
> >> side on the master server.  This stress test would continuously add new
> >> documents at the rate of about 10 documents every 30 seconds.  Autocommit
> >> is
> >> being used (50 docs and 180 seconds constraints), but I have no idea if
> >> this
> >> is the preferred way.  The goal is to keep adding new documents until we
> >> can
> >> get at least 200,000 documents (or about 20GB of index) on the master (or
> >> even more if the server can handle it)
> >>
> >> What I experienced from the indexing stress test is that the master
> >> server
> >> failed to respond after a while, such as becoming non-pingable when there are
> >> about
> >> 30k documents.  When looking at the log, they are mostly:
> >> java.lang.OutOfMemoryError: Java heap space
> >> OR
> >> Ping query caused exception: null (this is probably caused by the OOM
> >> problem)
> >>
> >> There were also a few cases that the java process even went away.
> >>
> >> Questions:
> >> 1)  Is it better to use the multi-valued string field or the text_ws
> >> field
> >> for this large field?
> >> 2)  Is it better to have more outstanding docs per commit or more
> >> frequent
> >> commit, in term of maximizing server resources?  What is the preferred
> >> way
> >> to commit documents assuming that solr master receives updates
> >> frequently?
> >> How many updated docs should there be before issuing a commit?
> >> 3)  How to avoid the OOM problem in my case? I'm already doing
> >> (-Xms1536M
> >> -Xmx1536M) on a 2-GB machine. Is that not enough?  I'm concerned that
> >> adding
> >> more Ram would just delay the OOM problem.  Any additional JVM option to
> >> consider?
> >> 4)  Any recommendation for the master server configuration, in a
> >> sense that I
> >> can maximize the number of indexed docs?
> >> 5)  How can it disable caching on the master altogether as queries
> >> won't hit the master?

Re: Solr indexing configuration help

2008-05-28 Thread Gaku Mak
blem.  Any additional JVM option
>>>> to
>>>> consider?
>>>> 4)  Any recommendation for the master server configuration, in a
>>>> sense that I
>>>> can maximize the number of indexed docs?
>>>> 5)  How can it disable caching on the master altogether as queries
>>>> won't hit
>>>> the master?
>>>> 6)  For an average doc size of 50k-100k, is that too large for
>>>> solr,
>>>> or even
>>>> solr is the right tool? If not, any alternative?  If we are able to
>>>> reduce
>>>> the size of docs, can we expect to index more documents?
>>>>
>>>> The following is info related to software/hardware/configuration:
>>>>
>>>> Solr version (solr nightly build on 5/23/2008)
>>>>Solr Specification Version: 1.2.2008.05.23.08.06.59
>>>>Solr Implementation Version: nightly
>>>>Lucene Specification Version: 2.3.2
>>>>Lucene Implementation Version: 2.3.2 652650
>>>>Jetty: 6.1.3
>>>>

-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17527555.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr indexing configuration help

2008-05-29 Thread Gaku Mak



Re: Solr indexing configuration help

2008-05-29 Thread Yonik Seeley


Re: Solr indexing configuration help

2008-05-29 Thread Gaku Mak



Re: Solr indexing configuration help

2008-05-30 Thread Gaku Mak

I started running the test on 2 other machines with similar specs but more
RAM (4G). One of them now has about 60k docs and is still running fine. On the
other machine, Solr died at about 43k docs. A short while before Solr died,
I saw that there were 5 searchers open at the same time. Do any of you know why
Solr would create 5 searchers, and whether that could cause Solr to die? Is
there any way to prevent this? Also, is there a way to totally disable the
searcher, and would that be a way to optimize the Solr master?

I copied the following from the SOLR Statistics page in case it has
information of interest:

name:[EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42754
maxDoc : 42754
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500453
openedAt : Fri May 30 10:04:15 PDT 2008
registeredAt : Fri May 30 10:05:05 PDT 2008

name:   [EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42754
maxDoc : 42754
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500453
openedAt : Fri May 30 10:03:24 PDT 2008
registeredAt : Fri May 30 10:03:41 PDT 2008

name:   [EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42675
maxDoc : 42675
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500450
openedAt : Fri May 30 10:00:53 PDT 2008
registeredAt : Fri May 30 10:01:05 PDT 2008

name:   [EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42697
maxDoc : 42697
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500451
openedAt : Fri May 30 10:02:20 PDT 2008
registeredAt : Fri May 30 10:02:22 PDT 2008

name:   [EMAIL PROTECTED] main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  caching : true
numDocs : 42724
maxDoc : 42724
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500452
openedAt : Fri May 30 10:02:55 PDT 2008
registeredAt : Fri May 30 10:02:57 PDT 2008 

Thank you all so much for your help. I really appreciate it.

-Gaku

Yonik Seeley wrote:
> 
> It's most likely a
> 1) hardware issue: bad memory
>  OR
> 2) incompatible libraries (most likely libc version for the JVM).
> 
> If you have another box around, try that.
> 
> -Yonik
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17566612.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr indexing configuration help

2008-05-30 Thread Yonik Seeley
Some things to try:
- turn off autowarming on the master
- turn off autocommit, unless you really need it, or change it to be
less aggressive: autocommitting every 50 docs is bad if you are
rapidly adding documents.
- set maxWarmingSearchers to 1 to prevent the buildup of searchers

-Yonik
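
A solrconfig.xml sketch of those settings (cache sizes here are illustrative):

  <!-- cap concurrent warming searchers -->
  <maxWarmingSearchers>1</maxWarmingSearchers>

  <!-- autowarmCount="0" disables autowarming for a cache -->
  <filterCache class="solr.LRUCache" size="512"
               initialSize="512" autowarmCount="0"/>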



Re: Solr indexing configuration help

2008-06-01 Thread Gaku Mak



Re: Solr indexing configuration help

2008-06-01 Thread Yonik Seeley


RE: Solr indexing configuration help

2008-06-02 Thread Norskog, Lance
Solr 1.2 ignores the 'number of documents' attribute. It honors the
"every 30 minutes" attribute.

Lance 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Sunday, June 01, 2008 6:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr indexing configuration help

On Sun, Jun 1, 2008 at 4:43 AM, Gaku Mak <[EMAIL PROTECTED]> wrote:
> I have tried Yonik's suggestions with the following:
> 1) all autowarming are off
> 2) commented out firstsearch and newsearcher event handlers
> 3) increased autocommit interval to 600 docs and 30 minutes 
> (previously 50 docs and 5 minutes)

Glad it looks like your memory issues are solved, but I really wouldn't
use "docs" at all as an autocommit criterion; it will just slow down
your full index builds.

-Yonik

> In addition, I updated the java option with the following:
> -d64 -server -Xms2048M -Xmx3072M -XX:-HeapDumpOnOutOfMemoryError 
> -XX:+UseSerialGC
>
> Results:
> I'm currently at 100,000 documents now with about 9.0GB index on a 
> quad machine with 4GB ram.  The stress test is to add 20 documents 
> every 30 seconds now.
>
> It seems like the serial GC works better than the other two 
> alternatives (-XX:+UseParallelGC or -XX:+UseConcMarkSweepGC) for some 
> reason.  I have not seen any OOM since the changes mentioned above 
> (yet).  If others have better experience with other GC and know how to
> configure it properly, please let me know because using serial GC just
> doesn't sound right on a quad machine.
>
> Additional questions:
> Does anyone know how solr/lucene use heap in terms of their 
> generations (young vs tenured) on the indexing environment?  If we 
> have this answer, we would be able to better configure the 
> young/tenured ratio in the heap.  Any help is appreciated!  Thanks!
>
> Now, I'm looking into configuring the slave machines.  Well, that's a 
> separate question.
>
>
>
> Yonik Seeley wrote:
>>
>> Some things to try:
>> - turn off autowarming on the master
>> - turn off autocommit, unless you really need it, or change it to be 
>> less agressive:  autocommitting every 50 docs is bad if you are 
>> rapidly adding documents.
>> - set maxWarmingSearchers to 1 to prevent the buildup of searchers
>>
>> -Yonik
>>
>> On Fri, May 30, 2008 at 3:39 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
>>>
>>> I started running the test on 2 other machines with similar specs 
>>> but more RAM (4G). One of them now has about 60k docs and still 
>>> running fine. On the other machine, solr died at about 43k docs. A 
>>> short while before solr died, I saw that there were 5 searchers at 
>>> the same time. Do any of you know why solr would create 5 searchers,
>>> and if that could cause solr to die? Is there any way to prevent
>>> this? Also is there a way to totally disable the searcher and 
>>> whether that is a way to optimize the solr master?
>>>
>>> I copied the following from the SOLR Statistics page in case it has 
>>> interested info:
>>>
>>> name:[EMAIL PROTECTED] main
>>> class:  org.apache.solr.search.SolrIndexSearcher
>>> version:1.0
>>> description:index searcher
>>> stats:  caching : true
>>> numDocs : 42754
>>> maxDoc : 42754
>>> readerImpl : MultiSegmentReader
>>> readerDir :
>>> org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
>>> indexVersion : 1211702500453
>>> openedAt : Fri May 30 10:04:15 PDT 2008 registeredAt : Fri May 30 
>>> 10:05:05 PDT 2008
>>>
>>> name:   [EMAIL PROTECTED] main
>>> class:  org.apache.solr.search.SolrIndexSearcher
>>> version:1.0
>>> description:index searcher
>>> stats:  caching : true
>>> numDocs : 42754
>>> maxDoc : 42754
>>> readerImpl : MultiSegmentReader
>>> readerDir :
>>> org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
>>> indexVersion : 1211702500453
>>> openedAt : Fri May 30 10:03:24 PDT 2008 registeredAt : Fri May 30 
>>> 10:03:41 PDT 2008
>>>
>>> name:   [EMAIL PROTECTED] main
>>> class:  org.apache.solr.search.SolrIndexSearcher
>>> version:1.0
>>> description:index searcher
>>> stats:  caching : true
>>> numDocs : 42675
>>> maxDoc : 42675
>>> readerImpl : MultiSegmentReader
>>> readerDir :
>>> org.apache.lucene.store.FSDirectory@

Re: Solr indexing configuration help

2008-06-02 Thread Mike Klaas


On 2-Jun-08, at 2:09 PM, Norskog, Lance wrote:


Solr 1.2 ignores the 'number of documents' attribute. It honors the
"every 30 minutes" attribute.


Only if you specify both, I think.  There was a bug in the  
implementation.


-Mike



Lance

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Sunday, June 01, 2008 6:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr indexing configuration help

On Sun, Jun 1, 2008 at 4:43 AM, Gaku Mak <[EMAIL PROTECTED]> wrote:

I have tried Yonik's suggestions with the following:
1) all autowarming are off
2) commented out firstsearch and newsearcher event handlers
3) increased autocommit interval to 600 docs and 30 minutes
(previously 50 docs and 5 minutes)


Glad it looks like your memory issues are solved, but I really wouldn't
use "docs" at all as an autocommit criterion; it will just slow down
your full index builds.

-Yonik


In addition, I updated the java option with the following:
-d64 -server -Xms2048M -Xmx3072M -XX:-HeapDumpOnOutOfMemoryError
-XX:+UseSerialGC

Results:
I'm currently at 100,000 documents now with about 9.0GB index on a
quad machine with 4GB ram.  The stress test is to add 20 documents
every 30 seconds now.

It seems like the serial GC works better than the other two
alternatives (-XX:+UseParallelGC or -XX:+UseConcMarkSweepGC) for some
reason.  I have not seen any OOM since the changes mentioned above
(yet).  If others have better experience with other GC and know how to
configure it properly, please let me know because using serial GC just
doesn't sound right on a quad machine.
doesn't sound right on a quad machine.


Additional questions:
Does anyone know how solr/lucene use heap in terms of their
generations (young vs tenured) on the indexing environment?  If we
have this answer, we would be able to better configure the
young/tenured ratio in the heap.  Any help is appreciated!  Thanks!

Now, I'm looking into configuring the slave machines.  Well, that's a
separate question.



Yonik Seeley wrote:


Some things to try:
- turn off autowarming on the master
- turn off autocommit, unless you really need it, or change it to be
less agressive:  autocommitting every 50 docs is bad if you are
rapidly adding documents.
- set maxWarmingSearchers to 1 to prevent the buildup of searchers

-Yonik

On Fri, May 30, 2008 at 3:39 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:


I started running the test on 2 other machines with similar specs
but more RAM (4G). One of them now has about 60k docs and still
running fine. On the other machine, solr died at about 43k docs. A
short while before solr died, I saw that there were 5 searchers at
the same time. Do any of you know why solr would create 5 searchers,
and if that could cause solr to die? Is there any way to prevent
this? Also is there a way to totally disable the searcher and
whether that is a way to optimize the solr master?

I copied the following from the SOLR Statistics page in case it has
interested info:

name:[EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42754
maxDoc : 42754
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500453
openedAt : Fri May 30 10:04:15 PDT 2008 registeredAt : Fri May 30
10:05:05 PDT 2008

name:   [EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42754
maxDoc : 42754
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500453
openedAt : Fri May 30 10:03:24 PDT 2008 registeredAt : Fri May 30
10:03:41 PDT 2008

name:   [EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42675
maxDoc : 42675
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500450
openedAt : Fri May 30 10:00:53 PDT 2008 registeredAt : Fri May 30
10:01:05 PDT 2008

name:   [EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42697
maxDoc : 42697
readerImpl : MultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/var/lib/solr/peoplesolr_0002/solr/data/index
indexVersion : 1211702500451
openedAt : Fri May 30 10:02:20 PDT 2008 registeredAt : Fri May 30
10:02:22 PDT 2008

name:   [EMAIL PROTECTED] main
class:  org.apache.solr.search.SolrIndexSearcher
version:1.0
description:index searcher
stats:  caching : true
numDocs : 42724
maxDoc : 42724
readerImpl : Mul

Does Solr Indexing Websites possible?

2008-09-30 Thread RaghavPrabhu

Hi all,

  I want to enable search functionality in my website. Can I use Solr
for indexing the website? Is there any option in Solr? Please let me know as
soon as possible.

Thanks in advance
Prabhu.K
-- 
View this message in context: 
http://www.nabble.com/Does-Solr-Indexing-Websites-possible--tp19755329p19755329.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: AW: SOLR Indexing/Querying

2007-05-31 Thread Chris Hostetter

: It looks alot like using Solr's standard "WordDelimiterFilter" (see the
: sample schema.xml) does what you need.

WordDelimiterFilter will only get you so far.  It can split the indexed
text of "3555LHP" into tokens "3555" and "LHP", and the user-entered
"D3555" into the tokens "D" and "3555" -- but because those tokens
originated as part of a single chunk of input text, the QueryParser will
turn them into a phrase query, which will not match on the single token
"3555" ... the "D" just isn't there.

I can't think of any way to achieve what you want "out of the box"; I think
you'd need a custom RequestHandler that uses your own query parser which
uses boolean queries instead of PhraseQueries.


: > Keyword Typed In / We want it to find
: >
: > D3555 / 3555LHP
: > D460160-BN / D460160
: > D460160BN / D460160
: > Dd454557 / D454557
: > 84200ORB / 84200
: > 84200-ORB / 84200
: > T13420-SCH / T13420
: > t14240-ss / t14240




-Hoss



Re: AW: SOLR Indexing/Querying

2007-05-31 Thread Walter Underwood
I solved something similar to this by creating a "stemmer" for part
numbers. Variations like "-BN" on the end can be treated as inflections
in the part number language, similar to plurals in English.

I used a set of regexes to match and transform, in some cases generating
multiple "root" part numbers. With the per-field analyzers in Solr, this
would work much better.

I'll make another search for the presentation that covers this. It was
at our Ultraseek Users Group Meeting in 1999.

wunder

On 5/31/07 11:46 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> 
> : It looks alot like using Solr's standard "WordDelimiterFilter" (see the
> : sample schema.xml) does what you need.
> 
> WordDelimiterFilter will only get you so far.  It can split the indexed
> text of "3555LHP" into tokens "3555" and "LHP", and the user-entered
> "D3555" into the tokens "D" and "3555" -- but because those tokens
> originated as part of a single chunk of input text, the QueryParser will
> turn them into a phrase query, which will not match on the single token
> "3555" ... the "D" just isn't there.
> 
> I can't think of any way to achieve what you want "out of the box"; I think
> you'd need a custom RequestHandler that uses your own query parser which
> uses boolean queries instead of PhraseQueries.
> 
> 
> : > Keyword Typed In / We want it to find
> : >
> : > D3555 / 3555LHP
> : > D460160-BN / D460160
> : > D460160BN / D460160
> : > Dd454557 / D454557
> : > 84200ORB / 84200
> : > 84200-ORB / 84200
> : > T13420-SCH / T13420
> : > t14240-ss / t14240



URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Rohit Gupta
My Solr server takes very long to update the index. The table it hits to index is
huge, with 10 million+ records, but even in that case I feel this is a very long
time to index. Below is a snapshot of the /dataimport page:

status: busy
importResponse: A command is still running...
Time Elapsed: 1:53:39.664
Total Requests made to DataSource: 16276
Total Rows Fetched: 24237
Total Documents Processed: 16273
Total Documents Skipped: 0
Full Dump Started: 2011-06-04 11:25:26


How can I determine why this is happening, and how can I improve it? During all
our tests on the local server before the migration we could index 5 million
records in 4-5 hrs, but now it's taking too long on the live server.

Regards,
Rohit

Re: Solr indexing socket timeout errors

2011-01-09 Thread Gora Mohanty
On Sat, Jan 8, 2011 at 3:44 AM, Burton-West, Tom  wrote:
> Hello all,
>
> We are getting intermittent socket timeout errors (see below).  Out of about 
> 600,000 indexing requests, 30 returned these socket timeout errors.  We 
> haven't been able to correlate these with large merges, which tends to slow 
> down the indexing response rate.
>
> Does anyone know where we might look to determine the cause?
[...]

We also experienced such timeouts when our indexing was
taking some 7-8 hours. It almost certainly had to do with
either the network, or the database server, but we were
also unable to track it down. You could try monitoring the
network for glitches, and monitoring load on the database
server to see if there is a correlation with excessive load.

What database are you using? If it is Microsoft SQL Server,
we found that moving to the jtds JDBC driver, rather than
the Microsoft one, helped reduce such errors, though it did
not eliminate them.

What we finally ended up doing is sharding the indexing task,
both at the database end, and at the Solr end. Then, we just
detect any indexing errors, and reindex that shard, which is
much faster than a complete reindexing. Along with the above
improvements, the problem became quite minor.

Regards,
Gora


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Mark Miller
Kranti™ K K Parisa wrote:
> Hi All,
>
> I have a problem with SOLR indexing. I am trying to index a 96-page PDF file
> (using PDFBox to extract the file contents into a String). But
> surprisingly SOLR indexing is not done for the full document, meaning I can't
> get all the tokens, even though the field contains the full text of the PDF, as I
> am storing the field as well as indexing it.
>
> Is there any such limitation with SOLR indexing? Please let me know at the
> earliest.
>
> Thanks in advance!
>
> Best Regards,
> Kranti K K Parisa
>
>   
Take a look at maxFieldLength in solrconfig.xml

-- 
- Mark

http://www.lucidimagination.com





Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Mark,

I really appreciate the quick reply.

here is what I have in the config xml

32
2147483647
  *  1*
1000
1

Does this matter for tokens? The field I am using has the
full content of the file (I checked that using the Lukeall jar file); however,
tokens are not getting generated completely, because of which my search is not
working for the full content.

Please suggest.

Best Regards,
Kranti K K Parisa



On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller  wrote:

> Kranti™ K K Parisa wrote:
> > Hi All,
> >
> > I have a problem using SOLR indexing. I am trying to index 96 pages PDF
> file
> > (using PDFBox for extracting the file contents into String). But
> > surprisingly SOLR Indexing is not done for the full document. Means I
> can't
> > get all the token how ever the field contains the full text of the PDF as
> i
> > am storing the field along with indexing.
> >
> > Is there any such limitations with SOLR indexing, please let me know at
> the
> > earliest.
> >
> > Thanks in advance!
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> Take a look at maxFieldLength in solrconfig.xml
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Mark Miller
It limits the number of tokens that will be indexed.

Kranti™ K K Parisa wrote:
> Hi Mark,
>
> I really appreciate the quick reply.
>
> here is what I have in the config xml
>
> 32
> 2147483647
>   *  1*
> 1000
> 1
>
> Does this matter with Tokens?? Because the field I am using is having
> the full content of the file ( I checked that using Lukeall jar file),
> how ever Tokens are not getting generated completely because of which
> my search not working for the full content.
>
> Please suggest.
>
> Best Regards,
> Kranti K K Parisa
>
>
>
> On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller wrote:
>
> Kranti™ K K Parisa wrote:
> > Hi All,
> >
> > I have a problem using SOLR indexing. I am trying to index 96
>     pages PDF file
> > (using PDFBox for extracting the file contents into String). But
> > surprisingly SOLR Indexing is not done for the full document.
> Means I can't
> > get all the token how ever the field contains the full text of
> the PDF as i
> > am storing the field along with indexing.
> >
> > Is there any such limitations with SOLR indexing, please let me
> know at the
> > earliest.
> >
> > Thanks in advance!
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> Take a look at maxFieldLength in solrconfig.xml
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>


-- 
- Mark

http://www.lucidimagination.com





Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Mark,

As you see, my config file contains the value as 10,000:
1

But when I check through the Lukeall jar file I can see the term count is around
3,000.

Please suggest.

Best Regards,
Kranti K K Parisa



2010/1/19 Mark Miller 

> It limits the number of tokens that will be indexed.
>
> Kranti™ K K Parisa wrote:
> > Hi Mark,
> >
> > I really appreciate the quick reply.
> >
> > here is what I have in the config xml
> >
> > 32
> > 2147483647
> >   *  1*
> > 1000
> > 1
> >
> > Does this matter with Tokens?? Because the field I am using is having
> > the full content of the file ( I checked that using Lukeall jar file),
> > how ever Tokens are not getting generated completely because of which
> > my search not working for the full content.
> >
> > Please suggest.
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> >
> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller  > <mailto:markrmil...@gmail.com>> wrote:
> >
> > Kranti™ K K Parisa wrote:
> > > Hi All,
> >     >
> > > I have a problem using SOLR indexing. I am trying to index 96
> > pages PDF file
> > > (using PDFBox for extracting the file contents into String). But
> > > surprisingly SOLR Indexing is not done for the full document.
> > Means I can't
> > > get all the token how ever the field contains the full text of
> > the PDF as i
> > > am storing the field along with indexing.
> > >
> > > Is there any such limitations with SOLR indexing, please let me
> > know at the
> > > earliest.
> > >
> > > Thanks in advance!
> > >
> > > Best Regards,
> > > Kranti K K Parisa
> > >
> > >
> > Take a look at maxFieldLength in solrconfig.xml
> >
> > --
> > - Mark
> >
> > http://www.lucidimagination.com
> >
> >
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Mark,

I changed the value to 1,000,000,000 to just test my luck.

But unfortunately I am still not getting all tokens into the index.

Please suggest.

Best Regards,
Kranti K K Parisa



2010/1/19 Kranti™ K K Parisa 

> Hi Mark,
>
> As you see my config file contains the value as 10,000
> 1
>
> But when I check thru Lukeall jar file I can see the Term count around
> 3,000.
>
> Please suggest.
>
> Best Regards,
> Kranti K K Parisa
>
>
>
> 2010/1/19 Mark Miller 
>
> It limits the number of tokens that will be indexed.
>>
>> Kranti™ K K Parisa wrote:
>> > Hi Mark,
>> >
>> > I really appreciate the quick reply.
>> >
>> > here is what I have in the config xml
>> >
>> > 32
>> > 2147483647
>> >   *  1*
>> > 1000
>> > 1
>> >
>> > Does this matter with Tokens?? Because the field I am using is having
>> > the full content of the file ( I checked that using Lukeall jar file),
>> > how ever Tokens are not getting generated completely because of which
>> > my search not working for the full content.
>> >
>> > Please suggest.
>> >
>> > Best Regards,
>> > Kranti K K Parisa
>> >
>> >
>> >
>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller > > <mailto:markrmil...@gmail.com>> wrote:
>> >
>> > Kranti™ K K Parisa wrote:
>> > > Hi All,
>> > >
>> > > I have a problem using SOLR indexing. I am trying to index 96
>> > pages PDF file
>> > > (using PDFBox for extracting the file contents into String). But
>> > > surprisingly SOLR Indexing is not done for the full document.
>> > Means I can't
>> > > get all the token how ever the field contains the full text of
>> > the PDF as i
>> > > am storing the field along with indexing.
>> > >
>> > > Is there any such limitations with SOLR indexing, please let me
>> > know at the
>> > > earliest.
>> > >
>> > > Thanks in advance!
>> > >
>> > > Best Regards,
>> > > Kranti K K Parisa
>> > >
>> > >
>> > Take a look at maxFieldLength in solrconfig.xml
>> >
>> > --
>> > - Mark
>> >
>> > http://www.lucidimagination.com
>> >
>> >
>> >
>> >
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Erick Erickson
Did you reindex the documents you examined? That limit
is applied when you index.

Try searching the user list for maxfieldlength; this topic
has been discussed many times and you should find a
solution.

HTH
Erick

2010/1/19 Kranti™ K K Parisa 

> Can anyone suggest/guide me on this.
>
> Best Regards,
> Kranti K K Parisa
>
>
>
> 2010/1/19 Kranti™ K K Parisa 
>
> > Hi Mark,
> >
> > I changed the value to 1,000,000,000 to just test my luck.
> >
> > But unfortunately I am still not getting the index for all Token.
> >
> > Please suggest.
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> >
> > 2010/1/19 Kranti™ K K Parisa 
> >
> > Hi Mark,
> >>
> >> As you see my config file contains the value as 10,000
> >> 1
> >>
> >> But when I check thru Lukeall jar file I can see the Term count around
> >> 3,000.
> >>
> >> Please suggest.
> >>
> >> Best Regards,
> >> Kranti K K Parisa
> >>
> >>
> >>
> >> 2010/1/19 Mark Miller 
> >>
> >> It limits the number of tokens that will be indexed.
> >>>
> >>> Kranti™ K K Parisa wrote:
> >>> > Hi Mark,
> >>> >
> >>> > I really appreciate the quick reply.
> >>> >
> >>> > here is what I have in the config xml
> >>> >
> >>> > 32
> >>> > 2147483647
> >>> >   *  1*
> >>> > 1000
> >>> > 1
> >>> >
> >>> > Does this matter with Tokens?? Because the field I am using is having
> >>> > the full content of the file ( I checked that using Lukeall jar
> file),
> >>> > how ever Tokens are not getting generated completely because of which
> >>> > my search not working for the full content.
> >>> >
> >>> > Please suggest.
> >>> >
> >>> > Best Regards,
> >>> > Kranti K K Parisa
> >>> >
> >>> >
> >>> >
> >>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller  >>> > <mailto:markrmil...@gmail.com>> wrote:
> >>> >
> >>> > Kranti™ K K Parisa wrote:
> >>> > > Hi All,
> >>> > >
> >>> > > I have a problem using SOLR indexing. I am trying to index 96
> >>> > pages PDF file
> >>> > > (using PDFBox for extracting the file contents into String).
> But
> >>> > > surprisingly SOLR Indexing is not done for the full document.
> >>> > Means I can't
> >>> > > get all the token how ever the field contains the full text of
> >>> > the PDF as i
> >>> > > am storing the field along with indexing.
> >>> > >
> >>> > > Is there any such limitations with SOLR indexing, please let me
> >>> > know at the
> >>> > > earliest.
> >>> > >
> >>> > > Thanks in advance!
> >>> > >
> >>> > > Best Regards,
> >>> > > Kranti K K Parisa
> >>> > >
> >>> > >
> >>> > Take a look at maxFieldLength in solrconfig.xml
> >>> >
> >>> > --
> >>> > - Mark
> >>> >
> >>> > http://www.lucidimagination.com
> >>> >
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>> --
> >>> - Mark
> >>>
> >>> http://www.lucidimagination.com
> >>>
> >>>
> >>>
> >>>
> >>
> >
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Can anyone suggest/guide me on this.

Best Regards,
Kranti K K Parisa



2010/1/19 Kranti™ K K Parisa 

> Hi Mark,
>
> I changed the value to 1,000,000,000 to just test my luck.
>
> But unfortunately I am still not getting the index for all Token.
>
> Please suggest.
>
> Best Regards,
> Kranti K K Parisa
>
>
>
> 2010/1/19 Kranti™ K K Parisa 
>
> Hi Mark,
>>
>> As you see my config file contains the value as 10,000
>> 1
>>
>> But when I check thru Lukeall jar file I can see the Term count around
>> 3,000.
>>
>> Please suggest.
>>
>> Best Regards,
>> Kranti K K Parisa
>>
>>
>>
>> 2010/1/19 Mark Miller 
>>
>> It limits the number of tokens that will be indexed.
>>>
>>> Kranti™ K K Parisa wrote:
>>> > Hi Mark,
>>> >
>>> > I really appreciate the quick reply.
>>> >
>>> > here is what I have in the config xml
>>> >
>>> > 32
>>> > 2147483647
>>> >   *  1*
>>> > 1000
>>> > 1
>>> >
>>> > Does this matter with Tokens?? Because the field I am using is having
>>> > the full content of the file ( I checked that using Lukeall jar file),
>>> > how ever Tokens are not getting generated completely because of which
>>> > my search not working for the full content.
>>> >
>>> > Please suggest.
>>> >
>>> > Best Regards,
>>> > Kranti K K Parisa
>>> >
>>> >
>>> >
>>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller >> > <mailto:markrmil...@gmail.com>> wrote:
>>> >
>>> > Kranti™ K K Parisa wrote:
>>> >     > Hi All,
>>> > >
>>> > > I have a problem using SOLR indexing. I am trying to index 96
>>> > pages PDF file
>>> > > (using PDFBox for extracting the file contents into String). But
>>> > > surprisingly SOLR Indexing is not done for the full document.
>>> > Means I can't
>>> > > get all the token how ever the field contains the full text of
>>> > the PDF as i
>>> > > am storing the field along with indexing.
>>> > >
>>> > > Is there any such limitations with SOLR indexing, please let me
>>> > know at the
>>> > > earliest.
>>> > >
>>> > > Thanks in advance!
>>> > >
>>> > > Best Regards,
>>> > > Kranti K K Parisa
>>> > >
>>> > >
>>> > Take a look at maxFieldLength in solrconfig.xml
>>> >
>>> > --
>>> > - Mark
>>> >
>>> > http://www.lucidimagination.com
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi Erik,

Yes, I deleted the index and re-indexed after increasing the value (I have
restarted Tomcat as well),

but still no luck. I was just wondering: the field that I am trying to
index has the complete document text in it, as I am storing that, but I am not
getting the complete terms/tokens into the index to perform the search.

What analyzers and filters would you suggest that I check?

Currently I am using the following:

[fieldType/analyzer configuration stripped by the list archive]

Please suggest
Best Regards,
Kranti K K Parisa



On Tue, Jan 19, 2010 at 9:03 PM, Erick Erickson wrote:

> Did you reindex the documents you examined? That limit
> is applied when you index.
>
> Try searching the user list for maxfieldlength, this topic
> has been discussed many times and you should find a
> solution.
>
> HTH
> Erick
>
> 2010/1/19 Kranti™ K K Parisa 
>
> > Can anyone suggest/guide me on this.
> >
> > Best Regards,
> > Kranti K K Parisa
> >
> >
> >
> > 2010/1/19 Kranti™ K K Parisa 
> >
> > > Hi Mark,
> > >
> > > I changed the value to 1,000,000,000 to just test my luck.
> > >
> > > But unfortunately I am still not getting the index for all Token.
> > >
> > > Please suggest.
> > >
> > > Best Regards,
> > > Kranti K K Parisa
> > >
> > >
> > >
> > > 2010/1/19 Kranti™ K K Parisa 
> > >
> > > Hi Mark,
> > >>
> > >> As you see my config file contains the value as 10,000
> > >> 1
> > >>
> > >> But when I check thru Lukeall jar file I can see the Term count around
> > >> 3,000.
> > >>
> > >> Please suggest.
> > >>
> > >> Best Regards,
> > >> Kranti K K Parisa
> > >>
> > >>
> > >>
> > >> 2010/1/19 Mark Miller 
> > >>
> > >> It limits the number of tokens that will be indexed.
> > >>>
> > >>> Kranti™ K K Parisa wrote:
> > >>> > Hi Mark,
> > >>> >
> > >>> > I really appreciate the quick reply.
> > >>> >
> > >>> > here is what I have in the config xml
> > >>> >
> > >>> > 32
> > >>> > 2147483647
> > >>> >   *  1*
> > >>> > 1000
> > >>> > 1
> > >>> >
> > >>> > Does this matter with Tokens?? Because the field I am using is
> having
> > >>> > the full content of the file ( I checked that using Lukeall jar
> > file),
> > >>> > how ever Tokens are not getting generated completely because of
> which
> > >>> > my search not working for the full content.
> > >>> >
> > >>> > Please suggest.
> > >>> >
> > >>> > Best Regards,
> > >>> > Kranti K K Parisa
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller <
> markrmil...@gmail.com
> > >>> > <mailto:markrmil...@gmail.com>> wrote:
> > >>> >
> > >>> > Kranti™ K K Parisa wrote:
> > >>> > > Hi All,
> > >>> > >
> > >>> > > I have a problem using SOLR indexing. I am trying to index 96
> > >>> > pages PDF file
> > >>> > > (using PDFBox for extracting the file contents into String).
> > But
> > >>> > > surprisingly SOLR Indexing is not done for the full document.
> > >>> > Means I can't
> > >>> > > get all the token how ever the field contains the full text
> of
> > >>> > the PDF as i
> > >>> > > am storing the field along with indexing.
> > >>> > >
> > >>> > > Is there any such limitations with SOLR indexing, please let
> me
> > >>> > know at the
> > >>> > > earliest.
> > >>> > >
> > >>> > > Thanks in advance!
> > >>> > >
> > >>> > > Best Regards,
> > >>> > > Kranti K K Parisa
> > >>> > >
> > >>> > >
> > >>> > Take a look at maxFieldLength in solrconfig.xml
> > >>> >
> > >>> > --
> > >>> > - Mark
> > >>> >
> > >>> > http://www.lucidimagination.com
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>>
> > >>>
> > >>> --
> > >>> - Mark
> > >>>
> > >>> http://www.lucidimagination.com
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >
> >
>


Re: Urgent: SOLR Indexing missing tokens

2010-01-19 Thread Kranti™ K K Parisa
Hi

I was making the same mistake mentioned in this URL:
http://search.lucidimagination.com/search/document/30616a061f8c4bf6/solr_ignoring_maxfieldlength

maxFieldLength is there in 2 places. Earlier I had changed it in the indexDefaults
section; now I have changed it in the mainIndex section also.

It worked. Thanks Mark & Erick. I appreciate your help.
Best Regards,
Kranti K K Parisa



2010/1/19 Kranti™ K K Parisa 

> Hi Erik,
>
> Yes, i deleted the index and re-indexed after increasing the value (i have
> restarted tomcat as well)
>
> but still no luck. but i was just wondering the field that i am trying to
> index has the complete document text in it as i am storing that. but not
> getting the complete terms/tokens into the index to perform the search.
>
> What would be the suggestible Analyzers, filters that I should check with?
>
> Currently I am using the following:
>
> [fieldType/analyzer configuration stripped by the list archive]
>
> Please suggest
> Best Regards,
> Kranti K K Parisa
>
>
>
> On Tue, Jan 19, 2010 at 9:03 PM, Erick Erickson 
> wrote:
>
>> Did you reindex the documents you examined? That limit
>> is applied when you index.
>>
>> Try searching the user list for maxfieldlength, this topic
>> has been discussed many times and you should find a
>> solution.
>>
>> HTH
>> Erick
>>
>> 2010/1/19 Kranti™ K K Parisa 
>>
>> > Can anyone suggest/guide me on this.
>> >
>> > Best Regards,
>> > Kranti K K Parisa
>> >
>> >
>> >
>> > 2010/1/19 Kranti™ K K Parisa 
>> >
>> > > Hi Mark,
>> > >
>> > > I changed the value to 1,000,000,000 to just test my luck.
>> > >
>> > > But unfortunately I am still not getting the index for all Token.
>> > >
>> > > Please suggest.
>> > >
>> > > Best Regards,
>> > > Kranti K K Parisa
>> > >
>> > >
>> > >
>> > > 2010/1/19 Kranti™ K K Parisa 
>> > >
>> > > Hi Mark,
>> > >>
>> > >> As you see my config file contains the value as 10,000
>> > >> 1
>> > >>
>> > >> But when I check thru Lukeall jar file I can see the Term count
>> around
>> > >> 3,000.
>> > >>
>> > >> Please suggest.
>> > >>
>> > >> Best Regards,
>> > >> Kranti K K Parisa
>> > >>
>> > >>
>> > >>
>> > >> 2010/1/19 Mark Miller 
>> > >>
>> > >> It limits the number of tokens that will be indexed.
>> > >>>
>> > >>> Kranti™ K K Parisa wrote:
>> > >>> > Hi Mark,
>> > >>> >
>> > >>> > I really appreciate the quick reply.
>> > >>> >
>> > >>> > here is what I have in the config xml
>> > >>> >
>> > >>> > 32
>> > >>> > 2147483647
>> > >>> >   *  1*
>> > >>> > 1000
>> > >>> > 1
>> > >>> >
>> > >>> > Does this matter with Tokens?? Because the field I am using is
>> having
>> > >>> > the full content of the file ( I checked that using Lukeall jar
>> > file),
>> > >>> > how ever Tokens are not getting generated completely because of
>> which
>> > >>> > my search not working for the full content.
>> > >>> >
>> > >>> > Please suggest.
>> > >>> >
>> > >>> > Best Regards,
>> > >>> > Kranti K K Parisa
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > On Tue, Jan 19, 2010 at 8:27 PM, Mark Miller <
>> markrmil...@gmail.com
>> > >>> > <mailto:markrmil...@gmail.com>> wrote:
>> > >>> >
>> > >>> > Kranti™ K K Parisa wrote:
>> > >>> > > Hi All,
>> > >>> > >
>> > >>> > > I have a problem using SOLR indexing. I am trying to index
>> 96
>> > >>> > pages PDF file
>> > >>> > > (using PDFBox for extracting the file contents into String).
>> > But
>> > >>> > > surprisingly SOLR Indexing is not done for the full
>> document.
>> > >>> > Means I can't
>> > >>> > > get all the token how ever the field contains the full text
>> of
>> > >>> > the PDF as i
>> > >>> > > am storing the field along with indexing.
>> > >>> > >
>> > >>> > > Is there any such limitations with SOLR indexing, please let
>> me
>> > >>> > know at the
>> > >>> > > earliest.
>> > >>> > >
>> > >>> > > Thanks in advance!
>> > >>> > >
>> > >>> > > Best Regards,
>> > >>> > > Kranti K K Parisa
>> > >>> > >
>> > >>> > >
>> > >>> > Take a look at maxFieldLength in solrconfig.xml
>> > >>> >
>> > >>> > --
>> > >>> > - Mark
>> > >>> >
>> > >>> > http://www.lucidimagination.com
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>>
>> > >>>
>> > >>> --
>> > >>> - Mark
>> > >>>
>> > >>> http://www.lucidimagination.com
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>
>> > >
>> >
>>
>
>


SOLR indexing : Multiple content/document types

2010-01-23 Thread Kranti™ K K Parisa
Hi,

I would like to know the best strategy/standards to follow for indexing
multiple document types through SOLR.

In other words, let us say we have a file upload form through which users would
upload files of different types (text, HTML, XML, Word docs, Excel
sheets, PDF, JPG, GIF, etc.).
Once we save the files to the hard disk on the server side, we need to
initiate the SOLR indexing.

What would be the best strategy to achieve this, and what are the libraries
to be used for the different content/document types?

So far I have used PDFBox to read PDF files. Please suggest options for all the
possible content/document types.

Best Regards,
Kranti K K Parisa


Re: Does Solr Indexing Websites possible?

2008-10-01 Thread Erick Erickson
Have you looked at Nutch? It's built on top of Lucene and might
be a better fit.

But you simply must give more details about what your
requirements are to get a meaningful answer. Imagine *you* were
reading your e-mail without knowing anything except
the information contained in the message. How could you
respond?

Best
Erick


On Wed, Oct 1, 2008 at 2:43 AM, RaghavPrabhu <[EMAIL PROTECTED]>wrote:

>
> Hi all,
>
>  I want to enable the search functionality in my website. Can i use solr
> for indexing the website? Is there any option in solr.Pls let me know as
> soon as possible.
>
> Thanks in advance
> Prabhu.K
> --
> View this message in context:
> http://www.nabble.com/Does-Solr-Indexing-Websites-possible--tp19755329p19755329.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Solr indexing size for a particular document.

2011-04-19 Thread rahul
Hi,

Is there a way to find out the Solr index size for a particular document? I
am using Solrj to index the documents.

Assume I am indexing multiple fields like title, description, content, and a
few integer fields in schema.xml; once I index the content, is there a
way to identify the index size for the particular document, during indexing
or after indexing?

Because most of the common words are excluded via StopWords.txt using
StopFilterFactory, I just want to calculate the actual index size of the
particular document. Is there any way in current Solr?

thanks,


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-size-for-a-particular-document-tp2838416p2838416.html
Sent from the Solr - User mailing list archive at Nabble.com.


Did anyone rewrite the solr indexing section?

2011-05-23 Thread LeoYuan88
Hi all,
 As we all know, Solr puts all index files in a single directory,
namely ${datadir}/index,
but performance gets slower as the index directory gets
bigger and bigger,
so I want to split the single dir into several dirs, e.g. ${datadir}/index1
and ${datadir}/index2;
maybe I will put 1 users' info into the first one, and put another 1
users' info into
the second one. When searching, I will locate the index dir directly as I
need.
Did anyone do such a thing before?
Or does this refactoring of Solr make sense?
Any advice or suggestions would be highly appreciated.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Did-anyone-rewrite-the-solr-indexing-section-tp2974539p2974539.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Chris Cowan
How long does the query against the DB take (outside of Solr)? If that's slow
then it's going to take a while to update the index. You might need to figure out
a way to break things up a bit; maybe use a delta import instead of a full
import, as sketched below.

Chris

On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:

> My Solr server takes very long to update index. The table it hits to index is 
> huge with 10Million + records , but even in that case I feel this is very 
> long 
> time to index. Below is the snapshot of the /dataimport page
> 
> busy
> A command is still running...
> 
> 1:53:39.664
> 16276
> 24237
> 16273
> 0
> 2011-06-04 11:25:26
> 
> 
> How can i determine why this is happening and how can I improve this. During 
> all 
> our test on the local server before the migration we could index 5 million 
> records in 4-5 hrs, but now its taking too long on the live server.
> 
> Regards,
> Rohit



Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread lee carroll
Rohit - you have double posted, maybe - did Otis's answer not help with
your issue, or at least need a response to clarify?

On 4 June 2011 22:53, Chris Cowan  wrote:
> How long does the query against the DB take (outside of Solr)? If that's slow 
> then it's going to take a while to update the index. You might need to figure 
> a way to break things up a bit, maybe use a delta import instead of a full 
> import.
>
> Chris
>
> On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:
>
>> My Solr server takes very long to update index. The table it hits to index is
>> huge with 10Million + records , but even in that case I feel this is very 
>> long
>> time to index. Below is the snapshot of the /dataimport page
>>
>> busy
>> A command is still running...
>> 
>> 1:53:39.664
>> 16276
>> 24237
>> 16273
>> 0
>> 2011-06-04 11:25:26
>> 
>>
>> How can i determine why this is happening and how can I improve this. During 
>> all
>> our test on the local server before the migration we could index 5 million
>> records in 4-5 hrs, but now its taking too long on the live server.
>>
>> Regards,
>> Rohit
>
>


Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Rohit Gupta
No, I didn't double post; maybe it was in my outbox and went out again.

The queries outside Solr don't take so long, to return around 50 rows it
takes 250 seconds, so I am doing a delta import of around 500,000 rows at a
time. I have tried turning autocommit on and things are moving a bit faster
now. Are there any more tweaks I can do?

Also, I am planning to move to a master-slave model, but am failing to understand
where to start exactly.

Regards,
Rohit




From: lee carroll 
To: solr-user@lucene.apache.org
Sent: Sun, 5 June, 2011 4:59:44 AM
Subject: Re: URGENT HELP: Improving Solr indexing time

Rohit - you have double posted maybe - did Otis's answer not help with
your issue or at least need a response to clarify ?

On 4 June 2011 22:53, Chris Cowan  wrote:
> How long does the query against the DB take (outside of Solr)? If that's slow 
>then it's going to take a while to update the index. You might need to figure 
>a 
>way to break things up a bit, maybe use a delta import instead of a full 
>import.
>
> Chris
>
> On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:
>
>> My Solr server takes very long to update index. The table it hits to index is
>> huge with 10Million + records , but even in that case I feel this is very 
long
>> time to index. Below is the snapshot of the /dataimport page
>>
>> busy
>> A command is still running...
>> 
>> 1:53:39.664
>> 16276
>> 24237
>> 16273
>> 0
>> 2011-06-04 11:25:26
>> 
>>
>> How can i determine why this is happening and how can I improve this. During 
>>all
>> our test on the local server before the migration we could index 5 million
>> records in 4-5 hrs, but now its taking too long on the live server.
>>
>> Regards,
>> Rohit
>
>


Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Fuad Efendi
Hi Rohit,

I am currently working on https://issues.apache.org/jira/browse/SOLR-2233,
which fixes multithreading issues.

How complex is your dataimport schema? SOLR-2233 (multithreading, better
connection handling) improves performance, especially if the SQL is
extremely complex and uses a few long-running CachedSqlEntityProcessors,
etc.

Also, check your SQL and indexes; in most cases you can _significantly_
improve performance by simply adding appropriate (for your specific SQL)
indexes. I noticed that even very experienced DBAs sometimes create an index
on (KEY1, KEY2) while the developer executes the query "WHERE KEY2=? ORDER BY KEY1" -
an index on (KEY2, KEY1) is what that query needs. Check everything...

Thanks,


-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca







On 11-06-05 12:09 AM, "Rohit Gupta"  wrote:

>No didn't double post, my be it was in my outbox and went out again.
>
>The queries outside solr dont take so long, to return around 50 rows
>it 
>takes 250 seconds, so I am doing a delta import of around 500,000 rows at
>a 
>time. I have tried turning auto commit  on and things are moving a bit
>faster 
>now. Are there any more tweeking i can do?
>
>Also, planning to move to master-salve model, but am failing to
>understand where 
>to start exactly. 
>
>Regards,
>Rohit
>
>
>
>
>From: lee carroll 
>To: solr-user@lucene.apache.org
>Sent: Sun, 5 June, 2011 4:59:44 AM
>Subject: Re: URGENT HELP: Improving Solr indexing time
>
>Rohit - you have double posted maybe - did Otis's answer not help with
>your issue or at least need a response to clarify ?
>
>On 4 June 2011 22:53, Chris Cowan  wrote:
>> How long does the query against the DB take (outside of Solr)? If
>>that's slow 
>>then it's going to take a while to update the index. You might need to
>>figure a 
>>way to break things up a bit, maybe use a delta import instead of a full
>>import.
>>
>> Chris
>>
>> On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:
>>
>>> My Solr server takes very long to update index. The table it hits to
>>>index is
>>> huge with 10Million + records , but even in that case I feel this is
>>>very 
>long
>>> time to index. Below is the snapshot of the /dataimport page
>>>
>>> busy
>>> A command is still running...
>>> 
>>> 1:53:39.664
>>> 16276
>>> 24237
>>> 16273
>>> 0
>>> 2011-06-04 11:25:26
>>> 
>>>
>>> How can i determine why this is happening and how can I improve this.
>>>During 
>>>all
>>> our test on the local server before the migration we could index 5
>>>million
>>> records in 4-5 hrs, but now its taking too long on the live server.
>>>
>>> Regards,
>>> Rohit
>>
>>




Re: URGENT HELP: Improving Solr indexing time

2011-06-05 Thread Rohit Gupta
Thanks Fuad,

I have started optimizing my database structure; since the tables are huge
in terms of records, the optimization is taking time.

I will update with the results when complete.

Regards,
Rohit




From: Fuad Efendi 
To: "Solr-User@Lucene. Org" 
Sent: Sun, 5 June, 2011 10:05:22 AM
Subject: Re: URGENT HELP: Improving Solr indexing time

Hi Rohit,

I am currently working on https://issues.apache.org/jira/browse/SOLR-2233
which fixes multithreading issues

How complex is your dataimport schema? SOLR-2233 (multithreading, better
connection handling) improves performance... Especially if SQL is
extremely complex and uses few long-running CachedSqlEntityProcessors and
etc.

Also, check your SQL and indexes, in most cases you can _significantly_
improve performance by simply adding appropriate (for your specific SQL)
indexes. I noticed that even very experienced DBAs sometimes create an index
on (KEY1, KEY2) while the developer executes the query "WHERE KEY2=? ORDER BY KEY1" -
check everything...

Thanks,


-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca







On 11-06-05 12:09 AM, "Rohit Gupta"  wrote:

>No didn't double post, my be it was in my outbox and went out again.
>
>The queries outside solr dont take so long, to return around 50 rows
>it 
>takes 250 seconds, so I am doing a delta import of around 500,000 rows at
>a 
>time. I have tried turning auto commit  on and things are moving a bit
>faster 
>now. Are there any more tweeking i can do?
>
>Also, planning to move to master-salve model, but am failing to
>understand where 
>to start exactly. 
>
>Regards,
>Rohit
>
>
>
>
>From: lee carroll 
>To: solr-user@lucene.apache.org
>Sent: Sun, 5 June, 2011 4:59:44 AM
>Subject: Re: URGENT HELP: Improving Solr indexing time
>
>Rohit - you have double posted maybe - did Otis's answer not help with
>your issue or at least need a response to clarify ?
>
>On 4 June 2011 22:53, Chris Cowan  wrote:
>> How long does the query against the DB take (outside of Solr)? If
>>that's slow 
>>then it's going to take a while to update the index. You might need to
>>figure a 
>>way to break things up a bit, maybe use a delta import instead of a full
>>import.
>>
>> Chris
>>
>> On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:
>>
>>> My Solr server takes very long to update index. The table it hits to
>>>index is
>>> huge with 10Million + records , but even in that case I feel this is
>>>very 
>long
>>> time to index. Below is the snapshot of the /dataimport page
>>>
>>> busy
>>> A command is still running...
>>> 
>>> 1:53:39.664
>>> 16276
>>> 24237
>>> 16273
>>> 0
>>> 2011-06-04 11:25:26
>>> 
>>>
>>> How can i determine why this is happening and how can I improve this.
>>>During 
>>>all
>>> our test on the local server before the migration we could index 5
>>>million
>>> records in 4-5 hrs, but now its taking too long on the live server.
>>>
>>> Regards,
>>> Rohit
>>
>>
