RE: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Ephraim Ofir
I'm not sure about the scale you're aiming for, but you probably want to
do both sharding and replication.  That way there's no central server to
become a bottleneck. The guidelines should probably be something like:
1. Split your index into enough shards so it can keep up with the update
rate.
2. Have enough replicas of each shard master to keep up with the rate
of queries.
3. Have enough aggregators in front of the shard replicas so the
aggregation doesn't become a bottleneck.
4. Make sure you have good load balancing across your system.
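
In practice, a plain query hits a load balancer in front of the aggregators,
and the aggregator fans it out to one replica of each shard via the shards
parameter (host names below are illustrative, not a real setup):

  client     -> http://aggregator-lb:8983/solr/select?q=foo
  aggregator -> same query, plus shards=shard1-slaves:8983/solr,shard2-slaves:8983/solr
                (added by its default request handler), then merges the shard responses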

Attached is a diagram of the setup we have.  You might want to look into
SolrCloud as well.

Ephraim Ofir


-Original Message-
From: Jens Mueller [mailto:supidupi...@googlemail.com] 
Sent: Tuesday, April 05, 2011 4:25 AM
To: solr-user@lucene.apache.org
Subject: Very very large scale Solr Deployment = how to do (Expert
Question)?

Hello Experts,



I am a Solr newbie but have read quite a lot of docs. I still do not
understand what would be the best way to set up very large scale
deployments:



Goal (theoretical):

 A.) Index-Size: 1 Petabyte (1 Document is about 5 KB in Size)

 B) Queries: 10 Queries per Second

 C) Updates: 10 Updates per Second




Solr offers:

1.) Replication => Scales well for B) BUT A) and C) are not satisfied


2.) Sharding => Scales well for A) BUT B) and C) are not satisfied
(=> As I understand the sharding approach, everything goes through a
central server that dispatches the updates and assembles the queries
retrieved from the different shards. But this central server also has
some capacity limits...)




What is the right approach to handle such large deployments? I would be
thankful for just a rough sketch of the concepts so I can
experiment/search further...


Maybe I am missing something very trivial as I think some of the "Solr
Users/Use Cases" on the homepage are that kind of large deployment. How
are they implemented?



Thank you very much!!!

Jens


RE: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-06 Thread Ephraim Ofir
Hi all,
I'd love to share the diagram, just not sure how to do that on the list
(it's a Word document I tried to send as an attachment).

Jens, to answer your questions:
1. Correct, in our setup the source of the data is a DB from which we
pull the data using DIH (search the list for my previous post "DIH -
deleting documents, high performance (delta) imports, and passing
parameters" if you want info about that).  We were lucky enough to have
the data sharded at the DB level before we started using Solr, so using
the same shards was an easy extension.  Note that we're not (yet...)
using SolrCloud, it was just something I thought you should consider.
2. I got the idea for the "aggregator" from the Solr book (PACKT).  I
don't remember if that term was used in the book or if I made it up (if
Google doesn't know it, I probably made it up...), but I think it conveys
what this part of the puzzle does.  As you said, this is simply a Solr
instance which doesn't hold its own index, but shares the same schema as
the slaves and masters.  I actually defined the default query handler on
this instance to include the shards parameter (see below), so the client
doesn't have to know anything about the internal workings of the sharded
setup - it just hits the aggregator load balancer with a regular query
and everything is handled behind the scenes.  This simplifies the client
and allows me to change the architecture in the future (e.g. change the
number of shards or their DNS name) without requiring a client change.

Sharded query handler:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="shards">${slaveUrls:null}</str>
  </lst>
</requestHandler>
All of our Solr instances share the same configs (solrconfig.xml,
schema.xml, etc.) and different instances take different roles according
to properties defined in solr.xml, which is generated by a script
specifically for each Solr instance (the script has a "map" of which
instances should be on which host, and has to be run once on each host).
In this case, the generated solr.xml defines, per instance:

- an instance name -- just a name that appears in the Solr management
  pages, to make it easier to know which instance you're on
- a role property -- this tells the instance it is an aggregator, so it
  should use the sharded request handler by default; masters and slaves
  have master/slave accordingly to define replication, a regular default
  search handler for slaves, and DIH on masters
- a shard identifier -- used by instances which are shards in order to
  determine which DB they should import from (masters) and which master
  they should replicate from (slaves)
- the slave URLs -- used by the sharded request handler above
- a health-check property -- used by the load balancer to know if this
  instance is alive
- the core definitions -- just one core for this instance; indexers have
  2 cores, one for prod and one for full reindex
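
As a rough illustration only (the property names below are made up, not our
actual ones), a generated solr.xml for an aggregator could look something
like:

<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir=".">
      <property name="instanceName" value="aggregator-host1"/>
      <property name="instanceRole" value="aggregator"/>
      <property name="slaveUrls"
                value="shard1-slaves:8983/solr,shard2-slaves:8983/solr"/>
    </core>
  </cores>
</solr>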


Let me know if I can assist any further.
Ephraim Ofir


-Original Message-
From: Jonathan DeMello [mailto:demello@googlemail.com] 
Sent: Wednesday, April 06, 2011 8:58 AM
To: solr-user@lucene.apache.org
Cc: Isan Fulia; Tirthankar Chatterjee
Subject: Re: FW: Very very large scale Solr Deployment = how to do
(Expert Question)?

I third that request.

Would greatly appreciate taking a look at that diagram!

Regards,

Jonathan

On Wed, Apr 6, 2011 at 9:12 AM, Isan Fulia 
wrote:

> Hi Ephraim/Jen,
>
> Can you share that diagram with all? It may really help all of us.
> Thanks,
> Isan Fulia.
>
> On 6 April 2011 10:15, Tirthankar Chatterjee
 >wrote:
>
> > Hi Jen,
> > Can you please forward the diagram attachment too that Ephraim sent.
:-)
> > Thanks,
> > Tirthankar
> >
> > -Original Message-
> > From: Jens Mueller [mailto:supidupi...@googlemail.com]
> > Sent: Tuesday, April 05, 2011 10:30 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: FW: Very very large scale Solr Deployment = how to do
> (Expert
> > Question)?
> >
> > Hello Ephraim,
> >
> > thank you so much for the great Document/Scaling-Concept!!
> >
> > First I think you really should publish this on the solr wiki. This
> > approach is nowhere documented there and not really obvious for
> > newbies, and your document is great and explains this very well!
> >
> > Please allow me a few further questions regarding your document:
> > 1.) Is it correct, that you mean by "DB" the Origin-Data-Source of the
> > data that is fed into the Solr "Cloud"

RE: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-07 Thread Ephraim Ofir
You can't view it online, but you should be able to download it from:
https://docs.google.com/leaf?id=0BwOEbnJ7oeOrNmU5ZThjODUtYzM5MS00YjRlLWI2OTktZTEzNDk1YmVmOWU4&hl=en&authkey=COGel4gP

Enjoy,
Ephraim Ofir


-Original Message-
From: Jens Mueller [mailto:supidupi...@googlemail.com] 
Sent: Thursday, April 07, 2011 8:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Very very large scale Solr Deployment = how to do (Expert
Question)?

Hello Ephraim, hello Lance, hello Walter,

thanks for your replies:

Ephraim, thanks very much for the further detailed explanation. I will
try to set up a demo system in the next few days and use your advice.
Load balancers are an important aspect of your design. Can you recommend
one LB specifically? (I would be using haproxy.1wt.eu). I think the idea
of uploading your document is very good. However Google Docs seemed not
to be working (at least for me with the docx format?), but maybe you can
simply output the document as PDF and then I think Google Docs will work,
so all the others can also have a look at your concept. The best approach
would be if you could upload your advice directly somewhere to the Solr
wiki as it is really helpful. I found some other documents meanwhile, but
yours is much clearer and more complete, with the LBs and the aggregators
(http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf).

Lance, thanks I will have a look at what linkedin is doing.

Walter, thanks for the advice: Well, you are right, mentioning Google. My
question was also to understand how such large systems like
Google/Facebook actually work. So my numbers are just theoretical and
made up. My system will be smaller, but I would be very happy to
understand how such large systems are built, and I think the approach
Ephraim showed should work quite well at large scale. If you know of good
documents (besides the Bigtable research paper that I already know) that
technically describe how Google works in detail, that would be of great
interest. You seem to be working for a company that handles large
datasets. Does Google use this approach, sharding the index across N
writers, with the produced index then replicated to N "read only
searchers"?

thank you all.
best regards
jens



2011/4/7 Walter Underwood 

> The bigger answer is that you cannot get to this size by just
configuring
> Solr. You may have to invent a lot of stuff. Like all of Google.
>
> Where did you get these numbers? The proposed query rate is twice as
big as
> Google (Feb 2010 estimate, 34K qps).
>
> I work at MarkLogic, and we scale to 100's of terabytes, with fast
update
> and query rates. If you want a real system that handles that, you
might want
> to look at our product.
>
> wunder
>
> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
>
> > I would not use replication. LinkedIn consumer search is a flat
system
> > where one process indexes new entries and does queries
simultaneously.
> > It's a custom Lucene app called Zoie. Their stuff is on Github..
> >
> > I would get documents to indexers via a multicast IP-based queueing
> > system. This scales very well and there's a lot of hardware support.
> >
> > The problem with distributed search is that it is a) inherently
slower
> > and b) has inherently more and longer jitter. The "airplane wing"
> > distribution of query times becomes longer and flatter.
> >
> > This is going to have to be a "federated" system, where the
front-end
> > app aggregates results rather than Solr.
> >
> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller

> wrote:
> >> Hello Experts,
> >>
> >>
> >>
> >> I am a Solr newbie but read quite a lot of docs. I still do not
> understand
> >> what would be the best way to setup very large scale deployments:
> >>
> >>
> >>
> >> Goal (threoretical):
> >>
> >>  A.) Index-Size: 1 Petabyte (1 Document is about 5 KB in Size)
> >>
> >>  B) Queries: 10 Queries/ per Second
> >>
> >>  C) Updates: 10 Updates / per Second
> >>
> >>
> >>
> >>
> >> Solr offers:
> >>
> >> 1.)Replication => Scales Well for B)  BUT  A) and C) are not
> satisfied
> >>
> >>
> >> 2.)Sharding => Scales well for A) BUT B) and C) are not
satisfied
> (=> As
> >> I understand the Sharding approach all goes through a central
server,
> that
> >> dispatches the updates and assembles the quries retrieved from the
> different
> >> shards. But this central server has also some capacity limits...)
> >>
> >>
> >>
> >>
> >> What is the right approach to handle such large deployments? I
would be
> >> thankfull for just a rough sketch of the concepts so I can
> experiment/search
> >> further...
> >>
> >>
> >> Maybe I am missing something very trivial as I think some of the
"Solr
> >> Users/Use Cases" on the homepage are that kind of large
deployments. How
> are
> >> they implemented?
> >>
> >>
> >>
> >> Thanky very much!!!
> >>
> >> Jens
> >>
> >
>
>
>
>
>


RE: Fast DIH with 1:M multValue entities

2011-04-17 Thread Ephraim Ofir
Search the list for my post "DIH - deleting documents, high performance
(delta) imports, and passing parameters" which shows a different
approach to 1:M sub-entities.

Ephraim Ofir

-Original Message-
From: Tim Gilbert [mailto:tim.gilb...@morningstar.com] 
Sent: Thursday, April 14, 2011 6:02 PM
To: solr-user@lucene.apache.org
Subject: RE: Fast DIH with 1:M multValue entities

How did I miss that?  Thanks, I will try that as it seems to be the "in
memory" lookup solution I needed.

Thanks Erick,

Tim

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, April 14, 2011 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Fast DIH with 1:M multValue entities

I'm not sure this applies, but have you looked at
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

<http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor>
Best
Erick

On Thu, Apr 14, 2011 at 9:12 AM, Tim Gilbert
wrote:

> We are working on importing a large number of records into Solr using
> DIH.  We have one schema with ~2000 fields declared which map off to
> several database schemas so that typically each document will have ~500
> fields in use.  We have about 2 million "rows" which we are importing,
> and we are seeing < 20 minutes in test across 14 different "entities"
> which really map off to one virtual document.  Then we added our
> multiValue stuff and, well, it didn't work out nearly as well. :-)
>
>
>
> We have several fields which are 1:M and so in our data-config.xml we
> might have something like this:
>
>
>
> <entity name="FundId" query="...">
>   <entity name="FundManager"
>           query="{call dbo.getFundManager_Data(${FundId.FundId})}">
>     ...
>   </entity>
> </entity>
>
>
>
> That is a lot of database queries for a small result set which is
really
> slowing things down for us.
>
>
>
> My question is more to ask advice, so it's a multi-parter :-)
>
>
>
> 1)   Is there a way to declare in DIH an in-memory
> lookup where we can query for the entire Many side of the query in one
> database query, and match up on the PK?  Then we can declare that field
> multiValued.
>
> 2)   Assuming that isn't currently available, I thought of
> "denormalizing" the 1:M into a delimited list and then using
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> to tokenize.  That would allow us to search on individual bits, and
> build something into the front-end to handle the display.  That means
> we wouldn't use multiValued and we'd have to modify our db but we'd
> lose out on some of the abilities.
>
> 3)   The third option was to open up DIH and try to
add
> the first feature into it ourselves.
>
>
>
> Am I approaching this the right way?  Are there other ways I haven't
> considered or don't know about?
>
>
>
> Thanks in advance,
>
>
>
> Tim
>
>


RE: How could each core share configuration files

2011-04-20 Thread Ephraim Ofir
I just use soft-links...

Ephraim Ofir

-Original Message-
From: lboutros [mailto:boutr...@gmail.com] 
Sent: Wednesday, April 20, 2011 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: How could each core share configuration files

Perhaps this could help :

http://lucene.472066.n3.nabble.com/Shared-conf-td2787771.html#a2789447

Ludovic.

2011/4/20 kun xiong [via Lucene] <
ml-node+2841801-1701787156-383...@n3.nabble.com>

> Hi all,
>
> Currently in my project , most of the core configurations are
> same(solrconfig.xml, dataimport.properties...),  which are putted in
their
> own folder as reduplicative.
>
> I am wondering how could I put common ones in one folder, which each
core
> could share, and keep the different ones in their own folder still.
>
> Thanks
>
> Kun
>
>


-
Jouve
France.


RE: DIH import and postImportDeleteQuery

2011-05-25 Thread Ephraim Ofir
Search the list for my post "DIH - deleting documents, high performance
(delta) imports, and passing parameters" which shows my solution to a
similar problem.

Ephraim Ofir

-Original Message-
From: Alexandre Rocco [mailto:alel...@gmail.com] 
Sent: Tuesday, May 24, 2011 11:24 PM
To: solr-user@lucene.apache.org
Subject: DIH import and postImportDeleteQuery

Guys,

I am facing a situation in one of our projects that I need to perform a
cleanup to remove some documents after we perform an update via DIH.
The big issue right now comes from the fact that when we call the DIH
with
clean=false, the postImportDeleteQuery is not executed.

My setup is currently arranged like this:
- A SQL Server stored procedure that receives a parameter (specified in
the
URL) and returns the records to be indexed
- The procedure is able to return all the records (for a full-import) or
only the updated records (for a delta-import)
- This procedure returns valid and deleted records, from this point
comes
the need to run a postImportDeleteQuery to remove the deleted ones.

Everything works fine when I run a full-import, I am running always with
clean=true, and then the whole index is rebuilt.
When I need to do an incremental update, the records are updated
correctly,
but the command to delete the other records is not executed.

I've tried several combinations, with different results:
- Running full-import with clean=false: the records are updated but the
ones
that needs to be deleted stays on the index
- Running delta-import with clean=false: the records are updated but the
ones that needs to be deleted stays on the index
- Running delta-import with clean=true: all records are deleted from the
index and then only the records returned by the procedure are on the
index,
except the deleted ones.

I don't see any way to achieve my goal, without changing the process
that I
do to obtain the data.
Since this is a very complex stored procedure, with tons of joins and
custom
processing, I am trying everything to avoid messing with it.

See below a copy of my data-config.xml file. I made it simpler omitting
all
the fields, since it's out of scope of the issue:

[data-config.xml omitted here - the XML was stripped by the list archive]
Any ideas or pointers that might help on this one?

Many thanks,
Alexandre


RE: How to delete documents from SOLR index using DIH

2010-08-26 Thread Ephraim Ofir
You have several options here:
1. Use the deletedPkQuery in delta import - you'll need to make a DB
query which generates the IDs to be deleted (something like: SELECT id
FROM your_table WHERE deleted = 1).
2. Add the $deleteDocById special command to your full/delta import.
3. Use preImportDeleteQuery/postImportDeleteQuery in your full/delta
query

If you want to use any of these separately from your import, you can put
them in a separate entity and do a full/delta import just on that
entity.
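
For illustration, a rough sketch of option 1 (table and column names are just
examples - adapt them to your schema):

<entity name="item" pk="id"
        query="SELECT id, name FROM your_table WHERE deleted = 0"
        deltaQuery="SELECT id FROM your_table
                    WHERE updated &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM your_table
                          WHERE id = '${dataimporter.delta.id}'"
        deletedPkQuery="SELECT id FROM your_table WHERE deleted = 1">
</entity>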

Ephraim Ofir


-Original Message-
From: Pawan Darira [mailto:pawan.dar...@gmail.com] 
Sent: Thursday, August 26, 2010 9:01 AM
To: solr-user@lucene.apache.org
Subject: Re: How to delete documents from SOLR index using DIH

Thanks Erick. Your solution does make sense. Actually I wanted to know how
to use delete via query or unique id through DIH?

Is there any specific query to be mentioned in data-config.xml? Also Is
there any separate command like "full-import" ,"delta-import" for
deleting
documents from index?



On Thu, Aug 26, 2010 at 12:03 AM, Erick Erickson
wrote:

> I'm not sure what you mean here. You can delete via query or unique
id. But
> DIH really isn't relevant here.
>
> If you've defined a unique key, simply re-adding any changed documents
will
> delete the old one and insert the new document.
>
> If this makes no sense, could you explain what the underlying problem
> you're
> trying to solve is?
>
> HTH
> Erick
>
> On Tue, Aug 24, 2010 at 8:56 PM, Pawan Darira  >wrote:
>
> > Hi
> >
> > I am using data import handler to build index. How can i delete
documents
> > from my index using DIH.
> >
> > --
> > Thanks,
> > Pawan Darira
> >
>



-- 
Thanks,
Pawan Darira


RE: How to do ? Articles and Its Associated Comments Indexing , One to Many relationship

2010-08-26 Thread Ephraim Ofir
Why not define the comment field as multiValued? That way you only index
each document once and you don't need to collapse anything...
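
In schema.xml that's just something like (field and type names are
illustrative):

<field name="comment" type="text" indexed="true" stored="true" multiValued="true"/>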

Ephraim Ofir


-Original Message-
From: Sumit Arora [mailto:sumit1...@gmail.com] 
Sent: Thursday, August 26, 2010 12:54 PM
To: solr-user@lucene.apache.org
Subject: How to do ? Articles and Its Associated Comments Indexing , One
to Many relationship

I have set of Articles and then Comments on it, so in database I have
two
major tables one for Articles and one for Comments, but each Article
could
have many comments (One to Many).


If One Article will have 20 Comments, then on DB to SOLR - Index - Sync
:
Solr will index 20 Similar Documents with a difference of each Comment.


Use Case :

On Search: If keyword would be a fit to more than one comment, then it
will
return duplicate documents.


One Possible solution I thought to Apply:

**

I should go for Indexing 20 Similar Documents with a difference of each
Comment.


While retrieving results from Query: I could use: collapse.field = By
Article Id


Am I following right approach?


RE: Candidate Profile Search which have multiple employers and Educations.

2010-08-26 Thread Ephraim Ofir
As far as I can tell you should use multiValued for these fields:

  
  

In order to get the data from the DB you should either create a sub
entity with its own query or (the better performance option) use
something like:

SELECT cp.name,
GROUP_CONCAT(ce.CandidateEducation SEPARATOR '|') AS
multiple_educations,
GROUP_CONCAT(e.Employer SEPARATOR '|') AS multiple_employers
FROM CandidateProfile_Table cp
LEFT JOIN CandidateEducation_Table ce ON cp.name = ce.name
LEFT JOIN Employers_Table e ON cp.name = e.name
GROUP BY cp.name

This creates one line with the educations and employers concatenated
into pipe (|) delimited fields.  Then you'd have to break up the
multiple fields using a RegexTransformer - use something like:

<entity name="candidate" transformer="RegexTransformer"
        query="...the SELECT above...">
  <field column="multiple_educations" splitBy="\|"/>
  <field column="multiple_employers" splitBy="\|"/>
</entity>

The SQL probably doesn't fit your DB schema, but it's just to clarify
the idea.  You might have to pick a different field separator if pipe
(|) might be in your data...

Ephraim Ofir


-Original Message-
From: Sumit Arora [mailto:sumit1...@gmail.com] 
Sent: Thursday, August 26, 2010 1:36 PM
To: solr-user@lucene.apache.org
Subject: Candidate Profile Search which have multiple employers and
Educations.

I have to search candidate's profile , on which I have following Tables
:

Candidate Profile Record : CandidateProfile_Table

CandidateEducation : CandidateEducation_Table  // Education in different
institutes or colleges

Employers :  Employers_Table //More than One Employers :

If I denormalize this all three Table :

CandidateProfile_Table  - 1 Row for Sumit

CandidateEducation_Table - 5 Rows for Sumit

Employers_Table - 5 Rows for Sumit

If these three tables will go to Index in Solr , It will create 25
Documents
for one row.


In this Case What Should be My Approach :

Denormalize all three tables and, while querying from Solr, use the Field
Collapse parameter by CandidateProfile Id, so it will return one record.

Or

I should use CandidateEducation_Table,CandidateEducation_Table as
MultiValued in Solr ?


If that is the case, then how can I apply the Solr way to use multiValued
fields? e.g.

I need to use the following configuration in Schema.xml:


  
  


After this :


I should pick all education values (from the MySQL education database table)
related to one profile,

keep them in one variable - EducationValuesForSolr,

and then assign EducationValuesForSolr's value to the Schema.xml-defined
field "education"?


Please let me know If I am using right approach and Comments?

/Sumit


DIH - deleting documents, high performance (delta) imports, and passing parameters

2010-08-30 Thread Ephraim Ofir
cById to work due to lack of
documentation...  I ended up having to look at the code in order to get
it to work.  I finally realized that $deleteDocById has 2 bugs:

1. Not sure it's a bug, but looks like a bug to me - if the query
returns any values other than $deleteDocById for the row you want
deleted, it deletes the row but also re-adds it with the rest of the
data, so in effect the row isn't deleted.  In order to work around this
issue, you have to either make sure no data other than the
$deleteDocById value exists in rows to be deleted or add $skipDoc='true'
(which I think is a little counter-intuitive, but was the better choice
in my case).  My query looks something like:
SELECT u.id,
   u.name,
   ...
   IF(u.delete_flag > 0, u.id, NULL) AS $deleteDocById,
   IF(u.delete_flag > 0, 'true', NULL) AS $skipDoc
FROM users_tb u

2. $deleteDocById doesn't update the statistics of deleted documents.
This has 2 downsides, the obvious one is that you don't know if/how many
documents were deleted, the not-so-obvious one is that if your import
contains only deleted items, it won't be committed automatically by DIH
and you'll have to commit it manually.



That's all I have so far, planning to add the $deleteDocById bugs (and a
patch) to JIRA as soon as I get a chance. Hope this helps somebody - I'm
open to any suggestions,
Ephraim Ofir


RE: SolrJ and Multi Core Set up

2010-09-05 Thread Ephraim Ofir
Not sure about SolrJ, but generally in multi core Solr your core has a
name and a data dir which don't have to be the same.  In your case, you
could have 2 cores called "live" and "rebuild" which reside on 2 data
dirs called "core0" and "core1".  You would always access the cores by
their names, and when you swap them your "rebuild" would become your
"live".  Whenever you swap them a different one will point to the same
data dir "core0", but you won't really care which one points where.
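
For reference, the swap itself can also be triggered with a plain CoreAdmin
request along these lines (host, port and core names are just examples):

http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild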

Ephraim Ofir

-Original Message-
From: Shaun Campbell [mailto:campbell.sh...@gmail.com] 
Sent: Friday, September 03, 2010 2:41 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ and Multi Core Set up

Thanks Chantal I hadn't spotted that that's a big help.

Thank you.
Shaun

On 3 September 2010 12:31, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:

> Hi Shaun,
>
> you create the SolrServer using multicore by just adding the core to
the
> URL. You don't need to add anything with SolrQuery.
>
> URL url = new URL(new URL(solrBaseUrl), coreName);
> CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
>
> Concerning the "default" core thing - I wouldn't know about that.
>
>
> Cheers,
> Chantal
>
> On Fri, 2010-09-03 at 12:03 +0200, Shaun Campbell wrote:
> > I'm writing a client using SolrJ and was wondering how to handle a
multi
> > core installation.  We want to use the facility to rebuild the index
on
> one
> > of the cores at a scheduled time and then use the SWAP facility to
switch
> > the "live" core to the newly rebuilt core.  I think I can do the
SWAP
> with
> > CoreAdminRequest.setAction() with a suitable parameter.
> >
> > First of all, does Solr have some concept of a default core? If I
have
> core0
> > as my "live" core and core1 which I rebuild, then after the swap I
expect
> > core0 to now contain my rebuilt index and core1 to contain the old
live
> core
> > data.  My application should then need to keep referring to core0 as
> normal
> > with no change.  Does I have to refer to core0 programmatically?
I've
> > currently got working client code to index and to query my Solr data
but
> I
> > was wondering whether or how I set the core when I move to multi
core?
> > There's examples showing it set as part of the URL so my guess it's
done
> by
> > using something like setParam on SolrQuery.
> >
> > Has anyone got any advice or examples of using SolrJ in a multi core
> > installation?
> >
> > Regards
> > Shaun
>
>
>
>


RE: Download document from solr

2010-09-05 Thread Ephraim Ofir
You could index the download URL of the document into your id (or another
field) and then use that to enable downloading it.

Ephraim Ofir


-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Friday, September 03, 2010 1:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Download document from solr

Yes. Indexing a PDF&other types with '/extract' means that Solr finds
words in the document and indexes those in a field 'content'. It does
not save the binary contents of the file. You could make a request
handler that fetches one document and generates a redirect to the
link.

On Thu, Sep 2, 2010 at 7:35 AM, Matteo Moci  wrote:
>  Thank you for the suggestions,
> I just completed the tutorial at http://lucene.apache.org/solr/tutorial.html
> and i understood that in the GET parameters I can choose wt=standard (and
> obtain an xml structure in the results),
> wt=json or wt=php.
>
> All of them display the results inline, in the sense that they are embedded
> and entirely included in the response.
>
> If I submit pdfs (i think it is also for docs and CSVs) files to solr, I
> will get in the results something like this in json:
>
> [some part of response omitted]
> response":{"numFound":1,"start":0,"maxScore":0.34002018,"docs":[
>    {
>     "last_modified":"2010-08-05T14:07:24Z",
>     "id":"doc1",
>     "content_type":["application/pdf"],
>     "score":0.34002018}]
>  }}
>
> ( example taken from http://wiki.apache.org/solr/ExtractingRequestHandler )
>
> that shows no content at all.
> The only way I have to retrieve and download the pdf file is to use the
> id=doc1 to access some repository (even a database table )
> that can provide me the content starting from the id.
>
> Does this look like a common practice?
>
> Thank you
>
>
>
>
>
> Il 02/09/10 08:47, Lance Norskog ha scritto:
>>
>> Solr can return the list of results in JSON or php format, so that you
>> UI can allow a download.
>>
>> You can write a UI in the Velocity toolkit- it's pretty easy.
>>
>> On Wed, Sep 1, 2010 at 8:24 AM, Matteo Moci  wrote:
>>>
>>>  Hello to All,
>>> I am a newbie with Solr, and I am trying to understand if I can use it
>>> form
>>> my purpose,
>>> and I was wondering how Solr lists the result documents: do they appear
>>> as
>>> "downloadable files",
>>> just like http://solr.machine.com/path/file.doc, or do I need develop
>>> another layer to take care of downloading?
>>> Even a link to the docs might work...
>>>
>>> Thank you,
>>> Matteo
>>>
>>>
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


RE: Delta Import with something other than Date

2010-09-12 Thread Ephraim Ofir
Alternatively, you could use the deltaQuery to retrieve the last indexed
id from the DB (you'd have to save it there on your previous import).
Your entity would look something like:
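
(Roughly - the table, procedure and column names below are made up, and the
stored procedure is assumed to return the changed rows as well as update
last_id_table for the next run:)

<entity name="item" pk="id"
        query="..."
        deltaQuery="SELECT last_id AS id FROM last_id_table"
        deltaImportQuery="CALL get_changes_since(${dataimporter.delta.id})">
</entity>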




You could implement your deltaImportQuery as a stored procedure which
would store the appropriate id in last_id_table (for the next
delta-import) in addition to returning the data from the query.

Ephraim Ofir


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Friday, September 10, 2010 4:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Delta Import with something other than Date

  On 9/9/2010 1:23 PM, Vladimir Sutskever wrote:
> Shawn,
>
> Can you provide a sample of passing the parameter via URL? And how
using it would look in the data-config.xml
>

Here's the URL that I send to do a full build on my last shard:

http://idxst5-a:8983/solr/build/dataimport?command=full-import&optimize=true&commit=true&dataTable=ncdat&numShards=6&modVal=5&minDid=0&maxDid=242895591

If I want to do a delta, I just change the command to delta-import and 
give it a proper minDid value, rather than 0.

Below is the entity from my data-config.xml.  You have to have a 
deltaQuery defined for delta-import to work, but if you're going to use 
your own placeholders, just put something in that returns a single value

very quickly.  In my case, my query and deltaImportQuery are actually 
identical.
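
A cut-down sketch of such an entity (the request parameters correspond to the
URL above; in this setup query and deltaImportQuery are the same statement):

<entity name="ncdat" pk="did"
        query="SELECT * FROM ${dataimporter.request.dataTable}
               WHERE did &gt; ${dataimporter.request.minDid}
               AND did &lt;= ${dataimporter.request.maxDid}
               AND (did % ${dataimporter.request.numShards}) = ${dataimporter.request.modVal}"
        deltaQuery="SELECT 1"
        deltaImportQuery="SELECT * FROM ${dataimporter.request.dataTable}
               WHERE did &gt; ${dataimporter.request.minDid}
               AND did &lt;= ${dataimporter.request.maxDid}
               AND (did % ${dataimporter.request.numShards}) = ${dataimporter.request.modVal}">
</entity>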







RE: using variables/properties in dataconfig.xml

2010-09-16 Thread Ephraim Ofir
No, it's not possible.  See workaround: 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e

Ephraim Ofir | Reporting and Host Development team | ICQ 
P: +972 3 7665510 | M: + 972 52 4888510 | F: +972 3 7665566 | ICQ#: 18981 | E: 
ephra...@icq.com 
 


-Original Message-
From: Jason Chaffee [mailto:jchaf...@ebates.com] 
Sent: Wednesday, September 15, 2010 9:58 PM
To: solr-user@lucene.apache.org
Subject: using variables/properties in dataconfig.xml

Is it possible to use the same type of property configuration in
dataconfig.xml as is possible in solrconfig.xml?

 

I tried it and it didn't seem to work.  For example,

 

${solr.data.dir:/opt/search/store/solr/data}

 

And in the dataconfig.xml, I would like to do this to configure the
baseUrl:

 

  

 

Thanks,

 

Jason



RE: DataImportHandler with multiline SQL

2010-09-19 Thread Ephraim Ofir
I'd go with moving this logic to the DB in a stored procedure...

Ephraim Ofir | Reporting and Host Development team | ICQ 
P: +972 3 7665510 | M: + 972 52 4888510 | F: +972 3 7665566 | ICQ#: 18981 | E: 
ephra...@icq.com 
 


-Original Message-
From: David Yang [mailto:dy...@nextjump.com] 
Sent: Thursday, September 16, 2010 9:07 PM
To: solr-user@lucene.apache.org
Subject: DataImportHandler with multiline SQL

Hi

 

I am using the DIH to retrieve data, and as part of the process, I
wanted to create a temporary table and then import data from that. I
have played around a little with DIH and it seems like for a query like:
"select x; select y;" you can have select y to return no results and do
random stuff, but the first select x needs to return results.

Does anybody know exactly how DIH handles multiple sql statements in the
query?

 

Cheers,

David



RE: Concurrent DB updates and delta import misses few records

2010-09-26 Thread Ephraim Ofir
You could store the last indexed ID in the DB.  Implement the delta
import as a stored procedure that saves the last imported ID in the DB.
On subsequent delta imports, use the deltaQuery to get that ID from the
DB and use it in the deltaImportQuery
See
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201009.mbox/%3
c9f8b39cb3b7c6d4594293ea29ccf438b0174c...@icq-mail.icq.il.office.aol.com
%3e


Ephraim Ofir


-Original Message-
From: Shashikant Kore [mailto:shashik...@gmail.com] 
Sent: Thursday, September 23, 2010 8:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Concurrent DB updates and delta import misses few records

Thanks for the pointer, Shawn.  It, definitely, is useful.

I am wondering if you could retrieve minDid from the solr rather than
storing it externally. Max id from Solr index and max id from DB should
define the lower and upper thresholds, respectively, of the delta range.
Am
I missing something?

--shashi

On Wed, Sep 22, 2010 at 6:47 PM, Shawn Heisey  wrote:

>  On 9/22/2010 1:39 AM, Shashikant Kore wrote:
>
>> Hi,
>>
>> I'm using DIH to index records from a database. After every update on
>> (MySQL) DB, Solr DIH is invoked for delta import.  In my tests, I
have
>> observed that if db updates and DIH import is happening concurrently,
>> import
>> misses few records.
>>
>> Here is how it happens.
>>
>> The table has a column 'lastUpdated' which has default value of
current
>> timestamp. Many records are added to database in a single transaction
that
>> takes several seconds. For example, if 10,000 rows are being
inserted, the
>> rows may get timestamp values from '2010-09-20 18:21:20' to
'2010-09-20
>> 18:21:26'. These rows become visible only after transaction is
committed.
>> That happens at, say, '2010-09-20 18:21:30'.
>>
>> If Solr's import gets triggered at '18:21:29', it will use a timestamp
>> of the last import for the delta query. This import will not see the
>> records added in the aforementioned transaction as the transaction was
>> not committed at that instant. After this import, dataimport.properties
>> will have the last index time as '18:21:29'.  The next import will not
>> be able to get all the rows of the previously referred transaction as
>> some of the rows have a timestamp earlier than '18:21:29'.
>>
>> While I am testing extreme conditions, there is a possibility of
missing
>> out
>> on some data.
>>
>> I could not find any solution in Solr framework to handle this. The
table
>> has an auto increment key, all updates are deletes followed by
inserts.
>> So,
>> having last_indexed_id would have helped, where last_indexed_id is
the max
>> value of id fetched in that import. The query would then become
"Select id
>> where id>last_indexed_id.' I suppose, Solr does not have any
provision
>> like
>> this.
>>
>> Two options I could think of are:
>> (a) Ensure at application level that there are no concurrent DB
updates
>> and
>> DIH import requests going concurrently.
>> (b) Use exclusive locking during DB update
>>
>> What is the best way to address this problem?
>>
>
> Shashi,
>
> I was not solving the same problem, but perhaps you can adapt my
solution
> to yours.  My main problem was that I don't have a modified date in my
> database, and due to the size of the table, it is impractical to add
one.
>  Instead, I chose to track the database primary key (a simple
autoincrement)
> outside of Solr and pass min/max values into DIH for it to use in the
SELECT
> statement.  You can see a simplified version of my entity here, with a
URL
> showing how to send the parameters in via the dataimport GET:
>
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg40466.html
>
> The update script that runs every two minutes gets MAX(did) from the
> database, retrieves the minDid from a file on an NFS share, and runs a
> delta-import with those two values.  When the import is reported
successful,
> it writes the maxDid value to the minDid file on the network share for
the
> next run.  If the import fails, it sends an alarm and doesn't update
the
> minDid.
>
> Shawn
>
>


RE: Is Solr right for our project?

2010-10-03 Thread Ephraim Ofir
The shards parameter can be added by the search handler if you configure it to
do so, so the client doesn't have to know about it.  You can put your
replicated shards behind a proxy/balancer which will check their health, and
that way fallback will be automatic.

Ephraim Ofir

-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
Sent: Tuesday, September 28, 2010 3:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Is Solr right for our project?

Yes, in the latest released version (1.4.1), there is a shards= parameter but 
the client needs to fill it, i.e. the client needs to know what servers are 
indexers, searchers, shard masters and shard replicas...

The SolrCloud stuff is still not committed and only available as a patch right 
now. However, we encourage you to do a test install based on TRUNK+SOLR-1873 
and give it a try. But we cannot guarantee that the APIs will not change in the 
released version (hopefully 3.1 sometime this year).

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 28. sep. 2010, at 10.44, Mike Thomsen wrote:

> Interesting. So what you are saying, though, is that at the moment it
> is NOT there?
> 
> On Mon, Sep 27, 2010 at 9:06 PM, Jan Høydahl / Cominvent
>  wrote:
>> Solr will match this in version 3.1 which is the next major release.
>> Read this page: http://wiki.apache.org/solr/SolrCloud for feature 
>> descriptions
>> Coming to a trunk near you - see 
>> https://issues.apache.org/jira/browse/SOLR-1873
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 27. sep. 2010, at 17.44, Mike Thomsen wrote:
>> 
>>> (I apologize in advance if I missed something in your documentation,
>>> but I've read through the Wiki on the subject of distributed searches
>>> and didn't find anything conclusive)
>>> 
>>> We are currently evaluating Solr and Autonomy. Solr is attractive due
>>> to its open source background, following and price. Autonomy is
>>> expensive, but we know for a fact that it can handle our distributed
>>> search requirements perfectly.
>>> 
>>> What we need to know is if Solr has capabilities that match or roughly
>>> approximate Autonomy's Distributed Search Handler. What it does it
>>> acts as a front-end for all of Autonomy's IDOL search servers (which
>>> correspond in this scenario to Solr shards). It is configured to know
>>> what is on each shard, which servers hold each shard and intelligently
>>> farms out queries based on that configuration. There is no need to
>>> specify which IDOL servers to hit while querying; the DiSH just knows
>>> where to go. Additionally, I believe in cases where an index piece is
>>> mirrored, it also monitors server health and falls back intelligently
>>> on other backup instances of a shard/index piece based on that.
>>> 
>>> I'd appreciate it if someone can give me a frank explanation of where
>>> Solr stands in this area.
>>> 
>>> Thanks,
>>> 
>>> Mike
>> 
>> 



RE: DIH sub-entity not indexing

2010-10-04 Thread Ephraim Ofir
The closest you can get to debugging (without actually debugging...) is
to look at the logs and use
http://wiki.apache.org/solr/DataImportHandler#Interactive_Development_Mode

Ephraim Ofir


-Original Message-
From: Allistair Crossley [mailto:a...@roxxor.co.uk] 
Sent: Monday, October 04, 2010 3:09 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH sub-entity not indexing

Thanks Ephraim. I tried your suggestion with the ID but capitalising it
did not work. 

Indeed, I have a column that already works using a lower-case id. I wish
I could debug it somehow - see the SQL? There's something particular about
this config that it is not liking.

I read the post you linked to. This is more a performance-related thing
for him. I would be happy just to see low performance and my contacts
populated right now!! :D

Thanks again

On Oct 4, 2010, at 9:00 AM, Ephraim Ofir wrote:

> Make sure you're not running into a case sensitivity problem, some
stuff
> in DIH is case sensitive (and some stuff gets capitalized by the
jdbc).
> Try using listing.ID instead of listing.id.
> On a side note, if you're using mysql, you might want to look at the
> CONCAT_WS function.
> You might also want to look into a different approach than
sub-entities
> -
>
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3
>
c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com
> %3E
> 
> Ephraim Ofir
> 
> -Original Message-
> From: Allistair Crossley [mailto:a...@roxxor.co.uk] 
> Sent: Monday, October 04, 2010 2:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH sub-entity not indexing
> 
> I have tried a more elaborate join also following the features example
> of the DIH example but same result - SQL works fine directly but Solr
is
> not indexing the array of full_names per Listing, e.g.
> 
> 
> 
>   query="select * from listing_contacts where
> listing_id = '${listing.id}'">
>   query="select concat(first_name,
> concat(' ', last_name)) as full_name from contacts where id =
> '${listing_contact.contact_id}'">
>   
>   
>
> 
> 
> 
> Am I missing the obvious?
> 
> On Oct 4, 2010, at 8:22 AM, Allistair Crossley wrote:
> 
>> Hello list,
>> 
>> I've been successful with DIH to a large extent but a seemingly
simple
> extra column I need is posing problems. In a nutshell I have 2
entities
> let's say - Listing habtm Contact. I have copied the relevant parts of
> the configs below.
>> 
>> I have run my SQL for the sub-entity Contact and this is produces
> correct results. No errors are given by Solr on running the import.
Yet
> no records are being set with the contacts array.
>> 
>> I have taken out my sub-entity config and replaced it with a simple
> template value just to check and values then come through OK.
>> 
>> So it certainly seems limited to my query or query config somehow. I
> followed roughly the example of the DIH bundled example.
>> 
>> DIH.xml
>> ===
>> 
>> 
>> ...
>> > query="select concat(c.first_name, concat(' ', c.last_name)) as
> full_name from contacts c inner join listing_contacts lc on c.id =
> lc.contact_id where lc.listing_id = '${listing.id}'">
>> 
>> 
>> 
>> SCHEMA.XML
>> 
>>  multiValued="true" required="false" />
>> 
>> 
>> Any tips appreciated.
> 



RE: DIH sub-entity not indexing

2010-10-04 Thread Ephraim Ofir
Make sure you're not running into a case sensitivity problem, some stuff
in DIH is case sensitive (and some stuff gets capitalized by the JDBC
driver).  Try using listing.ID instead of listing.id.
On a side note, if you're using mysql, you might want to look at the
CONCAT_WS function.
You might also want to look into a different approach than sub-entities
-
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3
c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com
%3E
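
For example, with MySQL something like this avoids the nested concat calls
(column and table names as in your config):

SELECT CONCAT_WS(' ', c.first_name, c.last_name) AS full_name
FROM contacts c INNER JOIN listing_contacts lc ON c.id = lc.contact_id
WHERE lc.listing_id = '${listing.id}'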

Ephraim Ofir

-Original Message-
From: Allistair Crossley [mailto:a...@roxxor.co.uk] 
Sent: Monday, October 04, 2010 2:49 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH sub-entity not indexing

I have tried a more elaborate join also following the features example
of the DIH example but same result - SQL works fine directly but Solr is
not indexing the array of full_names per Listing, e.g.











Am I missing the obvious?

On Oct 4, 2010, at 8:22 AM, Allistair Crossley wrote:

> Hello list,
> 
> I've been successful with DIH to a large extent but a seemingly simple
extra column I need is posing problems. In a nutshell I have 2 entities
let's say - Listing habtm Contact. I have copied the relevant parts of
the configs below.
> 
> I have run my SQL for the sub-entity Contact and this is produces
correct results. No errors are given by Solr on running the import. Yet
no records are being set with the contacts array.
> 
> I have taken out my sub-entity config and replaced it with a simple
template value just to check and values then come through OK.
> 
> So it certainly seems limited to my query or query config somehow. I
followed roughly the example of the DIH bundled example.
> 
> DIH.xml
> ===
> 
> 
>  ...
>   query="select concat(c.first_name, concat(' ', c.last_name)) as
full_name from contacts c inner join listing_contacts lc on c.id =
lc.contact_id where lc.listing_id = '${listing.id}'">
> 
> 
> 
> SCHEMA.XML
> 
> 
> 
> 
> Any tips appreciated.



RE: multi level faceting

2010-10-04 Thread Ephraim Ofir
Take a look at "Mastering the Power of Faceted Search with Chris
Hostetter"
(http://www.lucidimagination.com/solutions/webcasts/faceting).  I think
there's an example of what you're looking for there.

Ephraim Ofir

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Tuesday, October 05, 2010 5:44 AM
To: solr-user@lucene.apache.org
Subject: Re: multi level faceting

Hi,

I *think* this is not what Vincent was after.  If I read the suggestions

correctly, you are saying to use &fq=x&fq=y -- multiple fqs.
But I think Vincent is wondering how to end up with something that will
let him 
create a UI with multi-level facets (with a single request), e.g.

Footwear (100)
  Sneakers (20)
Men (1)
Women (19)

  Dancing shoes (10)
Men (0)
Women (10)
...

If this is what Vincent was after, I'd love to hear suggestions myself.
:)

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Jason Brown 
> To: solr-user@lucene.apache.org
> Sent: Mon, October 4, 2010 11:34:56 AM
> Subject: RE: multi level faceting
> 
> Yes, by adding fq back into the main query you will get results
increasingly  
>filtered each time.
> 
> You may run into an issue if you are displaying facet  counts, as the
facet 
>part of the query will also obey the increasingly filtered  fq, and so
not 
>display counts for other categories anymore from the chosen facet
(depends if 
>you need to display counts from a facet once the first value from  the
facet has 
>been chosen if you get my drift). Local params are a way to deal  with
this by 
>not subjecting the facet count to the same fq restriction (but
allowing the 
>search results to obey it).
> 
> 
> 
> -Original  Message-
> From: Nguyen, Vincent (CDC/OD/OADS) (CTR) [mailto:v...@cdc.gov]
> Sent: Mon 04/10/2010  16:34
> To: solr-user@lucene.apache.org
> Subject:  RE: multi level faceting
> 
> Ok.  Thanks for the quick  response.
> 
> Vincent Vu Nguyen
> Division of Science Quality and  Translation
> Office of the Associate Director for Science
> Centers for  Disease Control and Prevention (CDC)
> 404-498-6154
> Century Bldg  2400
> Atlanta, GA 30329 
> 
> 
> -Original Message-
> From:  Allistair Crossley [mailto:a...@roxxor.co.uk] 
> Sent: Monday, October  04, 2010 9:40 AM
> To: solr-user@lucene.apache.org
> Subject:  Re: multi level faceting
> 
> I think that is just sending 2 fq facet queries  through. In Solr PHP
I
> would do that with, e.g.
> 
> $params['facet'] =  true;
> $params['facet.fields'] = array('Size');
> $params['fq'] =>  array('sex' => array('Men', 'Women'));
> 
> but yes i think you'd have to  send through what the current facet
query
> is and add it to your next  drill-down
> 
> On Oct 4, 2010, at 9:36 AM, Nguyen, Vincent (CDC/OD/OADS)  (CTR)
wrote:
> 
> > Hi,
> > 
> > 
> > 
> > I was wondering  if there's a way to display facet options based on
> > previous facet  values.  For example, I've seen many shopping sites
> where
> > a user  can facet by "Mens" or "Womens" apparel, then be shown
"sizes"
> to
> >  facet by (for Men or Women only - whichever they chose).  
> > 
> > 
> > 
> > Is this something that would have to be handled at the  application
> > level?
> > 
> > 
> > 
> > Vincent Vu  Nguyen
> > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 


RE: Speeding up solr indexing

2010-10-10 Thread Ephraim Ofir
Try running the query you're using in DIH from the command line on the DB host
and on the Solr host to see what kind of times you get from the DB itself and
from the network - your bottleneck might be there.  If you find that's not it,
take a look at this post regarding high performance DIH imports; you can get a
serious improvement in performance by not using sub-entities:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e
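
Something along these lines works for the timing check (host, credentials and
query are placeholders):

# run once on the DB host and once on the Solr host, compare wall-clock times
time mysql -h dbhost -u solr -p yourdb -e "SELECT ... your DIH query ..." > /dev/null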
 

Ephraim Ofir

-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Saturday, October 09, 2010 10:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Speeding up solr indexing

Looking at it, and not knowing how much memory your other processes on your box
use (nor how much memory you have set aside for Java), I would start with
DOUBLING your ram. Make sure that you have enough Java memory.

You will know if it has some effect by using the 2:1 size ratio. 100mb for all 
that data ia pretty small, I think.


Use the scientific method; Change only one parameter at a time and check 
results.

It's always one of four things:
(in different order depending on task, but listed alphabetically here)
--
Memory (process assigned and/or actual physical memory)
Processor
Network Bandwidth
Hard Drive Bandwidth
(sometimes you can add motherboard I/O paths also.
 as of this date, AMD has much more I/O paths in their
 consumer line of processors.)

In order ease of experimenting with(Easiest to hardest):
---
Appication/process assigned memory
Physical memory
Network Bandwidth
HardDrive Bandwidth
  Screaming fast SCSI 15K rpm drives
  RAID arrays, casual
  RAID arrays, professional
  External DRAM drive 64 gig max/RAID them for more
Processor(s) 
  Put maximum speed/cache size motherboard will take.
  Otherwise, USUALLY requires changing motherboard/HOSTING setup
I/O channels
  USUALLY requires changing motherboard/HOSTING setup





Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Sat, 10/9/10, sivaprasad  wrote:

> From: sivaprasad 
> Subject: Re: Speeding up solr indexing
> To: solr-user@lucene.apache.org
> Date: Saturday, October 9, 2010, 8:09 AM
> 
> Hi,
> Please find the configurations below.
> 
> Machine configurations(Solr running here):
> 
> RAM - 4 GB
> HardDisk - 180GB
> Os - Red Hat linux version 5
> Processor-2x Intel Core 2 Duo CPU @2.66GHz
> 
> 
> 
> Machine configurations(Mysql server is running here):
> RAM - 4 GB
> HardDisk - 180GB
> Os - Red Hat linux version 5
> Processor-2x Intel Core 2 Duo CPU @2.66GHz
> 
> My sql Server deatils:
> My sql version - Mysql 5.0.22
> 
> Solr configuration details:
> 
> [solrconfig.xml indexing settings - the XML tags were stripped by the
> list archive; what remains shows index defaults and main index settings
> plus an updateHandler of class solr.DirectUpdateHandler2 with autoCommit
> limits]
> 
> Solr document details:
> 
> 21 fields are indexed and stored
> 3 fileds are indexed only.
> 3 fileds are stored only.
> 3 fileds are indexed,stored and multi valued
> 2 fileds indexed and multi valued
> 
> And i am copying some of the indexed fileds.In this 2
> fileds are multivalued
> and has thousands of values.
> 
> In db-config-file the main table contains 0.6 million
> records.
> 
> When i tested for the same records, the index has taken 1hr
> 30 min.In this
> case one of the multivalued filed table doesn't have
> records.After putting
> data into this table,for each main table record , this
> table has thousands
> of records and this filed is indexed and stored.It is
> taking more than 24
> hrs .
> 
> Solr is running on tomcat 6.0.26, jdk1.6.0_17 and solr
> 1.4.1
> 
> I am using JVM's default settings.
> 
> Why this is taking this much time?Any body has suggestions,
> where i am going
> wrong.
> 
> Thanks,
> JS


RE: multicore defaultCoreName not working

2010-10-13 Thread Ephraim Ofir
Which version of Solr are you using?
I believe this is only available on trunk, not even in 1.4.1 (SOLR-1722).
Also, watch out for the SOLR-2127 bug - I haven't gotten around to creating a
patch yet...

Ephraim Ofir


-Original Message-
From: Ron Chan [mailto:rc...@i-tao.com] 
Sent: Wednesday, October 13, 2010 9:20 AM
To: solr-user@lucene.apache.org
Subject: multicore defaultCoreName not working

Hello 

I have this in my solr.xml

<solr>
  <cores adminPath="/admin/cores" defaultCoreName="live">
    <core name="live" instanceDir="..."/>
    <core name="staging" instanceDir="..."/>
  </cores>
</solr>


admin is working and the individual cores are working through

http://localhost:8080/solr/live/select/?q=abc
and
http://localhost:8080/solr/staging/select/?q=abc

returning the correct results from the right core

however, I wanted to keep the existing single core URLs and thought that the 
defaultCoreName attribute does this

i.e.
http://localhost:8080/solr/select/?q=abc

should give me the "live" core

but it gives me "Missing core name in path"

Is there anything else I need to do?

Thanks
Ron


RE: DIH delta-import question

2010-10-19 Thread Ephraim Ofir
According to the DIH wiki, delta-import is only supported by SQL
(http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command-1)


Ephraim Ofir

-Original Message-
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] 
Sent: Friday, October 15, 2010 8:20 AM
To: solr-user@lucene.apache.org
Subject: DIH delta-import question

Dear list,

I'm trying to delta-import with datasource FileDataSource and
processor FileListEntityProcessor. I want to load only files
which are newer than dataimport.properties -> last_index_time.
It looks like that newerThan="${dataimport.last_index_time}" is
without any function.

Can it be that newerThan is configured under FileListEntityProcessor
but used for the next following entity processor and not for
FileListEntityProcessor itself?

This is in my case the XPathEntityProcessor which doesn't support
newerThan.
Version is solr 4.0 from trunk.

Regards,
Bernd


RE: DIH - configure password in 1 place and store it in encrypted form?

2010-10-19 Thread Ephraim Ofir
You could include a common file with the JdbcDataSource
(http://wiki.apache.org/solr/SolrConfigXml#XInclude) or add the password
as a property in solr.xml in the container scope
(http://wiki.apache.org/solr/CoreAdmin#Configuration) so it will be
available to all cores.
Personally, I use a single configuration for all cores with soft-linked
config files, so I only have to change the config in one place.
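
The property route looks roughly like this (property names are just examples) -
declare the credentials once in solr.xml:

<solr persistent="true">
  <property name="db.user" value="solr"/>
  <property name="db.password" value="secret"/>
  <cores adminPath="/admin/cores">
    ...
  </cores>
</solr>

and then refer to them as ${db.user} / ${db.password} from the per-core
configuration.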

Ephraim Ofir


-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Sunday, October 17, 2010 7:05 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH - configure password in 1 place and store it in
encrypted form?

On Sun, Oct 17, 2010 at 7:02 PM, Arunkumar Ayyavu
 wrote:
> Hi!
>
> I have multiple cores reading from the same database and I've provided
> the user credentials in all data-config.xml files. Is there a way to
> tell JdbcDataSource in data-config.xml to read the username and
> password from a file? This would help me not to change the
> username/password in multiple data-config.xml files.
>
> And is it possible to store the password in encrypted and let the DIH
> to call the decrypter to read the password?
[...]

As far as I am aware, it is not possible to do either of the two
options above. However, one could extend the JdbcDataSource
class to add such functionality.

Regards,
Gora
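
A sketch of the solr.xml property approach Ephraim describes (property and file
names are assumptions; this relies on container-scope properties being
substituted into solrconfig.xml and on DIH's ${dataimporter.request.*} variables):

In solr.xml (container scope, visible to all cores):

  <solr persistent="true">
    <property name="dbPassword" value="secret"/>
    <cores adminPath="/admin/cores">
      <core name="core0" instanceDir="core0"/>
      <core name="core1" instanceDir="core1"/>
    </cores>
  </solr>

In the shared solrconfig.xml, pass it to DIH as a default request parameter:

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <str name="dbPassword">${dbPassword}</str>
    </lst>
  </requestHandler>

And in each data-config.xml:

  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://dbhost/db"
              user="solr" password="${dataimporter.request.dbPassword}"/>

This does not encrypt the password, but it does reduce it to a single place.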


RE: How to index on basis of a condition?

2010-10-25 Thread Ephraim Ofir
Assuming you're talking about data that comes from a DB, I find it easiest to
do this kind of logic on the DB's side (MySQL example):
SELECT IF(someField = someValue, desiredValue, NULL) AS desiredName FROM someTable

If that's not possible, you can use RegexTransformer
(http://wiki.apache.org/solr/DataImportHandler#RegexTransformer) or (worst case
and worst performance) ScriptTransformer
(http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer) and actually
write a JS script to do your logic.

Ephraim Ofir

-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
Sent: Monday, October 25, 2010 10:23 AM
To: solr-user@lucene.apache.org
Subject: Re: How to index on basis of a condition?

Do you want to use a field's content do decide whether the document should be 
indexed or not?
You could write an UpdateProcessor for that, simply aborting the chain for the 
docs that don't pass your test.

@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    String value = (String) doc.getFieldValue("myfield");
    String condition = "foobar";
    // compare with equals(), not ==, and only pass matching docs down the chain
    if (condition.equals(value)) {
        super.processAdd(cmd);
    }
}

But if what you meant was to skip only that field if it does not match 
condition, you could use doc.removeField(name) instead. Now you can feed your 
content using whatever method you like.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25. okt. 2010, at 08.38, Pawan Darira wrote:

> Hi
> 
> I want to index a particular field on one if() condition. Can i do it
> through DIH?
> 
> Please suggest.
> 
> -- 
> Thanks,
> Pawan Darira
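
For completeness, a processor like the one Jan sketches above is wrapped in a
factory and wired into an update chain in solrconfig.xml; a rough sketch (the
class and chain names are assumed; the selecting request parameter is
update.chain on 3.x and update.processor on 1.4):

  <updateRequestProcessorChain name="conditional">
    <processor class="com.example.ConditionalIndexProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Updates sent with that chain selected (or configured as a default on the update
handler) then pass through the custom processor before being indexed.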



RE: How to index on basis of a condition?

2010-10-26 Thread Ephraim Ofir
This is probably just a date format problem, nothing to do with the IF()
statement.  Try applying this on your date:
DATE_FORMAT(yourDate, '%Y-%m-%dT00:00:00Z')

Ephraim Ofir

-Original Message-
From: Pawan Darira [mailto:pawan.dar...@gmail.com] 
Sent: Tuesday, October 26, 2010 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: How to index on basis of a condition?

I am using mysql database, and, field type is "date"

On Tue, Oct 26, 2010 at 2:56 PM, Gora Mohanty 
wrote:

> On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira 
> wrote:
> > Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement.
The
> > query result is correct. But when i see it in my index, the value
stored
> is
> > something unusual bunch of characters e.g. "*...@6628ad5a"*
> [...]
>
> Which database are you indexing from? The field type is probably
> a blob in the database. Check that, and look into the ClobTransformer:
> http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
>
> Regards,
> Gora
>



-- 
Thanks,
Pawan Darira
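
If you would rather not reformat the date in SQL as suggested above, DIH's
DateFormatTransformer can parse a plain string such as 2010-05-30 into a proper
date; a sketch (entity and column names taken from the thread, input format
assumed):

  <entity name="ad" transformer="DateFormatTransformer"
          query="select IF(sub_cat_id=2002, ad_post_date, null) as 'ad_sort_field' from tcuser.ad_details">
    <field column="ad_sort_field" dateTimeFormat="yyyy-MM-dd"/>
  </entity>

Either way, the value that reaches Solr has to end up as a full date that the
date field type can parse.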


RE: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Ephraim Ofir
Note that usually when you change schema.xml you not only have to restart Solr
but also rebuild the index, so how to reload the file is the smaller problem...

Ephraim Ofir

-Original Message-
From: Peter Karich [mailto:peat...@yahoo.de] 
Sent: Tuesday, October 26, 2010 12:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr reload schema.xml dynamically?

  Hi,

See this:
http://wiki.apache.org/solr/CoreAdmin#RELOAD

Solr will also load the new configuration (without restart the webapp) 
on the slaves when using replication:
http://wiki.apache.org/solr/SolrReplication

Regards,
Peter.

> Hi Everybody,
>
> If I change my schema.xml, do I have to restart Solr? Is there some
> way I can apply the changes to schema.xml without restarting Solr?
>
> Swapnonil Mukherjee
>
>
>
>


-- 
http://jetwick.com twitter search prototype
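
For a multicore setup, the CoreAdmin RELOAD that Peter mentions is just an HTTP
call (host, port and core name are illustrative):

  http://localhost:8080/solr/admin/cores?action=RELOAD&core=core0

As Ephraim notes above, reloading the config does not remove the need to
re-index after most schema changes.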



RE: How to index on basis of a condition?

2010-10-26 Thread Ephraim Ofir
Try:
select IF(sub_cat_id=2002, DATE_FORMAT(ad_post_date,
'%Y-%m-%dT00:00:00Z/DAY'), null) as 'ad_sort_field' from
tcuser.ad_details where 

Ephraim Ofir

-Original Message-
From: Pawan Darira [mailto:pawan.dar...@gmail.com] 
Sent: Tuesday, October 26, 2010 1:29 PM
To: solr-user@lucene.apache.org
Subject: Re: How to index on basis of a condition?

My Sql is

select IF(sub_cat_id=2002, ad_post_date, null) as 'ad_sort_field' from
tcuser.ad_details where 

+---------------+
| ad_sort_field |
+---------------+
| 2010-05-30    |
| 2010-05-02    |
| 2010-10-07    |
| NULL          |
| 2010-10-15    |
| NULL          |
+---------------+

Thanks
Pawan


On Tue, Oct 26, 2010 at 4:36 PM, Gora Mohanty 
wrote:

> On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira 
> wrote:
> > I am using mysql database, and, field type is "date"
> [...]
>
> Could you show us the exact SELECT statement, and some example
> values returned by running the SELECT directly at a mysql console?
>
> Regards,
> Gora
>



-- 
Thanks,
Pawan Darira


RE: If I want to move a core from one physical machine to another....

2010-10-28 Thread Ephraim Ofir
How is this better than replication?

Ephraim Ofir


-Original Message-
From: Ken Stanley [mailto:doh...@gmail.com] 
Sent: Thursday, October 28, 2010 1:59 PM
To: solr-user@lucene.apache.org
Subject: Re: If I want to move a core from one physical machine to another

On Wed, Oct 27, 2010 at 6:12 PM, Ron Mayer  wrote:

> If I want to move a core from one physical machine to another,
> is it as simple as just
>   scp -r core5 otherserver:/path/on/other/server/
> and then adding
>
> on that other server's solr.xml file and restarting the server there?
>
>
>
> PS: Should have I been able to figure the answer to that
>out by RTFM somewhere?
>

Ron,

In our current environment I index all of our data on one machine, and to
save time with "replication", I use scp to copy the data directory over to
our other servers. On the server that I copy from, I don't turn SOLR off,
but on the servers that I copy to, I shutdown tomcat; remove the data
directory; mv the data directory I scp'd from the source; turn tomcat back
on. I do it this way (especially with mv, versus cp) because it is the
fastest way to get the data on the other servers. And, as Gora pointed out,
you need to make sure that your configuration files match (specifically the
schema.xml) the source.

- Ken
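
For comparison with the scp approach, the built-in HTTP replication Ephraim
alludes to is configured in solrconfig.xml roughly as follows (host name and
poll interval are illustrative):

On the master:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

On each slave:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8080/solr/replication</str>
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>

The slaves then pull only the changed index files after each commit, without
any shutdown or manual copying.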


RE: Solr MySQL Adding new column to table

2010-11-02 Thread Ephraim Ofir
Not if you use 'SELECT * FROM person'

Ephraim Ofir

-Original Message-
From: nitin.vanaku...@gmail.com [mailto:nitin.vanaku...@gmail.com] 
Sent: Tuesday, November 02, 2010 11:19 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr MySQL Adding new column to table


Hi Sivaprasad,

First of all, thanks for your kind response.
I went through that link. If I use the dynamicField concept, I still need to
alter the query in data-config.xml, right?

thanks 
Nitin
-- 
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table
-tp1826759p1826865.html
Sent from the Solr - User mailing list archive at Nabble.com.
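
To make Ephraim's point concrete: with SELECT * in the DIH entity and a
dynamicField in schema.xml, a new column needs no config change at all, as long
as the column (or its SQL alias) follows the naming convention; a sketch (the
*_s suffix is an assumed convention):

  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

A column added later as, say, nickname_s (or aliased that way in the query or
view DIH reads from) is then picked up on the next import without touching
data-config.xml or schema.xml.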


RE: Solr MySQL Adding new column to table

2010-11-02 Thread Ephraim Ofir
Your uniqueKey field is defined as id (in schema.xml) and your query
doesn't return an id field.

Ephraim Ofir
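
In other words, the query (or the DIH field mapping) has to produce a column
named id; a sketch based on the column names in the log below:

  <entity name="person" query="SELECT eid AS id, ename, eage FROM person">
    <!-- alternatively, keep SELECT * and map the column instead: -->
    <!-- <field column="eid" name="id"/> -->
  </entity>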

-Original Message-
From: nitin.vanaku...@gmail.com [mailto:nitin.vanaku...@gmail.com] 
Sent: Tuesday, November 02, 2010 12:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr MySQL Adding new column to table


OK, I have one more issue.
I am getting the following exception; could you please look into it?


INFO: Creating a connection for entity person with URL:
jdbc:mysql://localhost:3306/example
Nov 2, 2010 3:34:11 PM
org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 250
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.SolrWriter
upload
WARNING: Error creating document :
SolrInputDocument[{eage=eage(1.0)={28},
ename=ename(1.0)={shree}, eid=eid(1.0)={1}}]
org.apache.solr.common.SolrException: Document is missing uniqueKey
field id
at
org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:115
)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.
java:230)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdate
ProcessorFactory.java:61)
at
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImport
Handler.java:292)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
ava:392)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java
:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18
0)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte
r.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java
:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:
370)
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.SolrWriter
upload
WARNING: Error creating document :
SolrInputDocument[{eage=eage(1.0)={29},
ename=ename(1.0)={ramesh}, eid=eid(1.0)={2}}]
org.apache.solr.common.SolrException: Document is missing uniqueKey
field id
at
org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:115
)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.
java:230)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdate
ProcessorFactory.java:61)
at
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImport
Handler.java:292)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
ava:392)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java
:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18
0)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte
r.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java
:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:
370)
Nov 2, 2010 3:34:11 PM org.apache.solr.handler.dataimport.DocBuilder
finish
INFO: Import completed successfully
Nov 2, 2010 3:34:11 PM org.apache.solr.update.DirectUpdateHandler2
commit
INFO: start
commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=fa
lse)
-- 
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-MySQL-Adding-new-column-to-table
-tp1826759p1827093.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Updating last_modified field when using DIH

2010-11-03 Thread Ephraim Ofir
Also, your deltaImportQuery should be:
deltaImportQuery='SELECT * FROM "Entities" WHERE
"ent_id"=${dataimporter.delta.id}"'

Otherwise you're just importing the ids and not the rest of the data.

If performance is important to you, you might also want to check out
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3
c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com
%3E

Ephraim Ofir


-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@googlemail.com] 
Sent: Wednesday, November 03, 2010 12:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Updating last_modified field when using DIH

Juan,

that's correct .. solr will not touch your database, that's part of your
application-code. solr uses an updated timestamp (which is available
through dataimporter.last_index_time).

So, imagine the following situation: the Solr import runs every 10 minutes.
The last run was at 11:00, your entity gets updated at 11:03, and the next
Solr run at 11:10 detects it as changed and imports the entity. When the
import runs again at 11:20, no entity matches the delta-query, because Solr
asks for a modification_date > 11:10 (the time of the last run).

you'll only need to update the last_modified field (in your application)
when the entity is changed and you want solr to (re-)index your data.

HTH,
Stefan

On Tue, Nov 2, 2010 at 7:35 PM, Juan Manuel Alvarez
wrote:

> Hello everyone!
>
> I would like to ask you a question about DIH and delta import.
>
> I am trying to sync Solr with a PostgreSQL database and I have a field
> "ent_lastModified" of type "timestamp without timezone".
>
> Here is my xml file:
>
> 
> url="jdbc:postgresql://host" user="XXX" password="XXX" readOnly="true"
> autoCommit="false"
>transactionIsolation="TRANSACTION_READ_COMMITTED"
> holdability="CLOSE_CURSORS_AT_COMMIT"/>
>
>query=' SELECT * FROM Entities'
>deltaImportQuery='SELECT "ent_id" AS "id" FROM
> "Entities" WHERE "ent_id"=${dataimporter.delta.id}"'
>  deltaQuery=' SELECT "ent_id" AS "id" FROM "Entities" WHERE
> "ent_lastModified" > '${dataimporter.last_index_time}''
>>
>
>
> 
>
> Full-import works fine, and when I run a delta-import against the
> "ent_lastModified" field I get the corresponding records, but
> "ent_lastModified" itself stays the same, so if I run another
> delta-import the same records are retrieved.
>
> I have read all the documentation at
> http://wiki.apache.org/solr/DataImportHandler but I could not find an
> update query for the "last_modified" field and Solr does not seem to
> do this automatically.
> I have also tried to name the field "last_modified" as in the example,
> but its value keeps unchanged after a delta-import.
>
> Can anyone point me in the right direction?
>
> Thanks in advance!
> Juan M.
>


RE: Corename after Swap in MultiCore

2010-11-07 Thread Ephraim Ofir
Do you mean solr.core.name has the wrong value after the swap? You
swapped doc-temp so now it's doc and solr.core.name is still doc-temp?
This completely contradicts my experience, what version of solr are you
using?
Why use postCommit? You're running the risk of performing a swap when
you don't mean to.  Are you using DIH? If so, I'd go with querying the
status of the import until it's done and then performing the swap.

Ephraim Ofir


-Original Message-
From: sivaram [mailto:yogendra.bopp...@gmail.com] 
Sent: Wednesday, November 03, 2010 4:46 PM
To: solr-user@lucene.apache.org
Subject: Corename after Swap in MultiCore


Hi everyone,

Long question, but please bear with me. I'm using a multicore Solr instance to
index different documents from different sources (around 4), and I'm using a
common config for all the cores. For each source I have a core and a temp
core, like 'doc' and 'doc-temp'. Every time I want to get new data, I run a
dataimport to the temp core and then swap the cores. For swapping I'm using
the postCommit event listener to make sure the swap is done after the commit
completes.

After the first swap, when I use solr.core.name on doc-temp it returns doc as
its name (because the commit is done on doc's data dir after the first swap).
How do I get the core name of doc-temp here in order to swap again?

I'm stuck here, please help. Also, does anyone know for sure whether, if a
dataimport is running on a core, the next swap request will be executed only
after that dataimport is finished?

Thanks in advance.
Ram.
-- 
View this message in context:
http://lucene.472066.n3.nabble.com/Corename-after-Swap-in-MultiCore-tp18
35325p1835325.html
Sent from the Solr - User mailing list archive at Nabble.com.
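
Polling the import status and swapping explicitly, as Ephraim suggests, only
needs two HTTP calls (host, port and core names are illustrative):

  http://localhost:8080/solr/doc-temp/dataimport?command=status
  http://localhost:8080/solr/admin/cores?action=SWAP&core=doc&other=doc-temp

The status response reports "busy" while an import is running and "idle" once
it has finished, so the swap can be issued only after the import on doc-temp
is done.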


RE: Dataimport performance

2010-12-16 Thread Ephraim Ofir
Check out 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e
This approach of not using sub entities really improved our load time.

Ephraim Ofir

-Original Message-
From: Robert Gründler [mailto:rob...@dubture.com] 
Sent: Wednesday, December 15, 2010 4:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Dataimport performance

i've benchmarked the import already with 500k records, one time without the 
artists subquery, and one time without the join in the main query:


Without subquery: 500k in 3 min 30 sec

Without join and without subquery: 500k in 2 min 30 sec

With subquery and with left join: 320k in 6 min 30 sec


so the joins / subqueries are definitely a bottleneck. 

How exactly did you implement the custom data import? 

In our case, we need to de-normalize the relations of the sql data for the 
index, 
so i fear i can't really get rid of the join / subquery.


-robert





On Dec 15, 2010, at 15:43 , Tim Heckman wrote:

> 2010/12/15 Robert Gründler :
>> The data-config.xml looks like this (only 1 entity):
>> 
>>  
>>
>>
>>
>>
>>
>>> name="sf_unique_id"/>
>> 
>>
>>  
>>
>> 
>>  
> 
> So there's one track entity with an artist sub-entity. My (admittedly
> rather limited) experience has been that sub-entities, where you have
> to run a separate query for every row in the parent entity, really
> slow down data import. For my own purposes, I wrote a custom data
> import using SolrJ to improve the performance (from 3 hours to 10
> minutes).
> 
> Just as a test, how long does it take if you comment out the artists entity?
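
The approach in the post Ephraim links to avoids the per-row sub-entity by
flattening the join in SQL and splitting it again in DIH; a rough sketch along
those lines (table and column names are assumed, not Robert's schema):

  <entity name="track" transformer="RegexTransformer"
          query="SELECT t.id, t.title,
                        GROUP_CONCAT(a.name SEPARATOR '|') AS artists
                 FROM track t
                 LEFT JOIN track_artist ta ON ta.track_id = t.id
                 LEFT JOIN artist a ON a.id = ta.artist_id
                 GROUP BY t.id">
    <field column="artists" splitBy="\|"/>
  </entity>

One query with GROUP BY replaces the N+1 queries of the sub-entity, and splitBy
turns the concatenated string back into a multi-valued field.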



RE: UPDATE query in deltaquery

2010-12-30 Thread Ephraim Ofir
This may sound silly, but are you sure the user you're using has
permissions to do the updates you want? Not sure about postgres but I
think some jdbc's require that the connection be defined as rw, maybe
you should try adding readOnly="false" to your jdbc definition.

Ephraim Ofir
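
A dataSource line with the flag Ephraim suggests would look roughly like this
(driver, URL and credentials are placeholders):

  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
              url="jdbc:postgresql://dbhost:5432/db" user="solr"
              password="secret" readOnly="false" autoCommit="false"/>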

-Original Message-
From: Juan Manuel Alvarez [mailto:naici...@gmail.com] 
Sent: Thursday, December 30, 2010 2:52 PM
To: solr-user@lucene.apache.org
Subject: Re: UPDATE query in deltaquery

Hi Erick!

Here is my DIH configuration:









I have tried two options for the deltaQuery:
UPDATE "Global"."Projects" SET "prj_lastSync" = now() WHERE "prj_id" =
'2'; < Throws a null pointer exception as described in the
previous email

The second option is a DB function that I am calling this way:
SELECT "get_deltaimport_items" AS "id" FROM
project.get_deltaimport_items(2, 'project');

The function inside executes the UPDATE query shown above and a SELECT
query for the ids.
The ids are returned ok, but the UPDATE has no effect on the database.

Cheers!
Juan M.


On Thu, Dec 30, 2010 at 1:32 AM, Erick Erickson
 wrote:
> Well, let's see the queries you're sending, and your DIH
configuration.
>
> Otherwise, we're just guessing...
>
> Best
> Erick
>
> On Wed, Dec 29, 2010 at 9:58 PM, Juan Manuel Alvarez
wrote:
>
>> Hi! I would like to ask you a question about using a deltaQuery in
DIH.
>> I am syncing with a PostgreSQL database.
>>
>> At first I was calling a function that made two queries: an UPDATE
and a
>> SELECT.
>> The select result was properly returned, but the UPDATE query did not
>> made any changes,
>> so I tried calling the same function from a PostgreSQL client and
>> everything went OK.
>>
>> So I tried calling a simple UPDATE query directly in the deltaQuery
>> and I receive a
>> NullPointerException that I traced to the line 251 of the
>> JdbcDataSource.java
>> colNames = readFieldNames(resultSet.getMetaData());
>>
>> The question is: is there a way I can make the update query work in
>> the deltaQuery
>> or am I doing something wrong?
>>
>> Happy new year
>> Cheers!
>> Juan M.
>>
>


RE: UPDATE query in deltaquery

2010-12-30 Thread Ephraim Ofir
Does your function get_deltaimport_items perform the update first and then the 
select? Does it make a difference if you change the order? Did you try omitting 
the TRANSACTION_SERIALIZABLE part?

Ephraim Ofir

-Original Message-
From: Juan Manuel Alvarez [mailto:naici...@gmail.com] 
Sent: Thursday, December 30, 2010 7:04 PM
To: solr-user@lucene.apache.org
Subject: Re: UPDATE query in deltaquery

Hi Ephraim! Thanks for the answer!

Actually the user has permissions to make UPDATE queries.

I changed the configuration (the snippet was stripped by the list archive)
and I still get the same results.

Cheers!
Juan M.

On Thu, Dec 30, 2010 at 12:40 PM, Ephraim Ofir  wrote:
> This may sound silly, but are you sure the user you're using has
> permissions to do the updates you want? Not sure about postgres but I
> think some jdbc's require that the connection be defined as rw, maybe
> you should try adding readOnly="false" to your jdbc definition.
>
> Ephraim Ofir
>
> -Original Message-
> From: Juan Manuel Alvarez [mailto:naici...@gmail.com]
> Sent: Thursday, December 30, 2010 2:52 PM
> To: solr-user@lucene.apache.org
> Subject: Re: UPDATE query in deltaquery
>
> Hi Erick!
>
> Here is my DIH configuration:
>
> 
>    
> url="jdbc:postgresql://${dataimporter.request.dbHost}:${dataimporter.req
> uest.dbPort}/${dataimporter.request.dbName}"
>        user="${dataimporter.request.dbUser}"
> password="${dataimporter.request.dbPassword}" autoCommit="false"
>        transactionIsolation="TRANSACTION_READ_UNCOMMITTED"
> holdability="CLOSE_CURSORS_AT_COMMIT"/>
>    
>                        query='  . '
>          deltaImportQuery='  . '
>                deltaQuery=' . '
>        >
>        
>    
> 
>
> I have tried two options for the deltaQuery:
> UPDATE "Global"."Projects" SET "prj_lastSync" = now() WHERE "prj_id" =
> '2'; < Throws a null pointer exception as described in the
> previous email
>
> The second option is a DB function that I am calling this way:
> SELECT "get_deltaimport_items" AS "id" FROM
> project.get_deltaimport_items(2, 'project');
>
> The function inside executes the UPDATE query shown above and a SELECT
> query for the ids.
> The ids are returned ok, but the UPDATE has no effect on the database.
>
> Cheers!
> Juan M.
>
>
> On Thu, Dec 30, 2010 at 1:32 AM, Erick Erickson
>  wrote:
>> Well, let's see the queries you're sending, and your DIH
> configuration.
>>
>> Otherwise, we're just guessing...
>>
>> Best
>> Erick
>>
>> On Wed, Dec 29, 2010 at 9:58 PM, Juan Manuel Alvarez
> wrote:
>>
>>> Hi! I would like to ask you a question about using a deltaQuery in
> DIH.
>>> I am syncing with a PostgreSQL database.
>>>
>>> At first I was calling a function that made two queries: an UPDATE
> and a
>>> SELECT.
>>> The select result was properly returned, but the UPDATE query did not
>>> made any changes,
>>> so I tried calling the same function from a PostgreSQL client and
>>> everything went OK.
>>>
>>> So I tried calling a simple UPDATE query directly in the deltaQuery
>>> and I receive a
>>> NullPointerException that I traced to the line 251 of the
>>> JdbcDataSource.java
>>> colNames = readFieldNames(resultSet.getMetaData());
>>>
>>> The question is: is there a way I can make the update query work in
>>> the deltaQuery
>>> or am I doing something wrong?
>>>
>>> Happy new year
>>> Cheers!
>>> Juan M.
>>>
>>
>


RE: DataImportHanlder - Multiple entities will step into each other

2011-01-05 Thread Ephraim Ofir
You could get around that by doing the concatenation at the SQL level, that way 
deletes would work as well.

Ephraim Ofir

-Original Message-
From: Matti Oinas [mailto:matti.oi...@gmail.com] 
Sent: Tuesday, January 04, 2011 3:57 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHanlder - Multiple entities will step into each other

I managed to do that by using TemplateTransformer (the XML example was
stripped by the list archive):

The only problem is that delta-import fails to perform deletes on the index.
It seems that TemplateTransformer is not applied when performing deletes, so
delete by id doesn't work.
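
For readers hitting the same id-collision problem, the two variants discussed
here look roughly like this (entity and column names are assumed):

  <!-- TemplateTransformer: prefix the id at import time (what Matti describes) -->
  <entity name="item" transformer="TemplateTransformer"
          query="SELECT id, name FROM item">
    <field column="id" template="item-${item.id}"/>
  </entity>

  <!-- Ephraim's alternative: concatenate in SQL, so deletes see the same id -->
  <entity name="item" query="SELECT CONCAT('item-', id) AS id, name FROM item"/>

Because the SQL variant produces the prefixed id before DIH sees the row,
delta deletes can match it, which is where the transformer approach falls down.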



2011/1/4 yu shen :
> Hi All,
>
> I have a dataimporthandler config file as below. It contains multiple
> entities:
> 
>        
> url="jdbc:mysql://localhost:1521/changan?useUnicode=true&characterEncoding=utf8&autoReconnect=true"...
> />
>        
>                
>                
>                
>        
> 
>
> All data come from a database. The problem is that item/company and the
> other entities all have an 'id' field, with values starting from 1 to n,
> so item/company etc. will step on each other.
> Is there a way to prevent this from happening, such as assigning
> different entities to different partitions?
>
> One way I can think of is to separate the entities into different
> instances, which is not an ideal solution IMO.
>
> Would someone point me to a reference and give some instructions?
>


RE: multicore controlled by properties

2011-01-09 Thread Ephraim Ofir
I use a script to generate the appropriate solr.xml for each host according to 
a config file.  You could also prepare separate files and create a soft link 
from solr.xml to the appropriate one on each host.

Ephraim Ofir

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Sunday, January 09, 2011 6:03 AM
To: solr-user@lucene.apache.org; Zach Friedland
Subject: Re: multicore controlled by properties

The config files support XInclude. Some sites use this to include a
local configuration that affects your single global file.
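
The XInclude mechanism Lance refers to (documented for solrconfig.xml on the
SolrConfigXml wiki page) looks like this; the included file name is assumed,
and whether solr.xml itself honors XInclude depends on the Solr version:

  <config>
    ...
    <xi:include href="local-overrides.xml"
                xmlns:xi="http://www.w3.org/2001/XInclude"/>
  </config>

Each host can then carry its own local-overrides.xml while the main config
stays identical everywhere.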

On Sat, Jan 8, 2011 at 10:53 AM, Zach Friedland  wrote:
> We have a large number of solr cores that are used by different groups for
> different purposes.  To make the source control simple, we keep a single
> 'multicore' directory and solr.xml references all cores.  We deploy the same
> configuration to all servers (shared NFS mount), and then only populate the
> indexes of the cores that we want running on that server.  However, it still
> seems wasteful to have the cores running where we know they won't be used.  
> What
> I'd like to be able to do is define properties that will allow me to enable 
> and
> disable cores via JVM params on startup.  I was hoping to use the 'enable'
> parameter that is supported elsewhere in solr, but it didn't seem to be
> respected in solr.xml.  Here's the syntax I tried in my solr.xml file:
>
>  
>     enable="${solr.enable.core.businessUnit1:true}"/>
>     enable="${solr.enable.core.businessUnit2:true}"/>
>     enable="${solr.enable.core.businessUnit3:true}"/>
>     enable="${solr.enable.core.businessUnit4:true}"/>
>     enable="${solr.enable.core.businessUnit5:true}"/>
>  
>
> Another idea is that I have solr1.xml, solr2.xml, solr3.xml, solr4.xml (etc);
> and then have some property that tells the JVM which solr.xml version to load
> (and each xml file would have only the cores that that instance needs).  But I
> couldn't find any property that controls which xml file is loaded for
> multicore.  Is the code hard-coded to look for solr.xml?
>
> Thanks
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


RE: How to read values from dataimport.properties in a production environment

2011-11-18 Thread Ephraim Ofir
You can add it yourself in admin-extra.html

Ephraim Ofir

-Original Message-
From: Nico Luna [mailto:nicolaslun...@gmail.com] 
Sent: Friday, November 11, 2011 7:57 PM
To: solr-user@lucene.apache.org
Subject: How to read values from dataimport.properties in a production 
environment

I'm trying to see the values stored in the dataimport.properties file in a
production environment using the Solr admin pages, so I copied the same
behaviour as the [Schema] and [Config] links, but changed the contentType
property (from contentType=text/xml to contentType=text):

http://localhost:8080/solr/admin/file/?contentType=text;charset=utf-8&file=dataimport.properties

I want to know if there is another way to see the dataimport.properties values
from the Solr admin, and whether this could be a feature worth adding to the
admin pages. For example, adding this to the index.jsp file:

[<a href="file/?contentType=text;charset=utf-8&file=dataimport.properties">dataimport.properties</a>]

Thanks, Nicolás

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-read-values-from-dataimport-properties-in-a-production-environment-tp3500453p3500453.html
Sent from the Solr - User mailing list archive at Nabble.com.