Facet at zappos.com

2013-07-25 Thread Ifnu bima
Hi,

I'm currently looking at the Zappos Solr implementation on their website.
One thing that makes me curious is how their facet filtering works.

If you look at the Zappos facet filters, some facets allow filtering on
multiple values, for example size and brand. The behaviour lets the user
select multiple values within a facet without removing the other values
already selected in the same facet. If you compare this behaviour with,
for example, the sample /browse handler in the Solr distribution, it is
quite different, since that only allows selection of a single facet
value per facet filter.

Can Zappos-style multiple facet values be achieved using only
configuration in solrconfig.xml, or does it need custom code when writing
the Solr client?
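
I have seen mentions of the tag/exclude local params, so maybe the query side
looks roughly like this (the "brand" field name is just my guess); is
something like that all that is needed, or is more custom code required?

  q=*:*
  &fq={!tag=brandTag}brand:("Nike" OR "Adidas")
  &facet=true
  &facet.field={!ex=brandTag}brand

As far as I understand, the {!ex=brandTag} part is what keeps the counts for
the unselected brands from disappearing.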

thanks and regards

-- 
http://ifnubima.org/indo-java-podcast/
http://project-template.googlecode.com/
@ifnubima

regards


Re: Duplicate documents based on attribute

2013-07-25 Thread Aditya
One option is to store the color as a multi-valued stored field, but then you
have to do pagination manually. If that worries you, use a database: have a
table with Product Name and Color, and you can retrieve the data with
pagination.

If you still want to achieve it via Solr, have a separate record for every
product and color combination, with fields like ProductName, Color and
RecordType. Since Solr is a NoSQL store, records can have different fields and
not every record needs to have all the fields. You can store different types
of documents and filter the records by their type.
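
As a rough sketch of what I mean (field names are just an example), index one
document per product/color combination and filter or group at query time:

  Documents:
    id=prodA-red    productId=prodA  name="Product A"  color=red    recordType=variation
    id=prodA-blue   productId=prodA  name="Product A"  color=blue   recordType=variation
    id=prodA-green  productId=prodA  name="Product A"  color=green  recordType=variation

  Return all three variations:
    /select?q=name:"Product A"&fq=recordType:variation

  Or collapse them back to one entry per product:
    /select?q=name:"Product A"&fq=recordType:variation&group=true&group.field=productId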

Regards
Aditya
www.findbestopensource.com






On Thu, Jul 25, 2013 at 11:01 PM, Alexandre Rafalovitch
wrote:

> Look for the presentations online. You are not the first store to use Solr,
> there are some explanations around. Try one from Gilt, but I think there
> were more.
>
> You will want to store data at the lowest meaningful level of search
> granularity. So, in your case, it might be ProductVariation (shoes+color).
> Some examples I have seen even store it down to the availability or
> price-difference level. Then you do some post-search normalization, either
> by grouping or by filtering.
>
> Solr is not a database, store what you want to find.
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Thu, Jul 25, 2013 at 12:42 PM, Mark  wrote:
>
> > How would I go about doing something like this? I'm not sure if this is
> > something that can be accomplished on the index side or if it's something
> > that should be done in our application.
> >
> > Say we are an online store for shoes and we are selling Product A in red,
> > blue and green. Is there a way that when we search for Product A all three
> > results can be returned, even though they are logically the same item
> > (same product in our database)?
> >
> > Thoughts on how this can be accomplished?
> >
> > Thanks
> >
> > - M
>


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Tim Vaillancourt

Thanks Shawn and Yonik!

Yonik: I noticed this error appears to be fairly trivial, but it is not 
appearing after a previous crash. Every time I run this high-volume test 
that produced my stack trace, I zero out the logs, Solr data and 
Zookeeper data and start over from scratch with a brand new collection 
and zero'd out logs.


The test is mostly high volume (2000-4000 updates/sec) and at the start
the SolrCloud runs decently for a good 20-60 minutes with no errors in the
logs at all. Then that stack trace occurs on all 3 nodes (staggered), I
immediately get some replica-down messages and then some "cannot
connect" errors to all the other cluster nodes, which have all crashed the
same way. The tlog error could perhaps be a symptom of running
out of threads.


Shawn: thanks so much for sharing those details! Yes, they seem to be 
nice servers, for sure - I don't get to touch/see them but they're fast! 
I'll look into the firmware for sure and will try again after updating
it. These Solr instances are not bare metal; they are actually KVM VMs,
so that's another layer to look into, although it is consistent between
the two clusters.


I am not currently increasing the 'nofiles' ulimit to above default like 
you are, but does Solr use 10,000+ file handles? It won't hurt to try it 
I guess :). To rule out Java 7, I'll probably also try Jetty 8 and Java 
1.6 as an experiment as well.


Thanks!

Tim

On 25/07/13 05:55 PM, Yonik Seeley wrote:

On Thu, Jul 25, 2013 at 7:44 PM, Tim Vaillancourt  wrote:

"ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)


That itself isn't necessarily a problem (and why it says "non fatal")
- it just means that most likely a transaction log file was
truncated from a previous crash.  It may be unrelated to the other
issues you are seeing.

-Yonik
http://lucidworks.com


Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Chris Hostetter

: Thanks for your help.   I found a workaround for this use case, which is to
: avoid using a shards query and just asking each shard for a dump of the

that would be (step #1 in) the method I would recommend for your use case of
"check what's in the entire index", because it drastically reduces the
amount of work needed in each query -- you're just talking to one
node at a time, not doing multiplexing and merging of results from all
the nodes.

: do any ranking or sorting.   What I am now seeing is that qtimes have gone
: up from about 5 seconds per request to nearly a minute as the start
: parameter gets higher.  I don't know if this is actually because of the
: start parameter or if something is happening with memory use and/or caching

it's because in order to give you results 3600-3700 it has to
collect all the results from 1-3700 and then pull out the
last 100 (or to put it another way: the request for start=3600
doesn't know what the 3600 results it already gave you were; it has to figure
them out again).

step #2 in the method I would use to deal with your situation would be to
not use "start" at all -- sort the docs on your uniqueKey field, make rows
as big as you are willing to handle in a single request, and then instead
of incrementing "start" on each request, add an fq to each subsequent
query after the first one that filters the results to docs with a
uniqueKey field value greater than the last one seen in the previous response.
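
a rough sketch of that pattern (assuming the uniqueKey field is "id" and the
page size is arbitrary):

  first request:
    /select?q=*:*&fl=id&sort=id+asc&rows=100000

  every request after that, where LAST_ID is the highest id seen so far:
    /select?q=*:*&fl=id&sort=id+asc&rows=100000&fq=id:{LAST_ID TO *]

the exclusive curly brace on the lower bound keeps the last doc from the
previous page from being returned twice.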

this is similar to what a lot of REST APIs seem to do (Twitter comes to
mind) to avoid the problem of dealing with deep paging efficiently or
trying to keep track of "cursor" reservations on the server side --
they just don't offer either, and instead they let the client keep
track of the state (ie: "max_id") between requests.


-Hoss


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Shawn Heisey
On 7/25/2013 6:53 PM, Tim Vaillancourt wrote:
> Thanks for the reply Shawn, I can always count on you :).
> 
> We are using 10GB heaps and have over 100GB of OS cache free to answer the
> JVM question, Young has about 50% of the heap, all CMS. Our max number of
> processes for the JVM user is 10k, which is where Solr dies when it blows
> up with 'cannot create native thread'.
> 
> I also want to say this is system related, but I am seeing this occur on
> all 3 servers, which are brand-new Dell R720s. I'm not saying this is
> impossible, but I don't see much to suggest that, and it would need to be
> one hell of a coincidence.

Nice hardware.  I have some R720xd servers for another project unrelated
to Solr, love them.

I know a little about Dell servers.  If you haven't done so already, I
would install the OpenManage repo and get the firmware fully updated -
BIOS, RAID, and LAN in particular.  Instructions that are pretty easy to
follow:

http://linux.dell.com/repo/hardware/latest/

For process/file limits, I have the following in
/etc/security/limits.conf on systems that aren't using Cloud:

ncindex hard    nproc   6144
ncindex soft    nproc   4096

ncindex hard    nofile  65535
ncindex soft    nofile  49151

> To add more confusion to the mix, we actually run a 2nd SolrCloud cluster
> on the same Solr, Jetty and JVM versions that does not exhibit this issue,
> although it uses a completely different schema, servers and access patterns
> (it is also high-TPS). That is some evidence that the current
> software stack is OK, or maybe this only occurs under an extreme load that the
> 2nd cluster does not see, or lastly only with a certain schema.

This is a big reason why I think you should make sure you're fully up to
date on your firmware, as the hardware seems to be one strong
difference.  As much as I love Dell server hardware, firmware issues are
relatively common, especially on early versions of the latest
generation, which includes the R720.

> Lastly, to add a bit more detail to my original description, so far I have
> tried:
> 
> - Entirely rebuilding my cluster from scratch, reinstalling all deps,
> configs, reindexing the data (in case I screwed up somewhere). The EXACT
> same issue occurs under load about 20-45 minutes in.
> - Moving to Java 1.7.0_21 from _25 due to some known bugs. Same issue
> occurs after some load.
> - Restarting SolrCloud / forcing rebuilds or cores. Same issue occurs after
> some load.

The only other thing I can think of is increasing your zkClientTimeout
to 30 seconds or so and trying Solr 4.4 so you have SOLR-4899 and
SOLR-4805.  That's very definitely a shot in the dark.
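
If you do try the timeout change, it's an attribute on the <cores> element in
the old-style solr.xml, or a system property; something like this (30000 is
just an example value):

<cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}"
       zkClientTimeout="${zkClientTimeout:30000}">

or on the command line: -DzkClientTimeout=30000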

Thanks,
Shawn



Re: problems about solr replication in 4.3

2013-07-25 Thread xiaoqi
Thank you very much for replying.

In fact, we have made some progress on this problem: we found that when the
master builds its index, it cleans its own index first. So the slave, which
syncs the index every minute, ends up destroying its own index folder.

By the way, we build the index using
dataimport0?command=full-import&clean=false,
dataimport1?command=full-import&clean=false and
dataimport2?command=full-import&clean=false.

When we used this with Solr 3.6 there was no problem; the index was never
deleted first.

Does Solr 4 need any special configuration for this?

Thanks a lot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/problems-about-solr-replication-in-4-3-tp4079665p4080480.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Yonik Seeley
On Thu, Jul 25, 2013 at 7:44 PM, Tim Vaillancourt  wrote:
> "ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
> Failure to open existing log file (non fatal)
>

That itself isn't necessarily a problem (and why it says "non fatal")
- it just means that most likely a transaction log file was
truncated from a previous crash.  It may be unrelated to the other
issues you are seeing.

-Yonik
http://lucidworks.com


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Tim Vaillancourt
Thanks for the reply Shawn, I can always count on you :).

We are using 10GB heaps and have over 100GB of OS cache free to answer the
JVM question, Young has about 50% of the heap, all CMS. Our max number of
processes for the JVM user is 10k, which is where Solr dies when it blows
up with 'cannot create native thread'.

I also want to say this is system related, but I am seeing this occur on
all 3 servers, which are brand-new Dell R720s. I'm not saying this is
impossible, but I don't see much to suggest that, and it would need to be
one hell of a coincidence.

To add more confusion to the mix, we actually run a 2nd SolrCloud cluster
on the same Solr, Jetty and JVM versions that does not exhibit this issue,
although it uses a completely different schema, servers and access patterns
(it is also high-TPS). That is some evidence that the current
software stack is OK, or maybe this only occurs under an extreme load that the
2nd cluster does not see, or lastly only with a certain schema.

Lastly, to add a bit more detail to my original description, so far I have
tried:

- Entirely rebuilding my cluster from scratch, reinstalling all deps,
configs, reindexing the data (in case I screwed up somewhere). The EXACT
same issue occurs under load about 20-45 minutes in.
- Moving to Java 1.7.0_21 from _25 due to some known bugs. Same issue
occurs after some load.
- Restarting SolrCloud / forcing rebuilds or cores. Same issue occurs after
some load.

Cheers,

Tim


On 25 July 2013 17:13, Shawn Heisey  wrote:

> On 7/25/2013 5:44 PM, Tim Vaillancourt wrote:
>
>> The transaction log error I receive after about 10-30 minutes of load
>> testing is:
>>
>> "ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.**SolrException]
>> Failure to open existing log file (non fatal)
>> /opt/easw/easw_apps/easo_solr_**cloud/solr/xmshd_shard3_**
>> replica2/data/tlog/tlog.**078:org.**apache.solr.common.**
>> SolrException:
>> java.io.EOFException
>>
>
> 
>
>
>  Caused by: java.io.EOFException
>>  at
>> org.apache.solr.common.util.**FastInputStream.**readUnsignedByte(**
>> FastInputStream.java:73)
>>  at
>> org.apache.solr.common.util.**FastInputStream.readInt(**
>> FastInputStream.java:216)
>>  at
>> org.apache.solr.update.**TransactionLog.readHeader(**
>> TransactionLog.java:266)
>>  at
>> org.apache.solr.update.**TransactionLog.(**TransactionLog.java:160)
>>  ... 25 more
>> "
>>
>
> This looks to me like a system problem.  RHEL should be pretty solid, I
> use CentOS without any trouble.  My initial guesses are a corrupt
> filesystem, failing hardware, or possibly a kernel problem with your
> specific hardware.
>
> I'm running Jetty 8, which is the version that the example uses.  Could
> Jetty 9 be a problem here?  I couldn't really say, though my initial guess
> is that it's not a problem.
>
> I'm running Oracle Java 1.7.0_13.  Normally later releases are better, but
> Java bugs do exist and do get introduced in later releases.  Because you're
> on the absolute latest, I'm guessing that you had the problem with an
> earlier release and upgraded to see if it went away.  If that's what
> happened, it is less likely that it's Java.
>
> My first instinct would be to do a 'yum distro-sync' followed by 'touch
> /forcefsck' and reboot with console access to the server, so that you can
> deal with any fsck problems.  Perhaps you've already tried that. I'm aware
> that this could be very very hard to get pushed through strict change
> management procedures.
>
> I did some searching.  SOLR-4519 is a different problem, but it looks like
> it has a similar underlying exception, with no resolution.  It was filed
> when Solr 4.1.0 was current.
>
> Could there be a resource problem - heap too small, not enough OS disk
> cache, etc?
>
> Thanks,
> Shawn
>
>


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Shawn Heisey

On 7/25/2013 5:44 PM, Tim Vaillancourt wrote:

The transaction log error I receive after about 10-30 minutes of load
testing is:

"ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)
/opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException:
java.io.EOFException





Caused by: java.io.EOFException
 at
org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
 at
org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
 at
org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
 at
org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:160)
 ... 25 more
"


This looks to me like a system problem.  RHEL should be pretty solid, I 
use CentOS without any trouble.  My initial guesses are a corrupt 
filesystem, failing hardware, or possibly a kernel problem with your 
specific hardware.


I'm running Jetty 8, which is the version that the example uses.  Could 
Jetty 9 be a problem here?  I couldn't really say, though my initial 
guess is that it's not a problem.


I'm running Oracle Java 1.7.0_13.  Normally later releases are better, 
but Java bugs do exist and do get introduced in later releases.  Because 
you're on the absolute latest, I'm guessing that you had the problem 
with an earlier release and upgraded to see if it went away.  If that's 
what happened, it is less likely that it's Java.


My first instinct would be to do a 'yum distro-sync' followed by 'touch 
/forcefsck' and reboot with console access to the server, so that you 
can deal with any fsck problems.  Perhaps you've already tried that. 
I'm aware that this could be very very hard to get pushed through strict 
change management procedures.


I did some searching.  SOLR-4519 is a different problem, but it looks 
like it has a similar underlying exception, with no resolution.  It was 
filed when Solr 4.1.0 was current.


Could there be a resource problem - heap too small, not enough OS disk 
cache, etc?


Thanks,
Shawn



Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Tim Vaillancourt
Stack trace:

http://timvaillancourt.com.s3.amazonaws.com/tmp/solrcloud.nodeC.2013-07-25-16.jstack.gz

Cheers!

Tim


On 25 July 2013 16:44, Tim Vaillancourt  wrote:

> Hey guys,
>
> I am reaching out to the Solr list with a very vague issue: under high
> load against a SolrCloud 4.3.1 cluster of 3 instances, 3 shards, 2 replicas
> (2 cores per instance), I eventually see failure messages related to
> transaction logs, and shortly after these stacktraces occur the cluster
> starts to fall apart.
>
> To explain my setup:
> - SolrCloud 4.3.1.
> - Jetty 9.x.
> - Oracle/Sun JDK 1.7.25 w/CMS.
> - RHEL 6.x 64-bit.
> - 3 instances, 1 per server.
> - 3 shards.
> - 2 replicas per shard.
>
> The transaction log error I receive after about 10-30 minutes of load
> testing is:
>
> "ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
> Failure to open existing log file (non fatal)
> /opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException:
> java.io.EOFException
> at
> org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:182)
> at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
> at
> org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:83)
> at
> org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:138)
> at
> org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:125)
> at
> org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:95)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
> at
> org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:805)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
> at
> org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:894)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:982)
> at
> org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at
> org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.EOFException
> at
> org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
> at
> org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
> at
> org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
> at
> org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:160)
> ... 25 more
> "
>
> Eventually after a few of these stack traces, the cluster starts to lose
> shards and replicas fail. Jetty then creates hung threads until hitting
> OutOfMemory on native threads due to the maximum process ulimit.
>
> I know this is quite a vague issue, so I'm not expecting a silver-bullet
> answer, but I was wondering if anyone has suggestions on where to look
> next? Does this sound Solr-related at all, or possibly system? Has anyone
> seen this issue before, or has any hypothesis how to find out more?
>
> I will reply shortly with a thread dump, taken from 1 locked-up node.
>
> Thanks for any suggestions!
>
> Tim
>


SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Tim Vaillancourt
Hey guys,

I am reaching out to the Solr list with a very vague issue: under high load
against a SolrCloud 4.3.1 cluster of 3 instances, 3 shards, 2 replicas (2
cores per instance), I eventually see failure messages related to
transaction logs, and shortly after these stacktraces occur the cluster
starts to fall apart.

To explain my setup:
- SolrCloud 4.3.1.
- Jetty 9.x.
- Oracle/Sun JDK 1.7.25 w/CMS.
- RHEL 6.x 64-bit.
- 3 instances, 1 per server.
- 3 shards.
- 2 replicas per shard.

The transaction log error I receive after about 10-30 minutes of load
testing is:

"ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)
/opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException:
java.io.EOFException
at
org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:182)
at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
at
org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:83)
at
org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:138)
at
org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:125)
at
org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:95)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
at
org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:805)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at
org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:894)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:982)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException
at
org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
at
org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
at
org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
at
org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:160)
... 25 more
"

Eventually after a few of these stack traces, the cluster starts to lose
shards and replicas fail. Jetty then creates hung threads until hitting
OutOfMemory on native threads due to the maximum process ulimit.

I know this is quite a vague issue, so I'm not expecting a silver-bullet
answer, but I was wondering if anyone has suggestions on where to look
next? Does this sound Solr-related at all, or possibly system? Has anyone
seen this issue before, or has any hypothesis how to find out more?

I will reply shortly with a thread dump, taken from 1 locked-up node.

Thanks for any suggestions!

Tim


Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Shawn Heisey

On 7/25/2013 4:45 PM, Tom Burton-West wrote:

Thanks for your help.   I found a workaround for this use case, which is to
avoid using a shards query and just asking each shard for a dump of the
unique ids. i.e. run an *:* query and ask for 1 million rows at a time.
This should be a no scoring query, so I would think that it doesn't have to
do any ranking or sorting.   What I am now seeing is that qtimes have gone
up from about 5 seconds per request to nearly a minute as the start
parameter gets higher.  I don't know if this is actually because of the
start parameter or if something is happening with memory use and/or caching
that is just causing things to take longer.  I'm at around 35 out of 119
million for this shard and queries have gone from taking 5 seconds to
taking almost a minute.

INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=3600&q=*:*&rows=100}
hits=119220943 status=0 QTime=52952


Sounds like your servers are handling deep paging far better than I 
would have guessed.  I've seen people talk about exponential query time 
growth from deep paging after only a few pages.  Your times are going 
up, but the increase is *relatively* slow, and you've made it 36 pages in.


Getting the information as you're doing it now will be slow, but 
probably reliable.  Moving to non-distributed requests against the 
individual shards was a good idea.


From my own testing: By bumping my max heap on my dev server from 7GB 
to 9GB, I was able to get a million row result (distributed) in only 
four minutes, whereas it had reached 45 minutes before with no end in 
sight.  It was having huge GC pauses from extremely frequent full GCs. 
That problem persisted after the heap increase, but it wasn't as bad, 
and I was also dealing with the fact that my OS disk cache on the dev 
server is way too small.


Thanks,
Shawn



Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Tom Burton-West
Hi Shawn,

Thanks for your help.   I found a workaround for this use case, which is to
avoid using a shards query and just asking each shard for a dump of the
unique ids. i.e. run an *:* query and ask for 1 million rows at a time.
This should be a no scoring query, so I would think that it doesn't have to
do any ranking or sorting.   What I am now seeing is that qtimes have gone
up from about 5 seconds per request to nearly a minute as the start
parameter gets higher.  I don't know if this is actually because of the
start parameter or if something is happening with memory use and/or caching
that is just causing things to take longer.  I'm at around 35 out of 119
million for this shard and queries have gone from taking 5 seconds to
taking almost a minute.

INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=3600&q=*:*&rows=100}
hits=119220943 status=0 QTime=52952


Tom


INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=700&q=*:*&rows=100}
hits=119220943 status=0 QTime=9772
Jul 25, 2013 5:39:43 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=800&q=*:*&rows=100}
hits=119220943 status=0 QTime=11274
Jul 25, 2013 5:41:44 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=900&q=*:*&rows=100}
hits=119220943 status=0 QTime=13104
Jul 25, 2013 5:43:39 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=1000&q=*:*&rows=100}
hits=119220943 status=0 QTime=13568
...
...
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=1300&q=*:*&rows=100}
hits=119220943 status=0 QTime=26703

Jul 25, 2013 5:58:20 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=1700&q=*:*&rows=100}
hits=119220943 status=0 QTime=22607
Jul 25, 2013 6:00:31 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=1800&q=*:*&rows=100}
hits=119220943 status=0 QTime=24109
...
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=3000&q=*:*&rows=100}
hits=119220943 status=0 QTime=41034
Jul 25, 2013 6:31:36 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=3100&q=*:*&rows=100}
hits=119220943 status=0 QTime=42844
Jul 25, 2013 6:34:16 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=3200&q=*:*&rows=100}
hits=119220943 status=0 QTime=45046
Jul 25, 2013 6:36:57 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=3300&q=*:*&rows=100}
hits=119220943 status=0 QTime=49792
Jul 25, 2013 6:39:43 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select
params={fl=vol_id&indent=on&start=3400&q=*:*&rows=100}
hits=119220943 status=0 QTime=58699




On Thu, Jul 25, 2013 at 6:18 PM, Shawn Heisey  wrote:

> On 7/25/2013 3:09 PM, Tom Burton-West wrote:
>
>> Thanks Shawn,
>>
>> I was confused by the error message: "Invalid version (expected 2, but 60)
>> or the data in not in 'javabin' format"
>>
>> Your explanation makes sense.  I didn't think about what the shards have
>> to
>> send back to the head shard.
>> Now that I look in my logs, I can see the posts that  the shards are
>> sending to the head shard and actually get a good measure of how many
>> bytes
>> are being sent around.
>>
>> I'll poke around and look at multipartUploadLimitInKB, and also see if
>> there is some servlet container limit config I might need to mess with.
>>
>
> I think I figured it out, after a peek at the source code.  I upgraded to
> Solr 4.4 first, my 100,000 row query still didn't work.  By setting
> formdataUploadLimitInKB (in addition to multipartUploadLimitInKB, not sure
> if both are required), I was able to get a 100,000 row query to work.
>
> A query for one million rows did finally respond to my browser query, but
> it took a REALLY REALLY long time (82 million docs in several shards, only
> 16GB RAM on the dev server) and it crashed firefox due to the size of the
> response.  It also seemed to error out on some of the shard responses.  My
> handler has shards.tolerant=true, so that didn't seem to kill the whole
> query ... but because the response crashed firefox, I couldn't tell.
>
> I repeated the query using curl so I could save the response.  It's been
> running for several minutes without any server-side errors, but I still
> don't have any results.
>
> Your servers are much more robust than my little dev server, so this might
> work for you - if you aren't using the start parameter in addition to the
> rows parameter.  You might need to sort ascending by your 

Re: Error opening Reader and new searcher on solr 4.4 with DocValues for fields

2013-07-25 Thread Marcin Rzewucki
http://wiki.apache.org/solr/DocValues#Specifying_a_different_Codec_implementation

OK, it seems there's no back-compat for the disk-based docValues
implementation. I have to reindex the documents to get rid of this issue.
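
For reference, the on-disk implementation is picked per field type in
schema.xml (the names here are just an example) and needs the schema codec
factory enabled in solrconfig.xml:

  <!-- solrconfig.xml -->
  <codecFactory class="solr.SchemaCodecFactory"/>

  <!-- schema.xml -->
  <fieldType name="string_dvd" class="solr.StrField" docValuesFormat="Disk"/>
  <field name="category" type="string_dvd" indexed="true" stored="true" docValues="true"/>

So after changing the format (or upgrading across a format change) the
affected fields have to be reindexed.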


On 25 July 2013 22:17, Marcin Rzewucki  wrote:

> Hi,
>
> After upgrading from solr 4.3.1 to solr 4.4 I have the following issue:
>
> ERROR - 2013-07-25 20:00:15.433; org.apache.solr.core.CoreContainer;
> Unable to create core: awslocal_shard5
> org.apache.solr.common.SolrException: Error opening new searcher
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:835)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)
> at
> org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:270)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:655)
> at
> org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
> at
> org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> at
> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1522)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1634)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:810)
> ... 13 more
> Caused by: org.apache.solr.common.SolrException: Error opening Reader
> at
> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:177)
> at
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:188)
> at
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:184)
> at
> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1497)
> ... 15 more
> Caused by: org.apache.lucene.index.CorruptIndexException: invalid type:
> -92,
> resource=NIOFSIndexInput(path="/mnt/tmp1/test/shard5/data/index/_5q_Disk_0.dvdm")
> at
> org.apache.lucene.codecs.diskdv.DiskDocValuesProducer.readFields(DiskDocValuesProducer.java:159)
> at
> org.apache.lucene.codecs.diskdv.DiskDocValuesProducer.<init>(DiskDocValuesProducer.java:72)
> at
> org.apache.lucene.codecs.diskdv.DiskDocValuesFormat.fieldsProducer(DiskDocValuesFormat.java:49)
> at
> org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:213)
> at
> org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:282)
> at
> org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:134)
> at
> org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
> at
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
> at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:812)
> at
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> at
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
> at
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
> at
> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:174)
> ... 18 more
> DEBUG - 2013-07-25 20:00:15.442;
> org.eclipse.jetty.webapp.WebAppClassLoader; loaded class
> org.apache.log4j.spi.LoggingEvent from startJarLoader@665e2517
> INFO  - 2013-07-25 20:00:15.440;
> org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from
> ZooKeeper...
>
> I'm using DocValues with "on disk" option. Could it be a problem ? Index
> does not seem to be corrupted. It works fine on solr 4.3.1. Is there some
> change in files format ? Is it possible to upgrade solr to 4.4 without
> reloading all documents ? Or maybe some additional settings are required
> for DocValues fields ?
>
> Thanks in advance.
> Regards.
>
>


Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Shawn Heisey

On 7/25/2013 3:09 PM, Tom Burton-West wrote:

Thanks Shawn,

I was confused by the error message: "Invalid version (expected 2, but 60)
or the data in not in 'javabin' format"

Your explanation makes sense.  I didn't think about what the shards have to
send back to the head shard.
Now that I look in my logs, I can see the posts that  the shards are
sending to the head shard and actually get a good measure of how many bytes
are being sent around.

I'll poke around and look at multipartUploadLimitInKB, and also see if
there is some servlet container limit config I might need to mess with.


I think I figured it out, after a peek at the source code.  I upgraded 
to Solr 4.4 first, my 100,000 row query still didn't work.  By setting 
formdataUploadLimitInKB (in addition to multipartUploadLimitInKB, not 
sure if both are required), I was able to get a 100,000 row query to work.
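
Both limits live on the requestParsers element inside requestDispatcher in
solrconfig.xml; roughly like this (the sizes are just examples):

<requestDispatcher handleSelect="false">
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000"
                  formdataUploadLimitInKB="2048000"/>
</requestDispatcher>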


A query for one million rows did finally respond to my browser query, 
but it took a REALLY REALLY long time (82 million docs in several 
shards, only 16GB RAM on the dev server) and it crashed firefox due to 
the size of the response.  It also seemed to error out on some of the 
shard responses.  My handler has shards.tolerant=true, so that didn't 
seem to kill the whole query ... but because the response crashed 
firefox, I couldn't tell.


I repeated the query using curl so I could save the response.  It's been 
running for several minutes without any server-side errors, but I still 
don't have any results.


Your servers are much more robust than my little dev server, so this 
might work for you - if you aren't using the start parameter in addition 
to the rows parameter.  You might need to sort ascending by your unique 
key field and use a range query ([* TO *] for the first one), find the 
highest value in the response, and then send a targeted range query (the 
value {max_from_last_run TO *] would work) asking for the next million 
records.


Thanks,
Shawn



Re: returning only certain fields from the docs - parsing on the server side

2013-07-25 Thread Walter Underwood
Yes, your assumption is wrong. It does what it says, "only the fields in this 
list will be included" in the response.

wunder

On Jul 25, 2013, at 2:44 PM, Matt Lieber wrote:

> Hi,
> 
> I only want to return one field in the documents being returned from my query.
> I know there is the 'fl' parameter, which is described in the documentation 
> http://wiki.apache.org/solr/CommonQueryParameters as:
> 
> "This parameter can be used to specify a set of fields to return, limiting 
> the amount of information in the response. When returning the results to the 
> client, only fields in this list will be included."
> 
> But seems like 'fl' works on the client side, after the results have been 
> constructed on the server side, passing the whole docs back on the wire. Is 
> my assumption wrong ?
> Is there a way to filter things out directly on the Solr side, and return 
> only the field that I desire to the client?
> 
> Thanks,
> Matt
> 
> 






Re: returning only certain fields from the docs - parsing on the server side

2013-07-25 Thread Upayavira
fl is on the server side. Try it in a browser and you'll see that.
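
For example, something like this (adjust host and core to your setup) comes
back with only the id field for each matching document:

  http://localhost:8983/solr/collection1/select?q=*:*&fl=id&wt=json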

Upayavira

On Thu, Jul 25, 2013, at 10:44 PM, Matt Lieber wrote:
> Hi,
> 
> I only want to return one field in the documents being returned from my
> query.
> I know there is the 'fl' parameter, which is described in the
> documentation http://wiki.apache.org/solr/CommonQueryParameters as:
> 
> "This parameter can be used to specify a set of fields to return,
> limiting the amount of information in the response. When returning the
> results to the client, only fields in this list will be included."
> 
> But seems like 'fl' works on the client side, after the results have been
> constructed on the server side, passing the whole docs back on the wire.
> Is my assumption wrong ?
> Is there a way to filter things out directly on the Solr side, and return
> only the field that I desire to the client?
> 
> Thanks,
> Matt
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 


returning only certain fields from the docs - parsing on the server side

2013-07-25 Thread Matt Lieber
Hi,

I only want to return one field in the documents being returned from my query.
I know there is the 'fl' parameter, which is described in the documentation 
http://wiki.apache.org/solr/CommonQueryParameters as:

"This parameter can be used to specify a set of fields to return, limiting the 
amount of information in the response. When returning the results to the 
client, only fields in this list will be included."

But it seems like 'fl' works on the client side, after the results have been
constructed on the server side, passing the whole docs back over the wire. Is my
assumption wrong?
Is there a way to filter things out directly on the Solr side, and return only
the field that I desire to the client?

Thanks,
Matt












Re: Can we use replication to union the data of master and slave?

2013-07-25 Thread Upayavira
I'm not entirely clear about your question. However, with replication you
should never commit docs directly to your slave; it will mess up the
synchronisation of your indexes, and hence mess up your replication. If
that's what you are proposing, don't do it!

Upayavira

On Thu, Jul 25, 2013, at 08:29 PM, SolrLover wrote:
> We are using SOLR 4.3.1 but not using solrcloud now.
> 
> We currently support both push and pull indexing and we use softcommit
> for
> push indexing purpose. Now whenever we perform pull indexing (using
> indexer
> program) the changes made by the push indexing process (during indexing
> time) might get lost hence trying to figure out a way to merge the
> modified
> documents..
> 
> I can implement a master and slave setup. I can initiate pull indexing in
> master, slave will be accepting the documents pushed via queue. Now once
> the
> indexing in master is completed, I can replicate the index in slave. I
> just
> want to confirm, if the additional documents in slave will get deleted
> during replication or the new data will get appended in slave? If not, is
> there any other way to resolve this issue?
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-we-use-replication-to-union-the-data-of-master-and-slave-tp4080425.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Tom Burton-West
Thanks Shawn,

I was confused by the error message: "Invalid version (expected 2, but 60)
or the data in not in 'javabin' format"

Your explanation makes sense.  I didn't think about what the shards have to
send back to the head shard.
Now that I look in my logs, I can see the posts that  the shards are
sending to the head shard and actually get a good measure of how many bytes
are being sent around.

I'll poke around and look at multipartUploadLimitInKB, and also see if
there is some servlet container limit config I might need to mess with.

Tom




On Thu, Jul 25, 2013 at 2:46 PM, Shawn Heisey  wrote:

> On 7/25/2013 12:26 PM, Shawn Heisey wrote:
>
>> Either multipartUploadLimitInKB doesn't work properly, or there may be
>> some hard limits built into the servlet container, because I set
>> multipartUploadLimitInKB in the requestDispatcher config to 32768 and it
>> still didn't work.  I wonder, perhaps there is a client-side POST buffer
>> limit as well as the servlet container limit, which comes in to play
>> because the Solr server is acting as a client for the distributed
>> requests?
>>
>
> Followup:
>
> I should probably add that I used a different version (and got some
> different errors) because what I've got on my dev server is an old
> branch_4x version:
>
> 4.4-SNAPSHOT 1497605 - ncindex - 2013-06-27 17:12:30
>
> My online production system is 4.2.1, but I am not going to run this query
> on that system because of the potential to break things.  I did try it
> against my backup production system running 3.5.0 with a 1MB server-side
> POST buffer and got an error that seems to at least partially confirm my
> suspicions.  Here's an excerpt:
>
> HTTP ERROR 500
>
> Problem accessing /solr/ncmain/select. Reason:
>
> Form too large 18425104>1048576
>
> java.lang.IllegalStateException: Form too large 18425104>1048576
> at org.mortbay.jetty.Request.extractParameters(Request.java:1561)
> at org.mortbay.jetty.Request.getParameterMap(Request.java:870)
> at org.apache.solr.request.ServletSolrParams.<init>(ServletSolrParams.java:29)
> at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:394)
> at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
> at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
> Thanks,
> Shawn
>
>


Re: Solr Index Files in a Directories

2013-07-25 Thread Jack Krupansky
Use LucidWorks Search, define a file system data source and set the schedule 
to crawl the directory every minute, 5 minutes, 30 seconds, or whatever 
interval you want.


http://docs.lucidworks.com/display/lweug/Simple+Filesystem+Data+Sources
http://docs.lucidworks.com/display/help/Schedules

-- Jack Krupansky

-Original Message- 
From: Rajesh Jain

Sent: Thursday, July 25, 2013 3:57 PM
To: solr-user@lucene.apache.org
Subject: Solr Index Files in a Directories

I have flume sink directory where new files are being written periodically.

How can I instruct solr to index the files in the directory every time a
new file gets written.

Any ideas?

Thanks,
Rajesh 



Re: Wildcard matching of dynamic fields

2013-07-25 Thread Jack Krupansky
Yeah, those are the rules. They are more of a heuristic that manages to work 
most of the time reasonably well, but like most heuristics, it is not 
perfect.


In this particular case, your best bet would be to use an update processor 
to discard the "ignored" field values before Solr actually sees them at the 
dynamic field pattern match level.


See:
http://lucene.apache.org/solr/4_4_0/solr-core/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html


<processor class="solr.IgnoreFieldUpdateProcessorFactory">
  <str name="fieldRegex">nosolr_.*</str>
</processor>


You can use full regex patterns or lists of field names. (I have more 
examples in my book.)


-- Jack Krupansky

-Original Message- 
From: Artem Karpenko

Sent: Thursday, July 25, 2013 11:05 AM
To: solr-user@lucene.apache.org
Subject: Wildcard matching of dynamic fields

Hi,

given a dynamic field

<dynamicField name="*_boolean" type="boolean" indexed="true" stored="true"/>

There are some other suffix-based fields as well. And some of the fields
in a document should be ignored; they have a "nosolr_" prefix. But defining

<dynamicField name="nosolr_*" type="ignored"/>

even at the start of the schema does not work: the field
"nosolr_inv_dunning_boolean" is recognized as boolean anyway and shown
in search results. The documentation says that "longer patterns will be
matched first, if equal size patterns both match, the first appearing in
the schema will be used".

What can be done here (apart from changing the input document)? And,
generally: why would such a "longer wins" strategy be used here? What use
case does it have?

Best,
Artem. 



Error opening Reader and new searcher on solr 4.4 with DocValues for fields

2013-07-25 Thread Marcin Rzewucki
Hi,

After upgrading from solr 4.3.1 to solr 4.4 I have the following issue:

ERROR - 2013-07-25 20:00:15.433; org.apache.solr.core.CoreContainer; Unable
to create core: awslocal_shard5
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:835)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)
at
org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:270)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1522)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1634)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:810)
... 13 more
Caused by: org.apache.solr.common.SolrException: Error opening Reader
at
org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:177)
at
org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:188)
at
org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:184)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1497)
... 15 more
Caused by: org.apache.lucene.index.CorruptIndexException: invalid type:
-92,
resource=NIOFSIndexInput(path="/mnt/tmp1/test/shard5/data/index/_5q_Disk_0.dvdm")
at
org.apache.lucene.codecs.diskdv.DiskDocValuesProducer.readFields(DiskDocValuesProducer.java:159)
at
org.apache.lucene.codecs.diskdv.DiskDocValuesProducer.<init>(DiskDocValuesProducer.java:72)
at
org.apache.lucene.codecs.diskdv.DiskDocValuesFormat.fieldsProducer(DiskDocValuesFormat.java:49)
at
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:213)
at
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:282)
at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:134)
at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
at
org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:812)
at
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
at
org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
at
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
at
org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:174)
... 18 more
DEBUG - 2013-07-25 20:00:15.442;
org.eclipse.jetty.webapp.WebAppClassLoader; loaded class
org.apache.log4j.spi.LoggingEvent from startJarLoader@665e2517
INFO  - 2013-07-25 20:00:15.440;
org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from
ZooKeeper...

I'm using DocValues with "on disk" option. Could it be a problem ? Index
does not seem to be corrupted. It works fine on solr 4.3.1. Is there some
change in files format ? Is it possible to upgrade solr to 4.4 without
reloading all documents ? Or maybe some additional settings are required
for DocValues fields ?

Thanks in advance.
Regards.


Solr Index Files in a Directories

2013-07-25 Thread Rajesh Jain
I have flume sink directory where new files are being written periodically.

How can I instruct Solr to index the files in the directory every time a
new file gets written?

Any ideas?

Thanks,
Rajesh


Can we use replication to union the data of master and slave?

2013-07-25 Thread SolrLover
We are using SOLR 4.3.1 but not using solrcloud now.

We currently support both push and pull indexing, and we use soft commits for
push indexing. Whenever we perform pull indexing (using an indexer program),
the changes made by the push indexing process during that time might get
lost, so we are trying to figure out a way to merge the modified documents.

I can implement a master and slave setup: I can initiate pull indexing on the
master, while the slave keeps accepting the documents pushed via the queue.
Once the indexing on the master is completed, I can replicate the index to
the slave. I just want to confirm whether the additional documents on the
slave will get deleted during replication, or whether the new data will be
appended on the slave. If they get deleted, is there any other way to resolve
this issue?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-we-use-replication-to-union-the-data-of-master-and-slave-tp4080425.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Do I need solr.xml?

2013-07-25 Thread Brian Robinson
Great, this guidance is definitely pointing me in the right direction. 
Thanks Shawn, Erick, and Hoss. I'll pursue this some more and see if I 
can get it working.

Brian


EarlyTerminatingCollectorException in MLT Component of SOLR 4.4

2013-07-25 Thread Domma, Achim
Hi,

I send a query to Solr which returns exactly one document. It's an
"id:some_doc_id" search. Here are the parameters as shown in the response:

  params: {
  mlt.mindf: "1",
  mlt.count: "5",
  mlt.fl: "text",
  fl: "id,,application_id,...
project_start,project_end,project_title,score",
  start: "0",
  q: "id:some_doc_id",
  mlt.mintf: "1",
  mlt: "true",
  wt: "json",
  rows: "1"
  }

The response key contains the document I expected, but I also get an error,
which seems to happen in the MLT component. Here's the stack trace provided
in the response:

  org.apache.solr.search.EarlyTerminatingCollectorException
  at
org.apache.solr.search.EarlyTerminatingCollector.collect(EarlyTerminatingCollector.java:62)

  at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:289)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:624)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
  at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1494)

  at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)

  at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
  at
org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:1226)

  at
org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:365)

  at
org.apache.solr.handler.component.MoreLikeThisComponent.getMoreLikeThese(MoreLikeThisComponent.java:356)

  at
org.apache.solr.handler.component.MoreLikeThisComponent.process(MoreLikeThisComponent.java:113)

  at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)

  at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
  at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)

  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)

  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)

  at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)

  at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)

  at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)

  at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)

  at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)

  at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)

  at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)

  at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)

  at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)

  at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)

  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)

  at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)

  at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)

  at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)

  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)

  at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)

  at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)

  at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)

  at java.lang.Thread.run(Thread.java:724)

org.apache.solr.search.EarlyTerminatingCollectorException

  at
org.apache.solr.search.EarlyTerminatingCollector.collect(EarlyTerminatingCollector.java:62)

  at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:289)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:624)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
  at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1494)

  at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)

  at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
  at
org.apache.solr.search.SolrIndexSearcher.getDocList(

RE: Solr 4.3.0 - SolrCloud lost all documents when leaders got rebuilt

2013-07-25 Thread Joshi, Shital
Thanks for all the answers. 

It appears that we will not have a data-center failure tolerant deployment of 
zookeeper without a 3rd datacenter. The other alternative is to forget about 
running zookeepers across datacenters, and instead have a live-warm deployment 
(and we'd have to manually switch/fail-over primary to backup if we lost or 
otherwise needed to do maintenance on the primary side).


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, July 25, 2013 7:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.0 - SolrCloud lost all documents when leaders got rebuilt

Picking up on what Dominique mentioned: your ZK configuration
isn't doing you much good. Not only do you have an even number
of nodes (6, which is actually _less_ robust than having 5), but by splitting
them among two data centers you're effectively requiring the data
center with 4 nodes to always be up. If it goes down (or even if the
link between DCs is broken), DC2 will not be able to index
documents since the ZK nodes in DC2 can't find 4 ZK nodes to
work with.

By taking down the ZK quorum, you are effectively "freezing" the Solr
nodes with the snapshot of the system they knew about the last
time there was a quorum. It's a sticky wicket. Let's assume what you're
trying to do was allowed. Now let's assume that instead of the
machines being down you simply lost connectivity between your DCs
so the ZK nodes can't talk to each other. Now they'd each elect their
nodes as leaders. Any incoming indexing requests would be serviced.

Now the DCs are re-connected. How could the conflicts be resolved?
This is the "split brain" problem, something ZK is specifically designed
to prevent.

Best
Erick

On Wed, Jul 24, 2013 at 6:50 PM, Dominique Bejean
 wrote:
> With 6 zookeeper instances you need at least 4 instances running at the same 
> time. How can you decide to stop 4 instances and have only 2 instances 
> running ? Zookeeper can't work anymore in these conditions.
>
> Dominique
>
> On 25 Jul 2013, at 00:16, "Joshi, Shital"  wrote:
>
>> We have SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute 
>> boxes (cloud), where 5 machines (leaders) are in datacenter1 and replicas on 
>> datacenter2.  We have 6 zookeeper instances - 4 on datacenter1 and 2 on 
>> datacenter2. The zookeeper instances are on same hosts as Solr nodes. We're 
>> using local disk (/local/data) to store solr index files.
>>
>> Infrastructure team wanted to rebuild dynamic compute boxes on datacenter1 
>> so we handed over all leader hosts to them. By doing so, We lost 4 zookeeper 
>> instances. We were expecting to see all replicas acting as leader. In order 
>> to confirm that, I went to admin console -> cloud page but the page never 
>> returned (kept hanging).  I checked log and saw constant zookeeper host 
>> connection exceptions (the zkHost system property had all 6 zookeeper 
>> instances). I restarted cloud on all replicas but got same error again. This 
>> exception is I think due to the zookeeper bug: 
>> https://issues.apache.org/jira/browse/SOLR-4899 I guess zookeeper never 
>> registered the replicas as leader.
>>
>> After dynamic compute machines were re-built (lost all local data) I 
>> restarted entire cloud (with 6 zookeeper and 10 nodes), the original leaders 
>> were still the leaders (I think zookeeper config never got updated with 
>> replicas being leader, though 2 zookeeper instances were still up). Since 
>> all leaders' /local/data/solr_data was empty, it got replicated to all 
>> replicas and we lost all data in our replica. We lost 26 million documents 
>> on replica. This was very awful.
>>
>> In our start up script (which brings up solr on all nodes one by one), the 
>> leaders are listed first.
>>
>> Any solution to this until Solr 4.4 release?
>>
>> Many Thanks!
>>
>>
>>
>>
>>


Re: Sort top N results in solr after boosting

2013-07-25 Thread Utkarsh Sengar
I agree with your comment on separating noise from the actually relevant
results.
My approach to separating relevant results from noise is not algorithmic but
an absolute cutoff, i.e. the top 5 or top 10 results will always be treated as
relevant (at least the probability is higher).
But again, that kind of simple sort can be done by the client too.

The current relevance is based purely on PMIs, which are calculated from the
clickstream data. I am also trying to figure out whether I can add extra
dimensions to the Solr score that take other attributes into consideration,
i.e. extending the way Solr computes the score with attachment_count (more
attachments, more important), confidence (a stronger source has higher
confidence), etc.

Is there a way I can have a custom scoring function which extends (rather than
overwrites) Solr's scores?
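
Something along these lines is what I have in mind - a rough SolrJ sketch only,
assuming an edismax handler and the attachment_count field mentioned above; the
boost function itself is just a placeholder I have not tested:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BoostSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("iphone 5");
        q.set("defType", "edismax");
        // multiplicative boost: multiplies the text relevancy score instead of replacing it;
        // log(...) keeps a huge attachment_count from drowning out the text score
        q.set("boost", "log(sum(attachment_count,1))");
        q.setRows(10);
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound() + " hits");
    }
}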

Thanks,
-Utkarsh


On Wed, Jul 24, 2013 at 7:35 PM, Erick Erickson wrote:

> You can certainly just include the attachment count in the
> response and have the app apply the secondary sort. But
> that doesn't separate the "noise" as you say.
>
> How would you identify "noise"? If you don't have an algorithmic
> way to do that, I don't know how you'd manage to separate
> the signal from the noise
>
> Best
> Erick
>
> On Wed, Jul 24, 2013 at 4:37 PM, Utkarsh Sengar 
> wrote:
> > I have a solr query which has a bunch of boost params for relevancy. This
> > search works fine and returns the most relevant documents as per the user
> > query. For example, if user searches for: "iphone 5", keywords like
> > "apple", "wifi" etc are boosted. I get these keywords from external
> > training. The top 10-20 results are iphone 5 phones and then it follows
> > iphone cases and other noise.
> >
> > But I also have a field in the schema called: attachment_count. I need to
> > sort the top N result I get after boost based on this field.
> >
> > Example:
> > I want to sort the top 5 documents based on attachment_count on the
> boosted
> > result (which are relevant for the user).
> >
> > 1. iphone 5 32gb, attachment_count=0
> > 2. iphone 5 16gb, attachment_count=5
> > 3. iphone 5 32gb, attachment_count=10
> > 4. iphone 4gs, attachment_count=3
> > 5. iphone 4, attachment_count=1
> > ...
> > 11. iphone 5 case, attachment_count=100
> >
> >
> > Expected result:
> > 1. iphone 5 32gb, attachment_count=10
> > 2. iphone 5 16gb, attachment_count=5
> > 3. iphone 4gs, attachment_count=3
> > 4. iphone 4, attachment_count=1
> > 5. iphone 5 32gb, attachment_count=0
> > ...
> > 11. iphone 5 case, attachment_count=100
> >
> >
> > Is this possible using a function query? I am not sure how the results
> will
> > look like but I want to try it out.
> >
> > --
> > Thanks,
> > -Utkarsh
>



-- 
Thanks,
-Utkarsh


Re: Why Solr slows down when accessed thru load balancer

2013-07-25 Thread Gora Mohanty
On 26 July 2013 00:11, kaustubh147  wrote:
> Hi,
>
> When I am connecting my application to solr thru a load balancer
> (https://domain name/apache-solr-4.0.0), it is significantly slow. but if I
> connect Solr directly (https://11.11.1.11:8080/apache-solr-4.0.0) on the
> application server it works better.
[...]

Um, clearly there is then some issue with your setup,
and that is what you need to debug.

What does this have to do with Solr?

Regards,
Gora


Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Shawn Heisey

On 7/25/2013 12:26 PM, Shawn Heisey wrote:

Either multipartUploadLimitInKB doesn't work properly, or there may be
some hard limits built into the servlet container, because I set
multipartUploadLimitInKB in the requestDispatcher config to 32768 and it
still didn't work.  I wonder, perhaps there is a client-side POST buffer
limit as well as the servlet container limit, which comes in to play
because the Solr server is acting as a client for the distributed requests?


Followup:

I should probably add that I used a different version (and got some 
different errors) because what I've got on my dev server is an old 
branch_4x version:


4.4-SNAPSHOT 1497605 - ncindex - 2013-06-27 17:12:30

My online production system is 4.2.1, but I am not going to run this 
query on that system because of the potential to break things.  I did 
try it against my backup production system running 3.5.0 with a 1MB 
server-side POST buffer and got an error that seems to at least 
partially confirm my suspicions.  Here's an excerpt:


HTTP ERROR 500

Problem accessing /solr/ncmain/select. Reason:

Form too large18425104>1048576  java.lang.IllegalStateException: 
Form too large18425104>1048576 	at 
org.mortbay.jetty.Request.extractParameters(Request.java:1561) 	at 
org.mortbay.jetty.Request.getParameterMap(Request.java:870) 	at 
org.apache.solr.request.ServletSolrParams.(ServletSolrParams.java:29) 
	at 
org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:394) 
	at 
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115) 
	at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223) 
	at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 
	at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) 	at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) 
	at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) 	at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) 	at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) 
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) 
	at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) 
	at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) 	at 
org.mortbay.jetty.Server.handle(Server.java:326) 	at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) 	at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945) 
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756) 	at 
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) 	at 
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) 	at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) 
	at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


Thanks,
Shawn



Why Solr slows down when accessed thru load balancer

2013-07-25 Thread kaustubh147
Hi,
 
When I am connecting my application to solr thru a load balancer
(https://domain name/apache-solr-4.0.0), it is significantly slow. but if I
connect Solr directly (https://11.11.1.11:8080/apache-solr-4.0.0) on the
application server it works better.

Ideally, use of a load balancer should give better performance. 

In our setup we have one load balancer which redirects the request to two
Apache web server instances, which eventually redirects to 4 Glassfish
application server instances.

Are we doing something wrong, or is it a known problem with the Solr-Glassfish
combination?
Please help

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-Solr-slows-down-when-accessed-thru-load-balancer-tp4080402.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Tom Burton-West
Hi Jack,

I should have pointed out our use case.  In any reasonable case where
actual end users will be looking at search results, paging 1,000 at a time
is reasonable.  But what we are doing is a dump of the unique ids with a
"*:*" query.   This allows us to verify that what our system thinks has
been indexed is actually indexed.   Since we need to dump out results in
the hundreds of millions,  requesting 1,000 at a time is not scalable.

The other context is that we currently index 10 million books, with each book as
a Solr document.  We are looking at indexing at the page level, which would result
in about 3 billion pages.  Part of that testing process is checking the scalability
of queries used by our current production system, such as the query we run against
the not-yet-released index to get a list of the unique ids that are actually indexed
in Solr.

Tom


On Thu, Jul 25, 2013 at 2:13 PM, Jack Krupansky wrote:

> As usual, there is no published hard limit per se, but I would urge
> caution about requesting more than 1,000 rows at a time or even 250. Sure,
> in a fair number of cases 5,000 or 10,000 or even 100,000 MAY work (at
> least sometimes), but Solr and Lucene are more appropriate for "paged"
> results, where page size is 10, 20, 50, 100 or something in that range. So,
> my recommendation is to use 250 to 1,000 as the limit for rows. And
> certainly do a proof of concept implementation for anything above 1,000.
>
> So, if rows=10 works for you, consider yourself lucky!
>
> That said, there is sometimes talk of supporting streaming, which
> presumably would allow access to all results, but chunked/paged in some way.
>
> -- Jack Krupansky
>
> -Original Message- From: Tom Burton-West
> Sent: Thursday, July 25, 2013 1:39 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 4.2.1 limit on number of rows or number of hits per shard?
>
> Hello,
>
> I am running solr 4.2.1 on 3 shards and have about 365 million documents in
> the index total.
> I sent a query asking for 1 million rows at a time,  but I keep getting an
> error claiming that there is an invalid version or data not in javabin
> format (see below)
>
> If I lower the number of rows requested to 100,000, I have no problems.
>
> Does Solr have  a limit on number of rows that can be requested or is this
> a bug?
>
>
> Tom
>
> INFO: [core] webapp=/dev-1 path=/select
> params={shards=XXX:8111/dev-1/core,XXX:8111/dev-2/core,XXX:8111/dev-3/core&fl=vol_id&indent=on&start=0&q=*:*&rows=100}
> hits=365488789 status=500 QTime=132743
> Jul 25, 2013 1:26:00 PM org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException:
> java.lang.RuntimeException: Invalid version (expected 2, but 60) or the
> data in not in 'javabin' format
>    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
>    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
>    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
>    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
>    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
>    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
>    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
>    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
>    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
>    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
>    at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.Runt

Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Shawn Heisey

On 7/25/2013 11:39 AM, Tom Burton-West wrote:

Hello,

I am running solr 4.2.1 on 3 shards and have about 365 million documents in
the index total.
I sent a query asking for 1 million rows at a time,  but I keep getting an
error claiming that there is an invalid version or data not in javabin
format (see below)

If I lower the number of rows requested to 100,000, I have no problems.

Does Solr have  a limit on number of rows that can be requested or is this
a bug?


That particular javabin error (expected 2, but 60) usually means that 
the response it got was something other than javabin, typically HTML or XML.


I was going to say that you should hopefully get a more meaningful error 
message from the server log, but it appears that what you included *IS* 
the server log, so I'm really confused.  The error message you're 
getting is typically something you see on the *client* side.


After some testing on my server, I suspect that what's happening here is 
that the initial shard query (the one with fl=uniqueKeyField,score) is 
working, but then when Solr makes the HUGE subsequent requests for the 
actual documents it is interested in, the list is too big to fit in the 
server-side POST buffer, which defaults to 2MB.  Those queries need to 
be big enough to include an "ids" parameter that is a comma-separated 
list of values from your uniqueKey.  In my case, each of those values 
could be 32 characters, so the id list could be up to 33MB for a million 
of them.  Most of them are significantly shorter, so a 32MB buffer would 
be big enough.


Either multipartUploadLimitInKB doesn't work properly, or there may be 
some hard limits built into the servlet container, because I set 
multipartUploadLimitInKB in the requestDispatcher config to 32768 and it 
still didn't work.  I wonder, perhaps there is a client-side POST buffer 
limit as well as the servlet container limit, which comes into play 
because the Solr server is acting as a client for the distributed requests?
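
For reference, the relevant bit of solrconfig.xml looks roughly like this - a
sketch, not a verified fix, and the formdataUploadLimitInKB attribute is an extra
assumption on my part that the internal "ids" request arrives as url-encoded form
data rather than multipart:

<requestDispatcher>
  <!-- sketch: raise both request parser limits well above the defaults -->
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="32768"
                  formdataUploadLimitInKB="32768" />
</requestDispatcher>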


Thanks,
Shawn



Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Jack Krupansky
As usual, there is no published hard limit per se, but I would urge caution 
about requesting more than 1,000 rows at a time or even 250. Sure, in a fair 
number of cases 5,000 or 10,000 or even 100,000 MAY work (at least 
sometimes), but Solr and Lucene are more appropriate for "paged" results, 
where page size is 10, 20, 50, 100 or something in that range. So, my 
recommendation is to use 250 to 1,000 as the limit for rows. And certainly 
do a proof of concept implementation for anything above 1,000.


So, if rows=10 works for you, consider yourself lucky!

That said, there is sometimes talk of supporting streaming, which presumably 
would allow access to all results, but chunked/paged in some way.
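
In the meantime, if you really do need to walk the whole result set, the usual 
approach is to page through it in modest chunks. Here is a rough SolrJ sketch - the 
URL and the vol_id field are taken from this thread and are assumptions, and note 
that very deep start offsets get progressively slower, which is part of why 
streaming keeps coming up:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class IdDumpSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8111/dev-1/core");
        int pageSize = 1000;
        for (int start = 0; ; start += pageSize) {
            SolrQuery q = new SolrQuery("*:*");
            q.setFields("vol_id");      // the unique id field mentioned in this thread
            q.setStart(start);
            q.setRows(pageSize);
            SolrDocumentList page = server.query(q).getResults();
            for (SolrDocument doc : page) {
                System.out.println(doc.getFieldValue("vol_id"));
            }
            if (start + page.size() >= page.getNumFound()) {
                break;                  // walked past the last page
            }
        }
    }
}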


-- Jack Krupansky

-Original Message- 
From: Tom Burton-West

Sent: Thursday, July 25, 2013 1:39 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.2.1 limit on number of rows or number of hits per shard?

Hello,

I am running solr 4.2.1 on 3 shards and have about 365 million documents in
the index total.
I sent a query asking for 1 million rows at a time,  but I keep getting an
error claiming that there is an invalid version or data not in javabin
format (see below)

If I lower the number of rows requested to 100,000, I have no problems.

Does Solr have  a limit on number of rows that can be requested or is this
a bug?


Tom

INFO: [core] webapp=/dev-1 path=/select
params={shards=XXX:8111/dev-1/core,XXX:8111/dev-2/core,XXX:8111/dev-3/core&fl=vol_id&indent=on&start=0&q=*:*&rows=100}
hits=365488789 status=500 QTime=132743
Jul 25, 2013 1:26:00 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException:
java.lang.RuntimeException: Invalid version (expected 2, but 60) or the
data in not in 'javabin' format
   at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
   at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
   at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
   at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
   at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
   at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
   at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
   at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
   at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
   at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
   at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
   at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
   at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
   at
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
   at
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
   at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
   at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 60)
or the data in not in 'javabin' format
   at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109)
   at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
: 



Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Tom Burton-West
Hello,

I am running solr 4.2.1 on 3 shards and have about 365 million documents in
the index total.
I sent a query asking for 1 million rows at a time,  but I keep getting an
error claiming that there is an invalid version or data not in javabin
format (see below)

If I lower the number of rows requested to 100,000, I have no problems.

Does Solr have  a limit on number of rows that can be requested or is this
a bug?


Tom

INFO: [core] webapp=/dev-1 path=/select
params={shards=XXX:8111/dev-1/core,XXX:8111/dev-2/core,XXX:8111/dev-3/core&fl=vol_id&indent=on&start=0&q=*:*&rows=100}
hits=365488789 status=500 QTime=132743
Jul 25, 2013 1:26:00 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException:
java.lang.RuntimeException: Invalid version (expected 2, but 60) or the
data in not in 'javabin' format
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 60)
or the data in not in 'javabin' format
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109)
at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
:


Re: Duplicate documents based on attribute

2013-07-25 Thread Alexandre Rafalovitch
Look for the presentations online. You are not the first store to use Solr,
there are some explanations around. Try one from Gilt, but I think there
were more.

You will want to store data at the lowest meaningful level of search
granularity. So, in your case, it might be ProductVariation (shoes+color).
Some examples I have seen even store it down to the availability level or
price-difference level. Then, you do some post-search normalization either
by doing groups or by doing filtering.

Solr is not a database, store what you want to find.
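
As a rough sketch of the grouping route (the field names here are made up for
illustration, not anything Solr requires), something like this collapses the
variations under their parent product while still returning each one:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedSearchSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("product a");
        q.set("group", "true");
        q.set("group.field", "product_id");  // hypothetical "parent product" field
        q.set("group.limit", "10");          // up to 10 variations (colors) per product
        QueryResponse rsp = server.query(q);
        for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
            System.out.println(cmd.getName() + ": " + cmd.getValues().size() + " product groups");
        }
    }
}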

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Jul 25, 2013 at 12:42 PM, Mark  wrote:

> How would I go about doing something like this. Not sure if this is
> something that can be accomplished on the index side or its something that
> should be done in our application.
>
> Say we are an online store for shoes and we are selling Product A in red,
> blue and green. Is there a way when we search for Product A all three
> results can be returned even though they are logically the same item (same
> product in our database).
>
> Thoughts on how this can be accomplished?
>
> Thanks
>
> - M


Re: Duplicate documents based on attribute

2013-07-25 Thread Mark
I was hoping to do this from within Solr so that I don't have to manually mess 
around with pagination; otherwise the number of items on each page would be 
unpredictable. 
On Jul 25, 2013, at 9:48 AM, Anshum Gupta  wrote:

> Have a multivalued stored 'color' field and just iterate on it outside of
> solr.
> 
> 
> On Thu, Jul 25, 2013 at 10:12 PM, Mark  wrote:
> 
>> How would I go about doing something like this. Not sure if this is
>> something that can be accomplished on the index side or its something that
>> should be done in our application.
>> 
>> Say we are an online store for shoes and we are selling Product A in red,
>> blue and green. Is there a way when we search for Product A all three
>> results can be returned even though they are logically the same item (same
>> product in our database).
>> 
>> Thoughts on how this can be accomplished?
>> 
>> Thanks
>> 
>> - M
> 
> 
> 
> 
> -- 
> 
> Anshum Gupta
> http://www.anshumgupta.net



Re: Duplicate documents based on attribute

2013-07-25 Thread Anshum Gupta
Have a multivalued stored 'color' field and just iterate on it outside of
solr.
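For illustration, the schema.xml definition could look something like this (the
string type is an assumption, use whatever fits your data):

<field name="color" type="string" indexed="true" stored="true" multiValued="true" />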


On Thu, Jul 25, 2013 at 10:12 PM, Mark  wrote:

> How would I go about doing something like this. Not sure if this is
> something that can be accomplished on the index side or its something that
> should be done in our application.
>
> Say we are an online store for shoes and we are selling Product A in red,
> blue and green. Is there a way when we search for Product A all three
> results can be returned even though they are logically the same item (same
> product in our database).
>
> Thoughts on how this can be accomplished?
>
> Thanks
>
> - M




-- 

Anshum Gupta
http://www.anshumgupta.net


Duplicate documents based on attribute

2013-07-25 Thread Mark
How would I go about doing something like this. Not sure if this is something 
that can be accomplished on the index side or its something that should be done 
in our application. 

Say we are an online store for shoes and we are selling Product A in red, blue 
and green. Is there a way when we search for Product A all three results can be 
returned even though they are logically the same item (same product in our 
database).

Thoughts on how this can be accomplished?

Thanks

- M

Re: Do I need solr.xml?

2013-07-25 Thread Shawn Heisey

On 7/25/2013 8:21 AM, Brian Robinson wrote:

The sentence on the admin page just tells me to check the logs, but I
don't appear to have any yet. Those are located in
solr/collection1/data/tlog/, right?


Those are transaction logs - for durability in the face of failure and 
for the real-time get handler.  If you are using Solr 4.3.0 or later, 
especially using the example jetty container (start.jar), the logs 
should be in logs/solr.log, relative to the current working directory 
where Solr was started.  This location is dictated by log4j.properties. 
 If you have an earlier version or a more custom setup, the log 
location could be highly variable.


The admin UI should have a logging section that will show you everything 
in the log that's at least WARN severity.  Often this isn't enough, and 
you need the actual logfile.



The only browser I appear to be able to use in my SSH (my only access to
the server) is Lynx, so this response isn't formatted, unfortunately,
but this is what I get. I tried to find a sample response so I could
compare and parse out the useful information, but no luck. It looks like
there is only the one core, though. I don't see anything that looks like
an init failure.


The lynx browser isn't going to work for the admin UI.  The links 
(elinks on some systems) browser will display some things, but it 
doesn't work either.  The UI is pretty much all javascript, and you need 
a full graphical browser for that to function properly.


SSH clients (including putty) will let you do port forwarding.  Set it 
up to forward a local port (like 8983, but you can use what you want) to 
the remote port that Solr uses.  Then in your local browser, go to 
http://localhost:8983/solr (or whatever port you chose) and the UI 
should work.



that error indicates that your solr client sent a document to some (valid
and functioning) SolrCore which has a schema.xml that does not contain a
field named "brand".

So it could be that I just updated the wrong schema.xml. But if
/etc/solr/ is my solr home directory, then /etc/solr/collection1/conf/
would be the right location, right?


If you aren't using SolrCloud, and your core is named 'collection1', 
then that should be the right schema.  You must reload the core, or more 
ideally completely restart Solr, after changing the schema.


Thanks,
Shawn



RE: Spell check SOLR 3.6.1 not working for numbers

2013-07-25 Thread Dyer, James
I think the default SpellingQueryConverter has a hard time with terms that 
contain numbers.  Can you provide a failing case: the query you're executing 
(with all the spellcheck.xxx params) and the spellcheck response (or lack 
thereof)?  Is it producing any hits?

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Poornima Jay [mailto:poornima...@rocketmail.com] 
Sent: Thursday, July 25, 2013 5:00 AM
To: solr-user
Subject: Spell check SOLR 3.6.1 not working for numbers

Hi,

I am using Solr 3.6.1 and have implemented spellcheck. I found that numbers in the 
spellcheck query do not return any results. Below are my solrconfig.xml and 
schema.xml details. Could anyone let me know what needs to be done in order 
to get spellcheck working for numbers?

solrConfig

     
    default   
    solr.IndexBasedSpellChecker
    spell  
    ./spellchecker   
    0.7    
    true
    .0001
   
  textSpell



  
    
    default   
    
    false
    
    false
    
    10
  
      
      spellcheck
        
  

Schema

         
            
            
            
            
            
            
         
        
         
        
        
        
      
      




   
   
 
   

Thanks,
Poornima



Re: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-25 Thread Otis Gospodnetic
Hi,

SPM for Solr shows numDocs, maxDocs, and their delta.  Is that what
you are after?  See http://sematext.com/spm

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Wed, Jul 17, 2013 at 4:06 PM, Furkan KAMACI  wrote:
> I have crawled some web pages and indexed them at my SolrCloud(Solr 4.2.1).
> However before I index them there was already some indexes. I can calculate
> the difference between current and previous document count. However it
> doesn't mean that I have indexed that count of documents. Because urls of
> websites are unique ids at my system. So it means that some of documents
> updated and they did not increased document count.
>
> My question is that: How can I learn the total count of how many documents
> indexed and how many documents updated?


Wildcard matching of dynamic fields

2013-07-25 Thread Artem Karpenko

Hi,

given a dynamic field

stored="true" />


There are some other suffix-based fields as well. Some of the fields 
in the document should be ignored; they have a "nosolr_" prefix. But defining


stored="false" />


even at the start of the schema does not work: the field 
"nosolr_inv_dunning_boolean" is recognized as boolean anyway and shown 
in search results. The documentation says that "longer patterns will be 
matched first, if equal size patterns both match, the first appearing in 
the schema will be used".


What can be done here (apart from changing the input document)? And, 
more generally: why would such a "longer wins" strategy be used here? What use 
case does it have?


Best,
Artem.


Re: Solr Cloud Setup

2013-07-25 Thread AdityaR
Thanks Erick and Flavio for your responses. Sorry, I meant that I was creating
collections and not cores. 

I used the same article as suggested by Flavio to set up the solr cloud and
I did it twice. Both the times I am facing the same issue. I am not sure
where the problem is.  

I am using the following versions: 

Zookeeper : 3.4.5
Solr : 4.3.1
tomcat : 6.0.36

Thanks, 
Aditya



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Setup-tp4080182p4080352.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Do I need solr.xml?

2013-07-25 Thread Brian Robinson



if you get an error on the admin UI, there should be specifics about
*what* the initialization failure is -- at last one sentence, and there
should be a full stack trace in the logs -- having those details will
help understand the root of your first problem, which may explain your
second problem.
The sentence on the admin page just tells me to check the logs, but I 
don't appear to have any yet. Those are located in 
solr/collection1/data/tlog/, right?




it would also help to know what the CoreAdmin handler returns when you ask
it for status about all the cores -- even if the *UI* is having problems
on your browser, that should return useful info (like: how many cores you
have -- if any -- and which one had an init failure)
The only browser I appear to be able to use in my SSH (my only access to 
the server) is Lynx, so this response isn't formatted, unfortunately, 
but this is what I get. I tried to find a sample response so I could 
compare and parse out the useful information, but no luck. It looks like 
there is only the one core, though. I don't see anything that looks like 
an init failure.


01collection1collection1true/etc/solr/collection1//etc/solr/collection1/data/solrconfig.xmlschema.xml2013-07-21T17:51:31.625Z33071636400010truefalseorg.apache.lucene.store.NRT
CachingDirectory:NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/etc/solr/collection1/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@7f322d18;
maxCacheMB=48.0 maxMergeSizeMB=4.0) 6565 bytes

My guess is it would look something like this if it were formatted

01
collection1
collection1
true
/etc/solr/collection1/
/etc/solr/collection1/data/
solrconfig.xml
schema.xml
2013-07-21T17:51:31.625Z33071636400010
truefalse
org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory
(org.apache.lucene.store.MMapDirectory@/etc/solr/collection1/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@7f322d18; 
maxCacheMB=48.0 maxMergeSizeMB=4.0)

6565 bytes




that error indicates that your solr client sent a document to some (valid
and functioning) SolrCore which has a schema.xml that does not contain a
field named "brand".
So it could be that I just updated the wrong schema.xml. But if 
/etc/solr/ is my solr home directory, then /etc/solr/collection1/conf/ 
would be the right location, right?


Brian


Re: SolrCloud commit process is too time consuming, even if documents are light

2013-07-25 Thread Mark Miller
I'm looking into some possible slowdown-after-long-indexing issues when I get 
back from vacation. This could be related. Very early guess though.

Another thing you might try - Lucene recently changed the merge scheduler 
policy defaults (in 4.1). It used to use up to 3 threads to merge and have a max 
merge setting of that + 2; it now defaults to 1 and 2, and that can really 
impact how fast documents are added by a significant amount. It also causes 
indexing threads to pause and wait for merges *way* more, especially when your 
index gets large and the merges start taking a long time. The tradeoff was 
supposedly that merges are faster, but honestly, I think it's a poor default, 
especially if you are measuring indexing speed and not really paying attention 
to how long merges go on after you finish indexing, and especially if you have 
beefy hardware. You might play with those settings.
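
For example, something along these lines in the indexConfig section of 
solrconfig.xml should bring behavior back closer to the old defaults - the 
numbers are only a starting point to experiment with, not a recommendation:

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">3</int>
  <int name="maxMergeCount">5</int>
</mergeScheduler>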

- Mark

On Jul 25, 2013, at 8:36 AM, Radu Ghita  wrote:

> Forgot to attach server and solr configurations:
> 
> SolrCloud 4.1, internal Zookeeper, 16 shards, custom java importer.
> Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192gb RAM, 10tb
> SSD and 50tb SAS memory
> 
> 
> On Thu, Jul 25, 2013 at 3:20 PM, Radu Ghita  wrote:
> 
>> 
>> Hi,
>> 
>> We are having a client with business model that requires indexing each
>> month billion rows into solr from mysql in a small time-frame. The
>> documents are very light, but the number is very high and we need to
>> achieve speeds of around 80-100k/s. The built in solr indexer goes to
>> 40-50k tops, but after some hours ( ~12h ) it crashes and the speed slows
>> down as hours go by.
>> 
>> Therefore we have developed a custom java importer that connects directly
>> to mysql and solrcloud via zookeeper, grabs data from mysql, creates
>> documents and then imports into solr. This helps because we are opening ~50
>> threads and the indexing process speeds up. We have optimized the mysql
>> queries ( mysql was the initial bottleneck ) and the speeds we get now are
>> over 100k/s, but as index number gets bigger, solr stays very long on
>> adding documents. I assume it needs to be something from solrconfig that
>> makes solr stay and even block after 100 mil documents indexed.
>> 
>> Here is the java code that creates documents and then adds to solr server:
>> 
>> public void createDocuments() throws SQLException, SolrServerException,
>> IOException
>> {
>> App.logger.write("Creating documents..");
>> this.docs = new ArrayList();
>> App.logger.incrementNumberOfRows(this.size);
>> while(this.results.next())
>> { this.docs.add(this.getDocumentFromResultSet(this.results)); }
>> 
>> this.statement.close();
>> this.results.close();
>> }
>> 
>> public void commitDocuments() throws SolrServerException, IOException
>> { App.logger.write("Committing.."); App.solrServer.add(this.docs); // here
>> it stays very long and then blocks
>> App.logger.incrementNumberOfRows(this.docs.size()); this.docs.clear(); }
>> 
>> I am also pasting solrconfig.xml parameters that make sense to this
>> discussion:
>> 128
>> false
>> 1
>> 100
>> 
>> 2
>> 100
>> 1
>> 
>> 100
>> 1024
>> 
>> 15000
>> 100
>> false
>> 
>> 
>> 200
>> 
>> 
>> The big problem stands in SOLR, because I've run the mysql queries single
>> and speed is great, but as the time passes solr adding function stays way
>> too long and then it blocks, even tho server is top level and has lots of
>> resources.
>> 
>> I'm new to this so please assist. Thanks,
>> --
>> 
>> **
>> 
>>  *Radu Ghita *
>> 
>>  Tel:   +40 721 18 18 68
>> 
>>  Fax:  +40 351 81 85 52
>> 
> 
> 
> 
> -- 
> 
> **
> 
>  *Radu Ghita *
> 
>  Tel:   +40 721 18 18 68
> 
>  Fax:  +40 351 81 85 52



Re: SolrCloud commit process is too time consuming, even if documents are light

2013-07-25 Thread Radu Ghita
Forgot to attach server and solr configurations:

SolrCloud 4.1, internal Zookeeper, 16 shards, custom java importer.
Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192gb RAM, 10tb
SSD and 50tb SAS memory


On Thu, Jul 25, 2013 at 3:20 PM, Radu Ghita  wrote:

>
> Hi,
>
> We are having a client with business model that requires indexing each
> month billion rows into solr from mysql in a small time-frame. The
> documents are very light, but the number is very high and we need to
> achieve speeds of around 80-100k/s. The built in solr indexer goes to
> 40-50k tops, but after some hours ( ~12h ) it crashes and the speed slows
> down as hours go by.
>
> Therefore we have developed a custom java importer that connects directly
> to mysql and solrcloud via zookeeper, grabs data from mysql, creates
> documents and then imports into solr. This helps because we are opening ~50
> threads and the indexing process speeds up. We have optimized the mysql
> queries ( mysql was the initial bottleneck ) and the speeds we get now are
> over 100k/s, but as index number gets bigger, solr stays very long on
> adding documents. I assume it needs to be something from solrconfig that
> makes solr stay and even block after 100 mil documents indexed.
>
> Here is the java code that creates documents and then adds to solr server:
>
> public void createDocuments() throws SQLException, SolrServerException,
> IOException
> {
> App.logger.write("Creating documents..");
> this.docs = new ArrayList();
> App.logger.incrementNumberOfRows(this.size);
> while(this.results.next())
> { this.docs.add(this.getDocumentFromResultSet(this.results)); }
>
> this.statement.close();
> this.results.close();
> }
>
> public void commitDocuments() throws SolrServerException, IOException
> { App.logger.write("Committing.."); App.solrServer.add(this.docs); // here
> it stays very long and then blocks
> App.logger.incrementNumberOfRows(this.docs.size()); this.docs.clear(); }
>
> I am also pasting solrconfig.xml parameters that make sense to this
> discussion:
> 128
> false
> 1
> 100
> 
> 2
> 100
> 1
> 
> 100
> 1024
> 
> 15000
> 100
> false
> 
> 
> 200
> 
>
> The big problem stands in SOLR, because I've run the mysql queries single
> and speed is great, but as the time passes solr adding function stays way
> too long and then it blocks, even tho server is top level and has lots of
> resources.
>
> I'm new to this so please assist. Thanks,
> --
>
> **
>
>   *Radu Ghita *
>
>   Tel:   +40 721 18 18 68
>
>   Fax:  +40 351 81 85 52
>



-- 

**

  *Radu Ghita *

  Tel:   +40 721 18 18 68

  Fax:  +40 351 81 85 52


Re: Using Solr to search between two Strings without using index

2013-07-25 Thread Roman Chyla
Hi,

I think you are pushing it too far - there is no 'string search' without an
index. And besides, these things are just better done by a few lines of
code - and if your array is too big, then you should create the index...
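
For example, something like this - plain Java, no Solr involved, just a minimal
sketch using the data from your message:

import java.util.ArrayList;
import java.util.List;

public class ArrayFilterSketch {
    public static void main(String[] args) {
        String[] array = {"Input1 is good", "Input2 is better", "Input2 is sweet", "Input3 is bad"};
        String[] wanted = {"Input1", "Input2"};
        List<String> matches = new ArrayList<String>();
        for (String candidate : array) {
            for (String term : wanted) {
                if (candidate.contains(term)) {  // simple substring match; lower-case both sides if case should not matter
                    matches.add(candidate);
                    break;
                }
            }
        }
        System.out.println(matches);  // [Input1 is good, Input2 is better, Input2 is sweet]
    }
}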

roman


On Thu, Jul 25, 2013 at 9:06 AM, Rohit Kumar  wrote:

> Hi,
>
> I have a scenario.
>
> String array = ["Input1 is good", ""Input2 is better", "Input2 is sweet",
> "Input3 is bad"]
>
> I want to compare the string array against the given input :
> String inputarray= ["Input1", "Input2"]
>
>
> It involves no indexes. I just want to use the power of string search to do
> a runtime search on the array and should return
>
> ["Input1 is good", ""Input2 is better", "Input2 is sweet"]
>
>
>
> Thanks
>


Using Solr to search between two Strings without using index

2013-07-25 Thread Rohit Kumar
Hi,

I have a scenario.

String array = ["Input1 is good", ""Input2 is better", "Input2 is sweet",
"Input3 is bad"]

I want to compare the string array against the given input :
String inputarray= ["Input1", "Input2"]


It involves no indexes. I just want to use the power of string search to do
a runtime search on the array and should return

["Input1 is good", ""Input2 is better", "Input2 is sweet"]



Thanks


Re: Solr 4.4.0 and solrj

2013-07-25 Thread santonel
OK, now it works perfectly. In the previous version I had renamed the default
collection, but with
 http://myserver/solr/
I was directly accessing
 http://myserver/solr/corename/
probably because the default collection became the one that I had renamed.

Thanks for the help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-4-0-and-solrj-tp4080282p4080322.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: maximum number of documents per shard?

2013-07-25 Thread Dmitry Kan
Well, we have hit the aforementioned JIRA issue with about 80 shards. Sharding
for us is purely a function of memory consumption, and we use lots of RAM. With
Solr 4, however, things look much better, and hopefully, having migrated from
Solr 3, we can live for a long time without hitting the limit again.



On Thu, Jul 25, 2013 at 3:07 PM, Jack Krupansky wrote:

> I don't think there is any hard limit, but it will be more of a
> performance-based limit. Going beyond a couple dozen shards (lets say, 25)
> would take you into uncharted territory, where a sophisticated proof of
> concept implementation is essential. "Hundreds" or "thousands" of shards
> are likely to be problematic from a performance perspective for average
> users. In fact, I'd say that 8 shards may be a semi-practical limit, beyond
> which the design switches from "a walk in the park" to "heroic efforts". I
> mean, you should be able to do 25 shards, but in practice you will have to
> be much more alert, more careful with your hardware selection and network
> design, etc.
>
> -- Jack Krupansky
>
> -Original Message- From: Nicole Lacoste
> Sent: Thursday, July 25, 2013 4:14 AM
> To: solr-user@lucene.apache.org
> Subject: Re: maximum number of documents per shard?
>
>
> Is there a limit on the number of shards?
>
> Niki
>
>
> On 24 July 2013 01:14, Jack Krupansky  wrote:
>
>  2.1 billion documents (including deleted documents) per Lucene index, but
>> essentially per Solr shard as well.
>>
>> But don’t even think about going that high. In fact, don't plan on going
>> above 100 million unless you do a proof of concept that validates that you
>> get acceptable query and update performance . There is no hard limit
>> besides that 2.1 billion Lucene limit, but... performance will vary.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Ali, Saqib
>> Sent: Tuesday, July 23, 2013 6:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: maximum number of documents per shard?
>>
>> still 2.1 billion documents?
>>
>>
>
>
> --
> * 
> >*
>
>


Re: SolrCloud commit process is too time consuming, even if documents are light

2013-07-25 Thread Jack Krupansky
Auto soft commit is great for real time access, but you need to do hard 
commits periodically or else the transaction log (which is what assures that 
soft commits are durable) gets too big - it needs to be replayed on startup 
and is used for real-time search.


So, set the auto soft commit to the currency of updates that you need on 
search. Then set hard commit to something like every 10 minutes, 15 minutes, 
30 minutes, 1 hour, 2 hours, 4 hours, 8 hours, or whatever makes sense for 
your application.


Hard auto commit should of course be at a greater interval than auto soft 
commit.
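
For example, something along these lines in solrconfig.xml - the intervals are 
placeholders, pick whatever matches your application:

<autoCommit>
  <maxTime>600000</maxTime>           <!-- hard commit every 10 minutes, for durability -->
  <openSearcher>false</openSearcher>  <!-- don't open a new searcher on the hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>             <!-- soft commit every 5 seconds, for visibility -->
</autoSoftCommit>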


-- Jack Krupansky

-Original Message- 
From: Radu Ghita

Sent: Thursday, July 25, 2013 8:20 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud commit process is too time consuming, even if documents 
are light


Hi,

We are having a client with business model that requires indexing each
month billion rows into solr from mysql in a small time-frame. The
documents are very light, but the number is very high and we need to
achieve speeds of around 80-100k/s. The built in solr indexer goes to
40-50k tops, but after some hours ( ~12h ) it crashes and the speed slows
down as hours go by.

Therefore we have developed a custom java importer that connects directly
to mysql and solrcloud via zookeeper, grabs data from mysql, creates
documents and then imports into solr. This helps because we are opening ~50
threads and the indexing process speeds up. We have optimized the mysql
queries ( mysql was the initial bottleneck ) and the speeds we get now are
over 100k/s, but as index number gets bigger, solr stays very long on
adding documents. I assume it needs to be something from solrconfig that
makes solr stay and even block after 100 mil documents indexed.

Here is the java code that creates documents and then adds to solr server:

public void createDocuments() throws SQLException, SolrServerException,
IOException
{
App.logger.write("Creating documents..");
this.docs = new ArrayList();
App.logger.incrementNumberOfRows(this.size);
while(this.results.next())
{ this.docs.add(this.getDocumentFromResultSet(this.results)); }

this.statement.close();
this.results.close();
}

public void commitDocuments() throws SolrServerException, IOException
{ App.logger.write("Committing.."); App.solrServer.add(this.docs); // here
it stays very long and then blocks
App.logger.incrementNumberOfRows(this.docs.size()); this.docs.clear(); }

I am also pasting solrconfig.xml parameters that make sense to this
discussion:
128
false
1
100

2
100
1

100
1024

15000
100
false


200


The big problem stands in SOLR, because I've run the mysql queries single
and speed is great, but as the time passes solr adding function stays way
too long and then it blocks, even tho server is top level and has lots of
resources.

I'm new to this so please assist. Thanks,
--

**

 *Radu Ghita *

 Tel:   +40 721 18 18 68

 Fax:  +40 351 81 85 52 



Re: softCommit doesn't work - ?

2013-07-25 Thread tskom
My actual solconfig.xml is:



   ${solr.ulog.dir:}

  
   1  
   true



I tried the following (SolrJ 4.3.1) and checked the index after 10 seconds in each case:

1) server.add(doc) - nothing in the index 

2) server.add(doc, 1) - nothing in the index

3) server.add(doc) and server.commit() - all fine, but I don't want to
hard commit after each document!

Any additional suggestions?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/softCommit-doesn-t-work-tp4079578p4080319.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud commit process is too time consuming, even if documents are light

2013-07-25 Thread Radu Ghita
Hi,

We are having a client with business model that requires indexing each
month billion rows into solr from mysql in a small time-frame. The
documents are very light, but the number is very high and we need to
achieve speeds of around 80-100k/s. The built in solr indexer goes to
40-50k tops, but after some hours ( ~12h ) it crashes and the speed slows
down as hours go by.

Therefore we have developed a custom java importer that connects directly
to mysql and solrcloud via zookeeper, grabs data from mysql, creates
documents and then imports into solr. This helps because we are opening ~50
threads and the indexing process speeds up. We have optimized the mysql
queries ( mysql was the initial bottleneck ) and the speeds we get now are
over 100k/s, but as index number gets bigger, solr stays very long on
adding documents. I assume it needs to be something from solrconfig that
makes solr stay and even block after 100 mil documents indexed.

Here is the java code that creates documents and then adds to solr server:

public void createDocuments() throws SQLException, SolrServerException, IOException
{
    App.logger.write("Creating documents..");
    // buffers the whole JDBC result set in memory before anything is sent to Solr
    this.docs = new ArrayList<SolrInputDocument>();
    App.logger.incrementNumberOfRows(this.size);
    while (this.results.next())
    {
        this.docs.add(this.getDocumentFromResultSet(this.results));
    }
    this.statement.close();
    this.results.close();
}

public void commitDocuments() throws SolrServerException, IOException
{
    App.logger.write("Committing..");
    App.solrServer.add(this.docs); // here it stays very long and then blocks
    App.logger.incrementNumberOfRows(this.docs.size());
    this.docs.clear();
}
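
For comparison, a minimal sketch of the same loop sending bounded batches
instead of one huge list (the 1000-document batch size is a hypothetical
value; App.solrServer and getDocumentFromResultSet are the names from the
snippet above):

public void indexInBatches() throws SQLException, SolrServerException, IOException
{
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    while (this.results.next())
    {
        batch.add(this.getDocumentFromResultSet(this.results));
        if (batch.size() >= 1000)
        {
            App.solrServer.add(batch); // each add() now carries at most 1000 docs
            batch.clear();
        }
    }
    if (!batch.isEmpty())
    {
        App.solrServer.add(batch);
    }
    this.statement.close();
    this.results.close();
}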

I am also pasting solrconfig.xml parameters that make sense to this
discussion:
128
false
1
100

2
100
1

100
1024

15000
100
false


200


The big problem is in Solr: I've run the MySQL queries on their own and the
speed is great, but as time passes the Solr add call takes far too long and
then blocks, even though the server is high-spec and has plenty of
resources.

I'm new to this so please assist. Thanks,
-- 

**

  *Radu Ghita *

  Tel:   +40 721 18 18 68

  Fax:  +40 351 81 85 52


Re: new field type - enum field

2013-07-25 Thread Erick Erickson
Start here: http://wiki.apache.org/solr/HowToContribute

Then, when your patch is ready submit a JIRA and attach
your patch. Then nudge (gently) if none of the committers
picks it up and applies it

NOTE: It is _not_ necessary that the first version of your
patch is completely polished. I often put up partial/incomplete
patches (comments with //nocommit are explicitly caught by
the "ant precommit" target for instance) to see if anyone
has any comments before polishing.

Best
Erick

On Thu, Jul 25, 2013 at 5:04 AM, Elran Dvir  wrote:
> Hi,
>
> I have implemented like Chris described it:
> The field is indexed as numeric, but displayed as string, according to 
> configuration.
> It applies to facet, pivot, group and query.
>
> How do we proceed? How do I contribute it?
>
> Thanks.
>
> -Original Message-
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> Sent: Thursday, July 25, 2013 4:40 AM
> To: solr-user@lucene.apache.org
> Subject: Re: new field type - enum field
>
>
> : Doable at Lucene level by any chance?
>
> Given how well the Trie fields compress (ByteField and ShortField have been 
> deprecated in favor of TrieIntField for this reason) it probably just makes 
> sense to treat it as a numeric at the Lucene level.
>
> : > If there's positive feedback, I'll open an issue with a patch for the 
> functionality.
>
> I've typically dealt with this sort of thing at the client layer using a 
> simple numeric field in Solr, or used an UpdateProcessor to convert the
> String->numeric mapping when indexing & used clinet logic of a
> DocTransformer to handle the stored value at query time -- but having a built 
> in FieldType that handles that for you automatically (and helps ensure the 
> indexed values conform to the enum) would certainly be cool if you'd like to 
> contribute it.
>
>
> -Hoss
>
> Email secured by Check Point


Re: Solr Cloud Setup

2013-07-25 Thread Flavio Pompermaier
I find this article very interesting about cloud deployment:
http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html

Best,
Flavio


On Thu, Jul 25, 2013 at 1:59 PM, Erick Erickson wrote:

> I'd advise you to tear it down and start over. You should be
> creating new _collections_, not cores at this level I believe. And
> manually editing the cluster state is just _asking_ for
> trouble unless you really understand what's happening under
> the covers, and since you say you're relatively new
>
> See:
> http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
>
> Best
> Erick
>
> On Wed, Jul 24, 2013 at 6:11 PM, AdityaR 
> wrote:
> > Hi,
> >
> > I am new to solr and am trying to setup a solr cloud
> >
> > I have created 3 server solr cloud and 1 zookeeeper and I am facing the
> > following problems with my set up.
> >
> > 1) When I create a new core using the collections API , the cores are
> > created, but all are in down state. How can I make them active? or is
> there
> > anything wrong with my set up?
> >
> > I edited the clusterstate.json to get them active and then they become
> > active.
> >
> > 2 ) I have configured a collection to have 2 shards and 2 replicas and
> added
> > documents to the collection. But when I query the servers, I am getting
> > inconsistent results. I have in one shard 241 documents and in another
> 230
> > documents. When I query any server in the cloud I get randomly 471, 230
> or
> > 241 documents. Could you suggest as to where the problem might be.
> >
> > Thanks,
> > Aditya
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Setup-tp4080182.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Querying a specific core in solr cloud

2013-07-25 Thread Erick Erickson
Vicky:

Please define "&distrib=false doesn't work".
_What_ doesn't work? What are the symptoms?
It could be a bug or it could be a misunderstanding, I
have no way of even guessing.
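
For reference, a non-distributed query against one specific core looks
something like this (host, port and core name are placeholders):

    http://localhost:8983/solr/mycollection_shard1_replica1/select?q=*:*&distrib=false

With distrib=false the node should only return documents from that core's
local index.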

Best
Erick

On Thu, Jul 25, 2013 at 3:52 AM, vicky desai  wrote:
> Hi,
>
> I have also noticed that once I put the core up on both the machine
> &distrib=false works well. could this be a possible bug that when a core is
> down on one instance &distrib=false doesnt work
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Querying-a-specific-core-in-solr-cloud-tp4079964p4080246.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Do I need solr.xml?

2013-07-25 Thread Erick Erickson
Actually, you're getting a solr.xml file but you don't know it.
When Solr doesn't find solr.xml, there's a default one hard-
coded that is used. See ConfigSolrOld.java, at the end
DEF_SOLR_XML is defined.
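
For reference, that hard-coded default corresponds roughly to a solr.xml
along these lines (check DEF_SOLR_XML for the exact text; this is only a
sketch):

<solr persistent="false">
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${hostPort:8983}"
         hostContext="${hostContext:solr}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>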

So, as Hoss says, it's much better to make one anyway so
you know what you're getting.

Consider setting it up for the "discovery" mode, see:
http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

Best
Erick

On Wed, Jul 24, 2013 at 9:21 PM, Chris Hostetter
 wrote:
>
> : I get what looks like the admin page, but it says that there are solr core
> : initialization failures, and the links on the page just bring me back to the
> : same page.
>
> if you get an error on the admin UI, there should be specifics about
> *what* the initialization failure is -- at last one sentence, and there
> should be a full stack trace in the logs -- having those details will
> help understand the root of your first problem, which may explain your
> second problem.
>
> it would also help to know what the CoreAdmin handler returns when you ask
> it for status about all the cores -- even if the *UI* is having problems
> on your browser, that should return useful info (like: how many cores you
> have -- if any -- and which one had an init failure)
>
> https://cwiki.apache.org/confluence/display/solr/CoreAdminHandler+Parameters+and+Usage#CoreAdminHandlerParametersandUsage-{{STATUS}}
>
> : Second, when I try to put a doc in the index using the PHP Pecl Solr package
> : from a page on my site, I get errors that indicate that Solr can't see my
> : schema.xml file, since Solr doesn't recognize some of the fields that I've
> : defined. I have my updated schema.xml file in /etc/solr/collection1/conf/
>
> that doesn't make sense -- if solr can't see your schema.xml file at all,
> you wouldn't get an error about the fields you defined being missing --
> you'd get an error about the collection you are talking to not existing,
> because if your schema.xml file can't be found (or has a problem loading)
> the entire SolrCore won't load.
>
> : ERROR: [doc=334455] unknown field 'brand'  : name="code">400   ' in X:
> : SolrClient->addDocument(Object(SolrInputDocument)) #1 {main} thrown in 
> XX
>
> that error indicates that your solr client sent a document to some (valid
> and functioning) SolrCore which has a schema.xml that does not contain a
> field named "brand".
>
> : And this is the relevant section of my schema.xml
> :
> : : required="true"/>
>
> my best guess: you have multiple core defined in your solr setup -- one of
> which is working, and is what your client is trying to talk to, but which
> doesn't have the schema.xml that you put your domain specific fields in
> (maybe it's just the default example configs?) and you have another core
> defined, using your customized configs, which failed to load properly.
>
> you mentioned that you did in fact put your configs in "collection1" dir,
> but w/o the specifics of what your solr home dir structure looks like, and
> the specifics of your error message, and details about the URLs your
> client tried to talk to when it got that error, etc  it's all just
> guesswork on our parts.
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> : So my question is: do I actually need to create a solr.xml file, and all the
> : accompanying files that go into specifying a core? (I'm not sure if there 
> are,
> : but from some of the documentation it seems like there may be.) Or am I
> : pursuing an unnecessary solution to these problems, and there's a simpler 
> fix?
>
> the short answer of your specific question is "no", you don't *have* to
> have a solr.xml (at least not in Solr 4.x) but it's a really good idea,
> even if you only want a single core, because it gives you a way to be
> explicit about what you want and be sure it's what you are getting.
>
>
> -Hoss


Re: maximum number of documents per shard?

2013-07-25 Thread Jack Krupansky
I don't think there is any hard limit, but it will be more of a 
performance-based limit. Going beyond a couple dozen shards (lets say, 25) 
would take you into uncharted territory, where a sophisticated proof of 
concept implementation is essential. "Hundreds" or "thousands" of shards are 
likely to be problematic from a performance perspective for average users. 
In fact, I'd say that 8 shards may be a semi-practical limit, beyond which 
the design switches from "a walk in the park" to "heroic efforts". I mean, 
you should be able to do 25 shards, but in practice you will have to be much 
more alert, more careful with your hardware selection and network design, 
etc.


-- Jack Krupansky

-Original Message- 
From: Nicole Lacoste

Sent: Thursday, July 25, 2013 4:14 AM
To: solr-user@lucene.apache.org
Subject: Re: maximum number of documents per shard?

Is there a limit on the number of shards?

Niki


On 24 July 2013 01:14, Jack Krupansky  wrote:


2.1 billion documents (including deleted documents) per Lucene index, but
essentially per Solr shard as well.

But don’t even think about going that high. In fact, don't plan on going
above 100 million unless you do a proof of concept that validates that you
get acceptable query and update performance . There is no hard limit
besides that 2.1 billion Lucene limit, but... performance will vary.

-- Jack Krupansky

-Original Message- From: Ali, Saqib
Sent: Tuesday, July 23, 2013 6:18 PM
To: solr-user@lucene.apache.org
Subject: maximum number of documents per shard?

still 2.1 billion documents?





--
* * 



Re: Solr Cloud Setup

2013-07-25 Thread Erick Erickson
I'd advise you to tear it down and start over. You should be
creating new _collections_, not cores at this level I believe. And
manually editing the cluster state is just _asking_ for
trouble unless you really understand what's happening under
the covers, and since you say you're relatively new

See: 
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
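
For example, a collection with 2 shards and 2 replicas (the layout described
below) can be created with a single Collections API call along these lines
(host and collection name are placeholders):

    http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&maxShardsPerNode=2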

Best
Erick

On Wed, Jul 24, 2013 at 6:11 PM, AdityaR  wrote:
> Hi,
>
> I am new to solr and am trying to setup a solr cloud
>
> I have created 3 server solr cloud and 1 zookeeeper and I am facing the
> following problems with my set up.
>
> 1) When I create a new core using the collections API , the cores are
> created, but all are in down state. How can I make them active? or is there
> anything wrong with my set up?
>
> I edited the clusterstate.json to get them active and then they become
> active.
>
> 2 ) I have configured a collection to have 2 shards and 2 replicas and added
> documents to the collection. But when I query the servers, I am getting
> inconsistent results. I have in one shard 241 documents and in another 230
> documents. When I query any server in the cloud I get randomly 471, 230 or
> 241 documents. Could you suggest as to where the problem might be.
>
> Thanks,
> Aditya
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Setup-tp4080182.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto Indexing in Solr

2013-07-25 Thread Jack Krupansky
If that level of scripting is difficult for you, consider the LucidWorks 
Search product which has a built-in scheduler for crawl/import jobs, 
including web sites, file system directories, sharepoint repositories, and 
databases.


See:
http://docs.lucidworks.com/display/help/Crawling+Content

-- Jack Krupansky

-Original Message- 
From: archit2112

Sent: Thursday, July 25, 2013 2:12 AM
To: solr-user@lucene.apache.org
Subject: Auto Indexing in Solr

Hi, I'm using Solr 4's Data Import Utility to index an Oracle 10g XE database.
I'm using full imports as well as delta imports. I want these processes to be
automatic (e.g. the import processes can be timed, or should be executed as
soon as any data in the database is modified). I searched for the same online
and I heard people talk about CRON and scripts. However, I'm not able to
figure out how to implement it. Can you please provide a tutorial-like
explanation? Thanks in advance
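
As a rough sketch of the cron approach, a crontab entry that fires the DIH
full-import every night at 2 a.m. could look like this (the host, port and
handler path are placeholders and need to match your own setup):

    0 2 * * * curl -s "http://localhost:8983/solr/dataimport?command=full-import&clean=false" > /dev/null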




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Indexing-in-Solr-tp4080233.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Processing a lot of results in Solr

2013-07-25 Thread Otis Gospodnetic
Mikhail,

Yes, +1.
This question comes up a few times a year.  Grant created a JIRA issue
for this many moons ago.

https://issues.apache.org/jira/browse/LUCENE-2127
https://issues.apache.org/jira/browse/SOLR-1726

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Wed, Jul 24, 2013 at 9:58 PM, Mikhail Khludnev
 wrote:
> fwiw,
> i did some prototype with the following differences:
> - it streams straight to the socket output stream
> - it streams on-going during collecting, without necessity to store a
> bitset.
> It might have some limited extreme usage. Is there anyone interested?
>
>
> On Wed, Jul 24, 2013 at 7:19 PM, Roman Chyla  wrote:
>
>> On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber  wrote:
>>
>> > That sounds like a satisfactory solution for the time being -
>> > I am assuming you dump the data from Solr in a csv format?
>> >
>>
>> JSON
>>
>>
>> > How did you implement the streaming processor ? (what tool did you use
>> for
>> > this? Not familiar with that)
>> >
>>
>> this is what dumps the docs:
>>
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/response/JSONDumper.java
>>
>> it is called by one of our batch processors, which can pass it a bitset of
>> recs
>>
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchProviderDumpIndex.java
>>
>> as far as streaming is concerned, we were all very nicely surprised, a few
>> GB file (on local network) took ridiculously short time - in fact, a
>> colleague of mine was assuming it is not working, until we looked into the
>> downloaded file ;-), you may want to look at line 463
>>
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchHandler.java
>>
>> roman
>>
>>
>> > You say it takes a few minutes only to dump the data - how long does it
>> to
>> > stream it back in, are performances acceptable (~ within minutes) ?
>> >
>> > Thanks,
>> > Matt
>> >
>> > On 7/23/13 6:57 PM, "Roman Chyla"  wrote:
>> >
>> > >Hello Matt,
>> > >
>> > >You can consider writing a batch processing handler, which receives a
>> > >query
>> > >and instead of sending results back, it writes them into a file which is
>> > >then available for streaming (it has its own UUID). I am dumping many
>> GBs
>> > >of data from solr in few minutes - your query + streaming writer can go
>> > >very long way :)
>> > >
>> > >roman
>> > >
>> > >
>> > >On Tue, Jul 23, 2013 at 5:04 PM, Matt Lieber 
>> wrote:
>> > >
>> > >> Hello Solr users,
>> > >>
>> > >> Question regarding processing a lot of docs returned from a query; I
>> > >> potentially have millions of documents returned back from a query.
>> What
>> > >>is
>> > >> the common design to deal with this ?
>> > >>
>> > >> 2 ideas I have are:
>> > >> - create a client service that is multithreaded to handled this
>> > >> - Use the Solr "pagination" to retrieve a batch of rows at a time
>> > >>("start,
>> > >> rows" in Solr Admin console )
>> > >>
>> > >> Any other ideas that I may be missing ?
>> > >>
>> > >> Thanks,
>> > >> Matt
>> > >>
>> > >>
>> > >>
>> >
>> >
>> >
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  


Re: Solr 4.4.0 and solrj

2013-07-25 Thread Erick Erickson
"collection1" is the default, so when you enter
http://myserver/solr/, under the covers you get
http://myserver/solr/collection1/.

So go ahead and rename your cores, but address
them specifically as
http://myserver/solr/corename/
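
A minimal solrj sketch of the same thing, using the core name from the
core.properties below (host and port are placeholders):

    HttpSolrServer server = new HttpSolrServer("http://myserver:8983/solr/soccerevents");
    server.ping(); // should return OK once the URL points at the named core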

Best
Erick

On Thu, Jul 25, 2013 at 7:10 AM, santonel  wrote:
> Hi
>
> I've upgraded my solr server (a single core with single collection) from
> 4.3.1 to 4.4.0, using the new solr.xml
> configuration file from example and setting the new core.properties (with my
> collection name) under the instance dir.
>
> When i check the status of solr via web interface, all is up and going
> smoothly (it find the core with autodiscovery),
> and i can query and get responses from the solr server.
>
> When i try to access solr with an application that i wrote via solrj, using
> the same parameter i was using for past version
> it return an exception:
> RemoteSolrException: Server at http://(my-server-address)/solr returned non
> ok status:404, message:Not Found
> Even a simple call of server.ping() return the same exception.
>
> So i've changed my instance dir name as "collection1" and put the same value
> on core.properties, restarted the solr server
> and the application started to work again.
>
> Is there someghing i'm missing? It's a strange behaviour because via web
> interface everything is regularly, but
> when i try to do some action via solrj with a custom core name it return an
> exception.
>
> Any help is appreciated! Thanks
>
> This is my solr.xml
>
> 
>
>   
> ${host:}
> ${jetty.port:8983}
> ${hostContext:solr}
> ${zkClientTimeout:15000}
> ${genericCoreNodeNames:true}
>   
>
>class="HttpShardHandlerFactory">
> ${socketTimeout:0}
> ${connTimeout:0}
>   
>
> 
>
> And this is the only entry in core.properties, under the instance directory
> (/opt/solr-4.4.0/example/solr/soccerevents):
> name=soccerevents
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-4-0-and-solrj-tp4080282.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Shows different result with using 'and' and 'AND'

2013-07-25 Thread Raymond Wiker
The query syntax is case sensitive; "and" is treated as a search term and
not as an operator.
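
For example (default field and operator assumed):

    q=apples AND oranges   -> AND is an operator, both terms are required
    q=apples and oranges   -> three plain terms, "and" is searched as text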


On Thu, Jul 25, 2013 at 1:00 PM, Payal.Mulani <
payal.mul...@highqsolutions.com> wrote:

> Hi,
>
> I am using solr14 and when I search with 'and' the it searches the
> documents
> containing 'and' as a text but If I am searching with 'AND' word then it
> will not search 'and'  as a text and taking as a logical operator so any
> one
> have idea that why this both makes difference.
> Also both giving different result..
>
> Please if any one know let me know..
>
> Thanks.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Shows-different-result-with-using-and-and-AND-tp4080280.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr 4.3.0 - SolrCloud lost all documents when leaders got rebuilt

2013-07-25 Thread Erick Erickson
Picking up on what Dominique mentioned. Your ZK configuration
isn't doing you much good. Not only do you have an even number
6 (which is actually _less_ robust than having 5), but by splitting
them among two data centers you're effectively requiring the data
center with 4 nodes to always be up. If it goes down (or even if the
link between DCs is broken), DC2 will not be able to index
documents since the ZK nodes in DC2 can't find 4 ZK nodes to
work with.
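
The arithmetic behind that: a ZooKeeper ensemble of N nodes needs a majority
of floor(N/2) + 1 nodes to form a quorum. For N=6 that is 4 nodes (so only 2
failures are tolerated), exactly the same failure tolerance as a 5-node
ensemble, which needs 3.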

By taking down the ZK quorum, you are effectively "freezing" the Solr
nodes with the snapshot of the system they knew about the last
time there was a quorum. It's a sticky wicket. Let's assume what you're
trying to do was allowed. Now let's assume that instead of the
machines being down you simply lost connectivity between your DCs
so the ZK nodes can't talk to each other. Now they'd each elect their
nodes as leaders. Any incoming indexing requests would be serviced.

Now the DCs are re-connected. How could the conflicts be resolved?
This is the "split brain" problem, something ZK is specifically designed
to prevent.

Best
Erick

On Wed, Jul 24, 2013 at 6:50 PM, Dominique Bejean
 wrote:
> With 6 zookeeper instances you need at least 4 instances running at the same 
> time. How can you decide to stop 4 instances and have only 2 instances 
> running ? Zookeeper can't work anymore in these conditions.
>
> Dominique
>
> Le 25 juil. 2013 à 00:16, "Joshi, Shital"  a écrit :
>
>> We have SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute 
>> boxes (cloud), where 5 machines (leaders) are in datacenter1 and replicas on 
>> datacenter2.  We have 6 zookeeper instances - 4 on datacenter1 and 2 on 
>> datacenter2. The zookeeper instances are on same hosts as Solr nodes. We're 
>> using local disk (/local/data) to store solr index files.
>>
>> Infrastructure team wanted to rebuild dynamic compute boxes on datacenter1 
>> so we handed over all leader hosts to them. By doing so, We lost 4 zookeeper 
>> instances. We were expecting to see all replicas acting as leader. In order 
>> to confirm that, I went to admin console -> cloud page but the page never 
>> returned (kept hanging).  I checked log and saw constant zookeeper host 
>> connection exceptions (the zkHost system property had all 6 zookeeper 
>> instances). I restarted cloud on all replicas but got same error again. This 
>> exception is I think due to the zookeeper bug: 
>> https://issues.apache.org/jira/browse/SOLR-4899 I guess zookeeper never 
>> registered the replicas as leader.
>>
>> After dynamic compute machines were re-built (lost all local data) I 
>> restarted entire cloud (with 6 zookeeper and 10 nodes), the original leaders 
>> were still the leaders (I think zookeeper config never got updated with 
>> replicas being leader, though 2 zookeeper instances were still up). Since 
>> all leaders' /local/data/solr_data was empty, it got replicated to all 
>> replicas and we lost all data in our replica. We lost 26 million documents 
>> on replica. This was very awful.
>>
>> In our start up script (which brings up solr on all nodes one by one), the 
>> leaders are listed first.
>>
>> Any solution to this until Solr 4.4 release?
>>
>> Many Thanks!
>>
>>
>>
>>
>>


Solr 4.4.0 and solrj

2013-07-25 Thread santonel
Hi

I've upgraded my solr server (a single core with single collection) from
4.3.1 to 4.4.0, using the new solr.xml 
configuration file from example and setting the new core.properties (with my
collection name) under the instance dir.

When I check the status of Solr via the web interface, all is up and running
smoothly (it finds the core with autodiscovery),
and I can query and get responses from the Solr server.

When I try to access Solr with an application that I wrote via solrj, using
the same parameters I was using for the past version,
it returns an exception:
RemoteSolrException: Server at http://(my-server-address)/solr returned non
ok status:404, message:Not Found
Even a simple call of server.ping() return the same exception.

So I've changed my instance dir name to "collection1" and put the same value
in core.properties, restarted the Solr server,
and the application started to work again.

Is there something I'm missing? It's strange behaviour because via the web
interface everything works normally, but
when I try to do some action via solrj with a custom core name it returns an
exception.

Any help is appreciated! Thanks

This is my solr.xml



  
${host:}
${jetty.port:8983}
${hostContext:solr}
${zkClientTimeout:15000}
${genericCoreNodeNames:true}
  

  
${socketTimeout:0}
${connTimeout:0}
  



And this is the only entry in core.properties, under the instance directory
(/opt/solr-4.4.0/example/solr/soccerevents):
name=soccerevents



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-4-0-and-solrj-tp4080282.html
Sent from the Solr - User mailing list archive at Nabble.com.


Performance vs. maxBufferedAddsPerServer=10

2013-07-25 Thread Otis Gospodnetic
Hi,

Context:
* https://issues.apache.org/jira/browse/SOLR-4956
* 
http://search-lucene.com/c/Solr:/core/src/java/org/apache/solr/update/SolrCmdDistributor.java%7C%7CmaxBufferedAddsPerServer

As you can see, maxBufferedAddsPerServer = 10.

We have an app that sends 20K docs to SolrCloud using CloudSolrServer.
We batch 20K docs for performance reasons. But then the receiving node
ends up sending VERY small batches of just 10 docs around for indexing
and we lose the benefit of batching those 20K docs in the first place.

Our app is "add only".

Is there anything one can do to avoid performance loss associated with
maxBufferedAddsPerServer=10?

Thanks,
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm


Shows different result with using 'and' and 'AND'

2013-07-25 Thread Payal.Mulani
Hi,

I am using solr14, and when I search with 'and' it searches the documents
containing 'and' as text, but if I search with 'AND' it does not treat 'and'
as text and instead takes it as a logical operator. Does anyone have an idea
why these two behave differently? They also give different results.

If anyone knows, please let me know.

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shows-different-result-with-using-and-and-AND-tp4080280.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto Indexing in Solr

2013-07-25 Thread archit2112
I have to execute this command for a full-import:

http://localhost:8983/solr/dataimport?command=full-import

Can you explain how I would use the Java timer to fire this HTTP request?
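
A minimal sketch with java.util.Timer (the 30-minute interval is an arbitrary
example; the URL is the one above):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Timer;
import java.util.TimerTask;

public class DataImportScheduler {
    public static void main(String[] args) {
        Timer timer = new Timer(); // non-daemon thread keeps the JVM alive
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    URL url = new URL("http://localhost:8983/solr/dataimport?command=full-import");
                    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                    conn.setRequestMethod("GET");
                    InputStream in = conn.getInputStream(); // reading the response fires the request
                    in.close();
                    conn.disconnect();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, 0, 30 * 60 * 1000L); // run now, then every 30 minutes
    }
}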



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Indexing-in-Solr-tp4080233p4080278.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto Indexing in Solr

2013-07-25 Thread Aditya
Hi

You could use a Java timer to trigger your DB import every X minutes. Another
option: you may know when your DB is updated; whenever the DB changes,
trigger the request to index the newly added data.

Regards
Aditya
www.findbestopensource.com



On Thu, Jul 25, 2013 at 11:42 AM, archit2112  wrote:

> Hi Im using Solr 4's Data Import Utility to index Oracle 10g XE database.
> Im
> using full imports as well as delta imports. I want these processes to be
> automatic. (Eg: The import processes can be timed or should be executed as
> soon any data in the database is modified). I searched for the same online
> and I heard people talk about CRON and scripts. However, Im not able to
> figure out how to implement it. Can you please provide a tutorial like
> explanation? Thanks in advance
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Auto-Indexing-in-Solr-tp4080233.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



Re: maximum number of documents per shard?

2013-07-25 Thread Dmitry Kan
Nicole,

According to our findings, there is also a limit for the number of shards
depending on the volume of the returned data. See this jira:

https://issues.apache.org/jira/browse/SOLR-4903

Dmitry


On Thu, Jul 25, 2013 at 11:25 AM, Nicole Lacoste wrote:

> Oh found the answer myself.  Its the GET methods URL length that limits the
> number of shards.
>
> Niki
>
>
> On 25 July 2013 10:14, Nicole Lacoste  wrote:
>
> > Is there a limit on the number of shards?
> >
> > Niki
> >
> >
> > On 24 July 2013 01:14, Jack Krupansky  wrote:
> >
> >> 2.1 billion documents (including deleted documents) per Lucene index,
> but
> >> essentially per Solr shard as well.
> >>
> >> But don’t even think about going that high. In fact, don't plan on going
> >> above 100 million unless you do a proof of concept that validates that
> you
> >> get acceptable query and update performance . There is no hard limit
> >> besides that 2.1 billion Lucene limit, but... performance will vary.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Ali, Saqib
> >> Sent: Tuesday, July 23, 2013 6:18 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: maximum number of documents per shard?
> >>
> >> still 2.1 billion documents?
> >>
> >
> >
> >
> > --
> > * *
> >
>
>
>
> --
> * *
>


Spell check SOLR 3.6.1 not working for numbers

2013-07-25 Thread Poornima Jay
Hi,

I am using Solr 3.6.1 and have implemented spellcheck. I found that numbers in the
spellcheck query do not return any results. Below are my solrconfig.xml and
schema.xml details. Please let me know what needs to be done in order
to get spellcheck working for numbers.

solrConfig

     
    default   
    solr.IndexBasedSpellChecker
    spell  
    ./spellchecker   
    0.7    
    true
    .0001
   
  textSpell



  
    
    default   
    
    false
    
    false
    
    10
  
      
      spellcheck
        
  

Schema

         
            
            
            
            
            
            
         
        
         
        
        
        
      
      




   
   
 
   

Thanks,
Poornima

Solr4.2 PostCommit EventListener not working on Replication-Instances

2013-07-25 Thread Dirk Högemann
Hello,

I have implemented a Solr EventListener, which should be fired after
committing.
This works fine on the Solr-Master Instance and  it also worked in Solr 3.5
on any Slave Instance.
I upgraded my installation to Solr 4.2 and now the postCommit event is not
fired any more on the replication (Slave) instances, which is a huge
problem, as other caches have to be invalidated when replication takes place.

This is my configuration solrconfig.xml on the slaves:

  

  1



...


  

...
  

  http://localhost:9101/solr/Core1
  00:03:00

  

Any hints?

Best regards


RE: new field type - enum field

2013-07-25 Thread Elran Dvir
Hi,

I have implemented it as Chris described:
the field is indexed as numeric, but displayed as a string, according to
configuration.
It applies to facet, pivot, group and query.

How do we proceed? How do I contribute it?

Thanks.

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, July 25, 2013 4:40 AM
To: solr-user@lucene.apache.org
Subject: Re: new field type - enum field


: Doable at Lucene level by any chance?

Given how well the Trie fields compress (ByteField and ShortField have been 
deprecated in favor of TrieIntField for this reason) it probably just makes 
sense to treat it as a numeric at the Lucene level.

: > If there's positive feedback, I'll open an issue with a patch for the 
functionality.

I've typically dealt with this sort of thing at the client layer using a simple 
numeric field in Solr, or used an UpdateProcessor to convert the 
String->numeric mapping when indexing & used client logic of a
DocTransformer to handle the stored value at query time -- but having a built 
in FieldType that handles that for you automatically (and helps ensure the 
indexed values conform to the enum) would certainly be cool if you'd like to 
contribute it.
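
A minimal sketch of the client-layer approach (the "severity" field and its
values are made-up examples; the Solr field is assumed to be a plain int
field):

import java.util.Arrays;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

public class SeverityEnumMapper {
    // the ordinal of each label is what actually gets indexed
    private static final List<String> VALUES = Arrays.asList("Low", "Medium", "High", "Critical");

    public static int toOrdinal(String label) {
        int i = VALUES.indexOf(label);
        if (i < 0) throw new IllegalArgumentException("Unknown severity: " + label);
        return i;
    }

    public static String toLabel(int ordinal) {
        return VALUES.get(ordinal);
    }

    public static SolrInputDocument docWithSeverity(String id, String severityLabel) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("severity", toOrdinal(severityLabel)); // indexed/sorted as an int
        return doc;
    }
}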


-Hoss

Email secured by Check Point


Re: maximum number of documents per shard?

2013-07-25 Thread Nicole Lacoste
Oh, I found the answer myself. It's the GET method's URL length that limits
the number of shards.

Niki


On 25 July 2013 10:14, Nicole Lacoste  wrote:

> Is there a limit on the number of shards?
>
> Niki
>
>
> On 24 July 2013 01:14, Jack Krupansky  wrote:
>
>> 2.1 billion documents (including deleted documents) per Lucene index, but
>> essentially per Solr shard as well.
>>
>> But don’t even think about going that high. In fact, don't plan on going
>> above 100 million unless you do a proof of concept that validates that you
>> get acceptable query and update performance . There is no hard limit
>> besides that 2.1 billion Lucene limit, but... performance will vary.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Ali, Saqib
>> Sent: Tuesday, July 23, 2013 6:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: maximum number of documents per shard?
>>
>> still 2.1 billion documents?
>>
>
>
>
> --
> * *
>



-- 
* *


Re: Document Similarity Algorithm at Solr/Lucene

2013-07-25 Thread Furkan KAMACI
BTW, how does Solr's MoreLikeThis component work? Which algorithm does it use
under the hood?
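
For reference, a typical MoreLikeThis request through the standard search
handler looks roughly like this (field names and the document id are
placeholders):

    http://localhost:8983/solr/select?q=id:1234&mlt=true&mlt.fl=body&mlt.mintf=1&mlt.mindf=1&mlt.count=5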


2013/7/24 Roman Chyla 

> This paper contains an excellent algorithm for plagiarism detection, but
> beware the published version had a mistake in the algorithm - look for
> corrections - I can't find them now, but I know they have been published
> (perhaps by one of the co-authors). You could do it with solr, to create an
> index of hashes, with the twist of storing position of the original text
> (source of the hash) together with the token and the solr highlighting
> would do the rest for you :)
>
> roman
>
>
> On Tue, Jul 23, 2013 at 11:07 AM, Shashi Kant  wrote:
>
> > Here is a paper that I found useful:
> > http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
> >
> >
> > On Tue, Jul 23, 2013 at 10:42 AM, Furkan KAMACI 
> > wrote:
> > > Thanks for your comments.
> > >
> > > 2013/7/23 Tommaso Teofili 
> > >
> > >> if you need a specialized algorithm for detecting blogposts
> plagiarism /
> > >> quotations (which are different tasks IMHO) I think you have 2
> options:
> > >> 1. implement a dedicated one based on your features / metrics / domain
> > >> 2. try to fine tune an existing algorithm that is flexible enough
> > >>
> > >> If I were to do it with Solr I'd probably do something like:
> > >> 1. index "original" blogposts in Solr (possibly using Jack's
> suggestion
> > >> about ngrams / shingles)
> > >> 2. do MLT queries with "candidate blogposts copies" text
> > >> 3. get the first, say, 2-3 hits
> > >> 4. mark it as quote / plagiarism
> > >> 5. eventually train a classifier to help you mark other texts as
> quote /
> > >> plagiarism
> > >>
> > >> HTH,
> > >> Tommaso
> > >>
> > >>
> > >>
> > >> 2013/7/23 Furkan KAMACI 
> > >>
> > >> > Actually I need a specialized algorithm. I want to use that
> algorithm
> > to
> > >> > detect duplicate blog posts.
> > >> >
> > >> > 2013/7/23 Tommaso Teofili 
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > I you may leverage and / or improve MLT component [1].
> > >> > >
> > >> > > HTH,
> > >> > > Tommaso
> > >> > >
> > >> > > [1] : http://wiki.apache.org/solr/MoreLikeThis
> > >> > >
> > >> > >
> > >> > > 2013/7/23 Furkan KAMACI 
> > >> > >
> > >> > > > Hi;
> > >> > > >
> > >> > > > Sometimes a huge part of a document may exist in another
> > document. As
> > >> > > like
> > >> > > > in student plagiarism or quotation of a blog post at another
> blog
> > >> post.
> > >> > > > Does Solr/Lucene or its libraries (UIMA, OpenNLP, etc.) has any
> > class
> > >> > to
> > >> > > > detect it?
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>


Re: maximum number of documents per shard?

2013-07-25 Thread Nicole Lacoste
Is there a limit on the number of shards?

Niki


On 24 July 2013 01:14, Jack Krupansky  wrote:

> 2.1 billion documents (including deleted documents) per Lucene index, but
> essentially per Solr shard as well.
>
> But don’t even think about going that high. In fact, don't plan on going
> above 100 million unless you do a proof of concept that validates that you
> get acceptable query and update performance . There is no hard limit
> besides that 2.1 billion Lucene limit, but... performance will vary.
>
> -- Jack Krupansky
>
> -Original Message- From: Ali, Saqib
> Sent: Tuesday, July 23, 2013 6:18 PM
> To: solr-user@lucene.apache.org
> Subject: maximum number of documents per shard?
>
> still 2.1 billion documents?
>



-- 
* *


Re: Querying a specific core in solr cloud

2013-07-25 Thread vicky desai
Hi,

I have also noticed that once I bring the core up on both machines,
&distrib=false works well. Could this be a possible bug that when a core is
down on one instance &distrib=false doesn't work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Querying-a-specific-core-in-solr-cloud-tp4079964p4080246.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Querying a specific core in solr cloud

2013-07-25 Thread vicky desai
Hi Erik,


Thanks for the reply

But does &distrib=true work for replicas as well? As I mentioned earlier, I
have a setup of 1 leader and 1 replica. If a core is up on either of the
instances, querying both instances gives me results even with
&distrib=false.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Querying-a-specific-core-in-solr-cloud-tp4079964p4080244.html
Sent from the Solr - User mailing list archive at Nabble.com.