[jira] [Commented] (ASTERIXDB-1704) Fuzzy-join query is slow

Wenhai (JIRA) Sun, 23 Oct 2016 23:57:07 -0700

    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601164#comment-15601164
 ]


Wenhai commented on ASTERIXDB-1704:
-----------------------------------

OK, then I guess your slowdown is not a real "slowdown". Another possible 
reason is that the parallelism provided by your computation threads is not 
sufficient. If you decrease the threshold, I think adding computing nodes will 
reduce the running time almost linearly.
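
To make the cost argument concrete, here is a minimal Python sketch (not 
AsterixDB code). It assumes the standard prefix-filtering bound from the 
set-similarity-join literature, where a record with n tokens and a Jaccard 
threshold t has to expose n - ceil(t * n) + 1 prefix tokens; whether 
AsterixDB's implementation uses exactly this bound is an assumption. The 
function names are hypothetical, and the tokenizer is only a rough stand-in 
for word-tokens().

```python
import math
import re


def word_tokens(text):
    """Lowercased word tokens as a set (rough stand-in for word-tokens())."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two token sets.

    Returns 0.0 when both sets are empty (an arbitrary choice in this sketch).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def prefix_length(num_tokens, threshold):
    """Prefix tokens to probe under the assumed prefix-filtering bound."""
    # round() guards against floating-point noise such as 0.7 * 10 > 7.
    required_overlap = math.ceil(round(threshold * num_tokens, 9))
    return num_tokens - required_overlap + 1


def self_join_pairs(num_records):
    """Unordered pairs a self-join with o.id < i.id has to consider."""
    return num_records * (num_records - 1) // 2


if __name__ == "__main__":
    # A 50-token review: a low threshold leaves most tokens in the prefix.
    for t in (0.2, 0.8):
        print(t, prefix_length(50, t))
    # Worst-case pair count for the 200K-record self-join in the issue.
    print(self_join_pairs(200_000))
```

Under this bound, a threshold of 0.2 prunes far fewer candidate pairs than a 
high threshold such as 0.8, so the work approaches the full pairwise 
comparison and is dominated by computation that can be spread across nodes.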

> Fuzzy-join query is slow
> ------------------------
>
>                 Key: ASTERIXDB-1704
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1704
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>
> I have an issue regarding the prefix-based fuzzy join (non-index-based fuzzy 
> join) on a small dataset. The following query runs forever even for a dataset 
> with 200K records on 9 nodes, so each node has only about 22,000 records. 
> Also, the record size is not that big. 
> {code}
> count(
> for $o in dataset AmazonReview
> for $i in dataset AmazonReview
> where similarity-jaccard(word-tokens($o.reviewText), 
> word-tokens($i.reviewText)) >= 0.2 and $o.id < $i.id
> return {"oid":$o.reviewerID, "iid":$i.reviewerID}
> );
> {code}
> An example record is as follows.  
> {code}
> {
>   "reviewerID": "A2SUAM1J3GNN3B",
>   "asin": "0000013714",
>   "reviewerName": "J. McDonald",
>   "helpful": [2, 3],
>   "reviewText": "I bought this for my husband who plays the piano.  He is having a wonderful time playing these old hymns.  The music  is at times hard to read because we think the book was published for singing from more than playing from.  Great purchase though!",
>   "overall": 5.0,
>   "summary": "Heavenly Highway Hymns",
>   "unixReviewTime": 1252800000,
>   "reviewTime": "09 13, 2009"
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
