[jira] [Commented] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

ASF GitHub Bot (JIRA) Mon, 22 Aug 2016 09:17:06 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431100#comment-15431100
 ]


ASF GitHub Bot commented on MAHOUT-1853:
----------------------------------------

Github user dlyubimov commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/251#discussion_r75708308
  
    --- Diff: 
spark/src/test/scala/org/apache/mahout/cf/SimilarityAnalysisSuite.scala ---
    @@ -191,14 +193,115 @@ class SimilarityAnalysisSuite extends FunSuite with 
MahoutSuite with Distributed
     
         //cross similarity
         val matrixCrossCooc = drmCooc(1).checkpoint().collect
    -    val diff2Matrix = matrixCrossCooc.minus(matrixLLRCoocBtANonSymmetric)
    +    val diff2Matrix = matrixCrossCooc.minus(matrixLLRCoocAtBNonSymmetric)
         n = (new MatrixOps(m = diff2Matrix)).norm
     
         //cooccurrence without LLR is just a A'B
         //val inCoreAtB = a.transpose().times(b)
         //val bp = 0
       }
     
    +  test("Cross-occurrence two IndexedDatasets"){
    +    val a = dense(
    +      (1, 1, 0, 0, 0),
    +      (0, 0, 1, 1, 0),
    +      (0, 0, 0, 0, 1),
    +      (1, 0, 0, 1, 0))
    +
    +    val b = dense(
    +      (0, 1, 1, 0),
    +      (1, 1, 1, 0),
    +      (0, 0, 1, 0),
    +      (1, 1, 0, 1))
    +
    +    val users = Seq("u1", "u2", "u3", "u4")
    +    val itemsA = Seq("a1", "a2", "a3", "a4", "a5")
    +    val itemsB = Seq("b1", "b2", "b3", "b4")
    +    val userDict = new BiDictionary(users)
    +    val itemsADict = new BiDictionary(itemsA)
    +    val itemsBDict = new BiDictionary(itemsB)
    +
    +    // this is downsampled to the top 2 values per row to match the calc
    +    val matrixLLRCoocAtBNonSymmetric = dense(
    +      (0.0,                1.7260924347106847, 1.7260924347106847, 0.0),
    --- End diff --
    
    this is our accepted convention, good


> Improvements to CCO (Correlated Cross-Occurrence)
> -------------------------------------------------
>
>                 Key: MAHOUT-1853
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1853
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.12.0
>            Reporter: Andrew Palumbo
>            Assignee: Pat Ferrel
>             Fix For: 0.13.0
>
>
> Improvements to CCO (Correlated Cross-Occurrence) to include auto-threshold 
> calculation for LLR downsampling, and possible multiple fixed thresholds for 
> A’A, A’B etc. This is to account for the vast difference in dimensionality 
> between indicator types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

Reply via email to