[ https://issues.apache.org/jira/browse/SPARK-11084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-11084:
--------------------------------------
            Shepherd: Joseph K. Bradley
            Assignee: Maciej Szymkiewicz
    Target Version/s: 1.6.0

> SparseVector.__getitem__ should check if value can be non-zero before executing searchsorted
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11084
>                 URL: https://issues.apache.org/jira/browse/SPARK-11084
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>    Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.6.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: Maciej Szymkiewicz
>            Priority: Minor
>
> Currently, SparseVector.__getitem__ executes np.searchsorted first and only afterwards checks whether the result is in the expected range:
> {code}
> insert_index = np.searchsorted(inds, index)
> if insert_index >= inds.size:
>     return 0.
> row_ind = inds[insert_index]
> ...
> {code}
> See: https://issues.apache.org/jira/browse/SPARK-10973
> It is possible to check whether the index can hold a non-zero value before running the binary search:
> {code}
> if (inds.size == 0) or (index > inds.item(-1)):
>     return 0.
> insert_index = np.searchsorted(inds, index)
> row_ind = inds[insert_index]
> ...
> {code}
> It is not a huge improvement, but it should save some work on large vectors.
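
For illustration, the two orderings described in the issue can be compared in a small standalone sketch. This is not the PySpark implementation: the function names are hypothetical, plain Python lists stand in for the NumPy arrays, and the standard library's bisect.bisect_left is used in place of np.searchsorted (both return the leftmost insertion point in a sorted sequence). The part elided as "..." in the snippets is assumed here to be a simple equality check on the stored index, and the requested index is assumed to be non-negative.

```python
from bisect import bisect_left


def get_item_search_first(inds, vals, index):
    """Hypothetical sketch of the current order: search, then range check."""
    insert_index = bisect_left(inds, index)
    if insert_index >= len(inds):
        return 0.0
    # Assumed completion of the elided part: match only on an exact index hit.
    if inds[insert_index] == index:
        return vals[insert_index]
    return 0.0


def get_item_check_first(inds, vals, index):
    """Hypothetical sketch of the proposed order: range check, then search."""
    # No stored index can match an empty vector or an index past the last one,
    # so the binary search is skipped entirely in those cases.
    if len(inds) == 0 or index > inds[-1]:
        return 0.0
    # Here index <= inds[-1], so insert_index is always a valid position.
    insert_index = bisect_left(inds, index)
    if inds[insert_index] == index:
        return vals[insert_index]
    return 0.0


# Sorted indices of the non-zero entries and their values (made-up data).
inds = [1, 4, 7]
vals = [10.0, 40.0, 70.0]

# Both orderings agree on every position of a length-10 vector.
for i in range(10):
    assert get_item_search_first(inds, vals, i) == get_item_check_first(inds, vals, i)
```

Under these assumptions the early check only changes how much work is done for an empty vector or for an index beyond the last stored one, not the value returned.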