GitHub user lukovnikov opened a pull request: https://github.com/apache/spark/pull/4650
    RDF Loader added + documentation

    Have been testing it with DBpedia dumps, works well so far.
    Any help with custom partitioning and optimization is welcome.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lukovnikov/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4650.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4650

----
commit 10436d252ad4876d28c91c77036e3d993050438a
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T19:41:58Z

    fast forward from upstream

commit 595aed098fb423514b73263f96dfcaf1edbc72f5
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T21:41:00Z

    dictionary builder done

commit c2399023825e804476527f7e159b182a1b5c91c8
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T21:44:07Z

    [SPARK 5280]

commit f14e4835cf365fcbe5dd0979e61464b7cecb8774
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T22:50:06Z

    done dictionary version

commit 43cc53ab6d99a4a96a0764cc306f38fdce3a7e00
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T23:25:07Z

    [SPARK 5280] rdfloader using hashes as VertexIds

commit 2e1220d0938aee7d190439253e3b9bb1e73c77e8
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T00:04:48Z

    cleaned up + fixed style TODO: test + comment

commit 54e2c6eb24dade70753320a3ab2b3a64fef7a6d4
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T00:26:30Z

    made custom 64bit hash

commit b454560508c9d50c60e067d7e67405ca1e13c165
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T00:32:57Z

    proper

commit 45a9f57695e76c09c20fa99a1010168f63ef1da8
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T19:41:58Z

    fast forward from upstream

commit 6ee9a2b675d06675b5b591f16e8d52e63d2dc049
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T21:41:00Z

    dictionary builder done

commit 45c22160c52111066109f57a0d773aca211c2068
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T21:44:07Z

    [SPARK 5280]

commit fa5c0da9ea4f6ca662406b380432901022d6de55
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T22:50:06Z

    done dictionary version

commit c036f98476e96ac03124f758ed7f17c4a464cf86
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T23:25:07Z

    [SPARK 5280] rdfloader using hashes as VertexIds

commit 57553797f7404e686674b0bfb39d80bb24d6520c
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T00:04:48Z

    cleaned up + fixed style TODO: test + comment

commit e00123eae4a84108af2c84cf253b1f4fb1fb69f1
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T00:26:30Z

    made custom 64bit hash

commit 6af9a7ad6198174597ae7d86ec5c15fc8467a082
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T00:32:57Z

    proper

commit 1ee34c9474bcf4500edecb08a848d15f3549055d
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T03:31:05Z

    Merge branch 'master' of github.com:lukovnikov/spark into rdfloaderhash

commit 9000a4713d286d5078c16f62b5fadf480941bc82
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T03:31:18Z

    Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into rdfloaderhash

commit 70eb725a102ae711a59c6d45794d191c18778c4b
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T23:02:48Z

    RDF Loader with hash, tested on small RDF dumps (more tests in progress)

commit 4398d93712777442ba0f2e8920423fcdd7b67d1f
Author: Denis <lukovni...@users.noreply.github.com>
Date:   2015-02-04T23:27:01Z

    added documentation for RDFLoader

commit 273a1b30dee1630333e0f7e683378b6dbb13c3a5
Author: Denis <lukovni...@users.noreply.github.com>
Date:   2015-02-04T23:29:05Z

    small update to RDFLoader description

commit 202ccf86901c3d2435564e544f90d6a49cda66fb
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T23:31:10Z

    sdf

commit 2d990cec1d48f62f4f1d9f9cf8082308a4eaf9e4
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-03T19:41:58Z

    fast forward from upstream

commit 4a9b6222176749bee4a14e4b6d035b665c6ac7ea
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T23:43:31Z

    Merge branch 'master' of github.com:lukovnikov/spark

commit 062996c45d0443836c1b4b2bb714d8f459ea6980
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T23:43:52Z

    Merge branch 'rdfloaderhash'

commit 121bf14140573349424e7888da13ee2e8ea4f6f0
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T23:45:48Z

    [SPARK 5280]

commit 67ada514b98292ff647d8354545d37cc111499ba
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T23:47:21Z

    Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into rdfloaderhash

commit e5fcf758c0e4b54a38b2a01709681e11bbb6eae8
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T23:47:45Z

    Merge branch 'rdfloaderhash'

commit c5960af7b14d65b1d290c3af11d722075a54ad2d
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-04T23:54:37Z

    Merge remote-tracking branch 'upstream/master'

commit 91361f3f760dbc78467f8e2b87a1d77061aa59de
Author: lukovnikov <lukovnikov@denis>
Date:   2015-02-05T00:01:33Z

    undone unnecessary changes

----