D2057: rust implementation of hg status

2018-03-21 Thread Ivzhh (Sheng Mao)
Ivzhh marked 4 inline comments as done.
Ivzhh added a comment.


  In https://phab.mercurial-scm.org/D2057#46726, @yuja wrote:
  
  > >> I think the only place where you would need to do os-specific code is when
  > >>  doing serialization and deserialization
  > > 
  > > Yes, that will be feasible in strictly typed language like Rust.
  >
  > To be clear, I meant serialization/deserialization between filesystem path 
and
  >  internal dirstate/manifest path, not between dirstate storage and in-memory
  >  dirstate object.
  
  
  I guess your suggestion is like this, @yuja:
  
  1. If it is Windows and the code page is MBCS, try to decode the paths read from the manifest and dirstate into their Unicode equivalents.
  2. Use UTF-8 internally and with the Rust IO API.
  3. When writing back to the dirstate and manifest, encode from UTF-8 back to MBCS.
  
  Please let me know if I have misunderstood. Thank you!
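
The three steps above could be sketched as a small codec boundary. This is only an illustration under stated assumptions: `PathCodec` and `Latin1Codec` are hypothetical names that do not appear in the patch, and Latin-1 is used purely as a stand-in for a real Windows code page, which would need the platform's conversion API or an encoding library.

```rust
/// Hypothetical boundary between stored path bytes and internal Unicode paths.
pub trait PathCodec {
    /// Step 1: decode bytes read from the manifest or dirstate.
    fn decode(&self, stored: &[u8]) -> Option<String>;
    /// Step 3: encode an internal path before writing it back.
    fn encode(&self, internal: &str) -> Option<Vec<u8>>;
}

/// Stand-in codec (assumption): Latin-1 maps each byte to the code point
/// with the same value. A real MBCS codec is more involved.
pub struct Latin1Codec;

impl PathCodec for Latin1Codec {
    fn decode(&self, stored: &[u8]) -> Option<String> {
        Some(stored.iter().map(|&b| b as char).collect())
    }

    fn encode(&self, internal: &str) -> Option<Vec<u8>> {
        internal
            .chars()
            .map(|c| {
                let cp = c as u32;
                // Characters outside the code page cannot be written back.
                if cp <= 0xFF { Some(cp as u8) } else { None }
            })
            .collect()
    }
}
```

Step 2 (working in UTF-8 internally) is simply everything that happens between `decode` and `encode`. Returning `Option` from both directions keeps the lossy cases visible to the caller instead of hiding them.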

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-21 Thread Ivzhh (Sheng Mao)
Ivzhh marked 45 inline comments as done.
Ivzhh added inline comments.

INLINE COMMENTS

> kevincox wrote in base85.rs:23
> `&str` can only hold valid utf8 data? Does it make more sense to have `&[u8]` 
> here for a list of bytes?

It should be any &[u8], but the current cpython crate doesn't wrap for &[u8]. I 
think I need to fork and add that part. I put it in my checklist now.
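
To illustrate the reviewer's point, here is a minimal sketch showing that not every byte sequence can be viewed as `&str`. The helper name `fits_in_str` is hypothetical and only exists for this example.

```rust
/// Returns true when `data` is valid UTF-8 and could therefore be passed
/// as `&str` without loss. Arbitrary binary input to base85 may not be.
pub fn fits_in_str(data: &[u8]) -> bool {
    std::str::from_utf8(data).is_ok()
}
```

Plain ASCII passes this check, while a buffer such as `[0xff, 0xfe]` does not, which is why a `&[u8]` parameter is the safer choice for an encoder of binary data.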

> kevincox wrote in base85.rs:23
> Would it be possible to separate the decode from the python objects. I'm 
> thinking two helper functions.
> 
>   fn b85_required_len(text: &str) -> usize
>   fn b85_encode(text: &str, pad: i32, out: &mut [u8]) -> Result<()>

This crate is from my earlier attempt to integrate Rust into hg. Right now I
guess my main goal is to add hg r-* commands for Rust. I will follow your
suggestion when I implement the wire protocol and reuse the code for a pure
Rust crate.
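
A rough sketch of what the two suggested helpers might look like once they are independent of Python objects. The function names come from the review comment; the signatures are adapted here (byte-slice input, `bool` pad, input length instead of text) as an assumption based on the other comments in this thread. The alphabet and the output-length rule follow my understanding of Mercurial's base85.c and should be checked against it.

```rust
// Assumed alphabet: the 85 characters believed to be used by base85.c.
const B85_CHARS: &[u8] =
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~";

/// Number of output bytes needed for `input_len` input bytes.
pub fn b85_required_len(input_len: usize, pad: bool) -> usize {
    if pad {
        (input_len + 3) / 4 * 5
    } else {
        let rem = input_len % 4;
        input_len / 4 * 5 + if rem == 0 { 0 } else { rem + 1 }
    }
}

/// Encodes `input` into `out` and returns the number of bytes written.
pub fn b85_encode(input: &[u8], pad: bool, out: &mut [u8]) -> Result<usize, &'static str> {
    let needed = b85_required_len(input.len(), pad);
    if out.len() < needed {
        return Err("output buffer too small");
    }
    let mut written = 0;
    for chunk in input.chunks(4) {
        // Pack up to four bytes big-endian, padding a short chunk with zeros.
        let mut acc: u32 = 0;
        for i in 0..4 {
            acc = (acc << 8) | u32::from(*chunk.get(i).unwrap_or(&0));
        }
        // Produce five base-85 digits, most significant first.
        let mut group = [0u8; 5];
        for slot in group.iter_mut().rev() {
            *slot = B85_CHARS[(acc % 85) as usize];
            acc /= 85;
        }
        // Without padding, the final group is truncated.
        let take = (needed - written).min(5);
        out[written..written + take].copy_from_slice(&group[..take]);
        written += take;
    }
    Ok(written)
}
```

A thin cpython wrapper would then only allocate a buffer of `b85_required_len` bytes and call `b85_encode`, which keeps the core logic reusable from pure Rust code.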

> kevincox wrote in base85.rs:23
> IIUC pad is only ever checked `== 0`. Can it be made into a bool?

pad is logically a bool; however, when I checked how hg's Python code calls it,
ints are passed to the function. I guess I need to update the cpython wrapper
for this so that it accepts a broader truthiness conversion.

> kevincox wrote in base85.rs:46
> Why the braces here?

I guess it is because of NLL (non-lexical lifetimes). When I started the work,
the Rust compiler reported a borrow-check error on this part. I later read an
article about the NLL update in Rust, but before that I used the braces to
avoid the error.
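
For readers unfamiliar with the workaround, this is a generic sketch of the pattern (not the code from base85.rs): an inner block confines a mutable borrow so that the value can be used again afterwards. The function name is made up for the example.

```rust
/// Bumps the first element, then appends a new one.
/// Assumes `data` is non-empty; indexing panics otherwise.
pub fn bump_and_push(data: &mut Vec<u32>) {
    {
        // The mutable borrow of `data[0]` is confined to this block.
        let first = &mut data[0];
        *first += 1;
    }
    // Before NLL, the compiler accepted this call only because the block
    // above had already ended the borrow held by `first`.
    data.push(4);
}
```

With non-lexical lifetimes the braces are no longer required here, because the borrow ends at the last use of `first` rather than at the end of its lexical scope.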

> kevincox wrote in base85.rs:91
> This is probably worth a comment that this is safe because D85DEC is required 
> to be initialized before this function is called.

When I removed the unsafe, I got this error: error[E0133]: use of mutable
static requires unsafe function or block
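
One possible way to avoid the `static mut` (and with it E0133) is to build the decode table at compile time. This is an alternative technique, not what the patch does; the names `B85_DEC`, `build_dec_table` and `b85_value` are hypothetical, and the alphabet is the one I believe base85.c uses.

```rust
// Assumed alphabet: the 85 characters believed to be used by base85.c.
const B85_CHARS: &[u8] =
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~";

/// Builds the reverse lookup table. Entry 0 means "not a base85 digit";
/// otherwise the entry is the digit value plus one.
const fn build_dec_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0;
    while i < B85_CHARS.len() {
        table[B85_CHARS[i] as usize] = i as u8 + 1;
        i += 1;
    }
    table
}

// Immutable static: initialized by the compiler, so reading it is safe.
static B85_DEC: [u8; 256] = build_dec_table();

/// Returns the digit value of `c`, or `None` if `c` is not in the alphabet.
pub fn b85_value(c: u8) -> Option<u8> {
    match B85_DEC[c as usize] {
        0 => None,
        n => Some(n - 1),
    }
}
```

Because the table can never be observed uninitialized, no safety comment or initialization-order requirement is needed.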

> indygreg wrote in main.rs:233-261
> This is definitely nifty and an impressive achievement \o/
> 
> The `r-` commands for testing pure Rust code paths are an interesting idea!
> 
> I think I'm OK with including support for this in `hgcli`. But I think the 
> code should live in a separate file so it doesn't pollute `main()`. And it 
> should be behind a Cargo feature flag so we maintain compatibility with `hg` 
> as much as possible by default.
> 
> Also, Mercurial's command line parser is extremely wonky and has some 
> questionable behavior. If the intent is to make `rhg` compatible with `hg`, 
> we would need to preserve this horrible behavior. We'll likely have to write 
> a custom argument parser because of how quirky Mercurial's argument parser is 
> :(

Thank you for the suggestion! I guess I need to extend clap later to support
hg-style command lines. Right now, whenever clap cannot handle the argument
parsing, I redirect the arguments to hg directly.
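
The fallback idea can be sketched without any argument-parsing library. This is a simplification and an assumption: the patch relies on clap failing to recognize the arguments, whereas this sketch uses an explicit list of Rust-backed subcommands. `Route`, `RUST_COMMANDS` and `route` are hypothetical names.

```rust
/// Where a command line should be executed.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// Handled by a Rust implementation.
    Rust { command: String, rest: Vec<String> },
    /// Forwarded unchanged to the Python `hg`.
    Python(Vec<String>),
}

// Assumption: only `r-status` exists so far.
const RUST_COMMANDS: &[&str] = &["r-status"];

/// Decides who handles `args` (the arguments after the program name).
pub fn route(args: &[String]) -> Route {
    match args.split_first() {
        Some((command, rest)) if RUST_COMMANDS.iter().any(|c| *c == command.as_str()) => {
            Route::Rust {
                command: command.clone(),
                rest: rest.to_vec(),
            }
        }
        // Anything unknown, including an empty command line, goes to Python.
        _ => Route::Python(args.to_vec()),
    }
}
```

Forwarding the untouched argument vector means Mercurial's own parser still sees exactly what the user typed, which matters given the parser quirks mentioned in the review.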

> kevincox wrote in changelog.rs:31
> If you aren't using the value I would prefer `truncate(NodeId::hex_len())`

I guess I will use the rest of the info later. hg seems to put some metadata in
the commit comments. I will keep it for now. Thank you!

> kevincox wrote in config.rs:78
> If you are just going to convert to String I would recommend taking a String 
> argument.
> 
> Also prefer `.to_owned()` over `.to_string()`.

I like to_owned(); I will use it from now on. Thank you!

> indygreg wrote in config.rs:95
> I would not worry about supporting v0 or v2 at this time. v0 is only 
> important for backwards compatibility with ancient repos. And v2 never got 
> off the ground.

Sure, I will use v1 only for now. In the beginning I kinda over designed this 
part.

> kevincox wrote in dirstate.rs:48
> This could have a better name.

I remember the python hg uses the name, in the beginning, I tried to replicate 
py-hg's behaviour. But I think it needs to be renamed. I agree with you.

> kevincox wrote in dirstate.rs:108
> 1. Does this function need to be public? It seems internal to the constructor.
> 2. If it doesn't need to be I would prefer it return the Map so that you 
> don't have a partial-constructed DirState.

I think the dirstate needs to 1. read an existing one; 2. create one if it does
not exist. It can be private for now.
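
The reviewer's second point, returning the map from a private helper so that no partially constructed value exists, might look like the sketch below. The line format used here (`<state byte><path>\n`) is a toy stand-in, not the real dirstate format, and the field and method names are hypothetical.

```rust
use std::collections::HashMap;

pub struct DirState {
    entries: HashMap<Vec<u8>, u8>,
}

impl DirState {
    /// The only constructor: the value is either complete or absent.
    pub fn from_bytes(raw: &[u8]) -> Option<Self> {
        Some(DirState {
            entries: parse_entries(raw)?,
        })
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn state_of(&self, path: &[u8]) -> Option<u8> {
        self.entries.get(path).copied()
    }
}

/// Private helper that returns the finished map instead of mutating `self`.
/// Toy format (assumption): one entry per line, state byte then path.
fn parse_entries(raw: &[u8]) -> Option<HashMap<Vec<u8>, u8>> {
    let mut entries = HashMap::new();
    for line in raw.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
        let (&state, path) = line.split_first()?;
        if path.is_empty() {
            return None; // a state byte without a path is malformed
        }
        entries.insert(path.to_vec(), state);
    }
    Some(entries)
}
```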

> kevincox wrote in dirstate.rs:152
> I would prefer doing the filter before the loop and storing it in a variable.

For the filter, I followed the example in the walkdir docs. I guess what I want
is to skip the directory so that it is not visited recursively later.

> kevincox wrote in dirstate.rs:161
> Please explain why you are ignoring the error condition.

I added the error handling back.

> kevincox wrote in dirstate.rs:170
> You could do the following for a slight performance win and save a line.
> 
>   if let Occupied(entry) = self.dmap.entry(relpath) {
>  ...
>   }

I got a borrow-check compile error here. From now on I will use Occupied() when
possible.
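
A small, self-contained sketch of the `Entry::Occupied` pattern the reviewer suggests, using a plain `HashMap<String, bool>` instead of the real dirstate map. The function name is made up. Note that `entry()` takes the key by value, so the key is moved; reusing it afterwards is one plausible source of a compile error in this area, although the thread does not say which error was actually hit.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Marks `relpath` as seen if it is already tracked; never inserts.
/// Returns whether the path was tracked.
pub fn mark_seen(dmap: &mut HashMap<String, bool>, relpath: String) -> bool {
    // A single lookup finds the slot and gives mutable access to it.
    if let Entry::Occupied(mut entry) = dmap.entry(relpath) {
        *entry.get_mut() = true;
        true
    } else {
        // A vacant entry that is dropped without `insert` leaves the map unchanged.
        false
    }
}
```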

> kevincox wrote in local_repo.rs:136
> Why does it need to be mutable to clone?

I think LRU will update reference count (or timestamp) when the data is 
accessed.
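
To show why a lookup in an LRU cache usually needs `&mut self`, here is a deliberately tiny cache written for this example (it is not the LRU crate used by the patch, and `TinyLru` is a hypothetical name). Reading a key moves it to the most-recently-used position, which is a mutation.

```rust
/// Minimal LRU cache: least recently used item first, most recent last.
pub struct TinyLru<V> {
    capacity: usize,
    items: Vec<(String, V)>,
}

impl<V: Clone> TinyLru<V> {
    pub fn new(capacity: usize) -> Self {
        TinyLru { capacity, items: Vec::new() }
    }

    pub fn put(&mut self, key: &str, value: V) {
        if self.capacity == 0 {
            return;
        }
        self.items.retain(|(k, _)| k.as_str() != key);
        if self.items.len() == self.capacity {
            self.items.remove(0); // evict the least recently used item
        }
        self.items.push((key.to_owned(), value));
    }

    /// Takes `&mut self` because a hit updates the recency order.
    pub fn get(&mut self, key: &str) -> Option<V> {
        let pos = self.items.iter().position(|(k, _)| k.as_str() == key)?;
        let item = self.items.remove(pos);
        let value = item.1.clone();
        self.items.push(item);
        Some(value)
    }
}
```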

> kevincox wrote in matcher.rs:14
> Can you manage a `&[u8]` rather than pointer arithmetic for the whole string. 
> It will make me feel better and 

D2057: rust implementation of hg status

2018-03-21 Thread Ivzhh (Sheng Mao)
Ivzhh updated this revision to Diff 7188.
Ivzhh added a comment.


  - add revlog and mpatch facilities
  - add changelog parsing
  - add manifest parsing
  - path encoding for data store
  - add dirstate and matcher facilities
  - add local repository and the supporting modules
  - use cargo fmt to format code
  - add hg r-status command
  - bincode 1.0.0 is a bit slow in my test
  - delay pattern matching during dir walk
  - optimize out trie and enable CodeXL profiling
  - use hashmap
  - remove thread pool
  - Rust's default read is not buffered; this was the main cause of the slowness
  - change to globset
  - convert glob to regex
  - hg ignore patterns are all converted to regex (as hg does), and now it is 
faster
  - filter dir early to prevent walking
  - Update matcher mod after testing Mozilla unified repo
  - bug fix: use byte literals instead of numbers
  - hg store path encoding is per byte style, update code according to Kevin 
Cox's comments
  - update matcher testing according to Match interface change
  - If clap fails to recognize r-* subcommands, then run python-version hg
  - changelog coding style revised
  - remove legacy revlog v0 and unfinished v2.
  - partially revise the dirstate reviews
  - remove duplicated build.rs, let the executable module guarantee the python
  - use cursor in base85 encoding, reducing raw index-math
  - use cursor in base85 decoding, reducing raw index-math
  - dirstate update according to review comments
  - config update according to review comments
  - mpatch rename to more meaningful names
  - simplify matcher: when no syntax is named at the beginning, use regexp
  - local repo coding style update
  - dirstate coding style update
  - manifest coding style update
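
The buffering item in the list above refers to a general property of Rust I/O: `File` reads are unbuffered, so many small reads turn into many system calls unless the reader is wrapped in `std::io::BufReader`. A minimal sketch (the function name is made up, and it is generic over `Read` so it works on files and in-memory data alike):

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Counts newline-terminated records in `source`.
pub fn count_lines<R: Read>(source: R) -> io::Result<usize> {
    // BufReader fetches large blocks from `source` and serves the small
    // reads issued by `split` from memory.
    let reader = BufReader::new(source);
    let mut count = 0;
    for line in reader.split(b'\n') {
        line?; // propagate I/O errors
        count += 1;
    }
    Ok(count)
}
```

For a file on disk this would be called as `count_lines(std::fs::File::open(path)?)`.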

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D2057?vs=6724&id=7188

BRANCH
  rust-hg-optimize (bookmark) on default (branch)

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

AFFECTED FILES
  rust/Cargo.lock
  rust/Cargo.toml
  rust/hgbase85/Cargo.toml
  rust/hgbase85/src/base85.rs
  rust/hgbase85/src/cpython_ext.rs
  rust/hgbase85/src/lib.rs
  rust/hgcli/Cargo.toml
  rust/hgcli/build.rs
  rust/hgcli/src/main.rs
  rust/hgstorage/Cargo.toml
  rust/hgstorage/src/changelog.rs
  rust/hgstorage/src/config.rs
  rust/hgstorage/src/dirstate.rs
  rust/hgstorage/src/lib.rs
  rust/hgstorage/src/local_repo.rs
  rust/hgstorage/src/manifest.rs
  rust/hgstorage/src/matcher.rs
  rust/hgstorage/src/mpatch.rs
  rust/hgstorage/src/path_encoding.rs
  rust/hgstorage/src/repository.rs
  rust/hgstorage/src/revlog.rs
  rust/hgstorage/src/revlog_v1.rs
  rust/hgstorage/src/working_context.rs

CHANGE DETAILS

diff --git a/rust/hgstorage/src/working_context.rs 
b/rust/hgstorage/src/working_context.rs
new file mode 100644
--- /dev/null
+++ b/rust/hgstorage/src/working_context.rs
@@ -0,0 +1,114 @@
+use std::path::PathBuf;
+use std::io::prelude::*;
+use std::fs;
+use std::collections::HashMap;
+use std::collections::HashSet as Set;
+use std::sync::{Arc, Mutex, RwLock};
+
+use threadpool::ThreadPool;
+use num_cpus;
+
+use dirstate::{CurrentState, DirState};
+use local_repo::LocalRepo;
+use manifest::{FlatManifest, ManifestEntry};
+use changelog::ChangeLog;
+
+pub struct WorkCtx {
+pub dirstate: Arc<RwLock<DirState>>,
+pub file_revs: HashMap<PathBuf, ManifestEntry>,
+}
+
+impl WorkCtx {
+pub fn new(
+dot_hg_path: Arc<PathBuf>,
+manifest: Arc<FlatManifest>,
+changelog: Arc<ChangeLog>,
+) -> Self {
+let dirstate = match DirState::new(dot_hg_path.join("dirstate")) {
+Some(dir_state) => dir_state,
+None => {
+unimplemented!("creating dirstate is not supported yet.");
+}
+};
+
+let manifest_id = changelog.get_commit_info(&dirstate.p1);
+
+let rev = manifest
+.inner
+.read()
+.unwrap()
+.node_id_to_rev(&manifest_id.manifest_id)
+.unwrap();
+
+let file_revs = manifest.build_file_rev_mapping(&rev);
+
+let dirstate = Arc::new(RwLock::new(dirstate));
+
+Self {
+dirstate,
+file_revs,
+}
+}
+
+pub fn status(&self, repo: &LocalRepo) -> CurrentState {
+let mut state = self.dirstate
+.write()
+.unwrap()
+.walk_dir(repo.repo_root.as_path(), &repo.matcher)
+.unwrap();
+
+if !state.lookup.is_empty() {
+let ncpus = num_cpus::get();
+
+let nworkers = if state.lookup.len() < ncpus {
+state.lookup.len()
+} else {
+ncpus
+};
+
+let pool = ThreadPool::new(nworkers);
+
+let clean = Arc::new(Mutex::new(Set::new()));
+let modified = Arc::new(Mutex::new(Set::new()));
+
+for f in state.lookup.drain() {
+let rl = repo.get_filelog(f.as_path());
+let fl = Arc::new(repo.repo_root.join(f.as_path()));
+
+let (id, 

D2057: rust implementation of hg status

2018-03-09 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  In https://phab.mercurial-scm.org/D2057#44269, @yuja wrote:
  
  > >> Reading that page it seems to claim that filenames should be utf8, not 
bytes. If utf8, this is what the code does, but if it is bytes that definitely 
won't work.
  > > 
  > > IIRC it's bytes everyplace except Windows, where we pretend utf8 is real?
  >
  > It's MBCS (i.e. ANSI multi-byte characters) on Windows. The plan was to support
  >  both MBCS and UTF-8-variant on Windows, but that isn't a thing yet.
  >
  > Perhaps we'll have to write a platform compatibility layer (or 
serialization/deserialization
  >  layer) on top of the Rust's file API, something like vfs.py we have in 
Python code.
  
  
  Thank you for confirming that. I was a bit confused when I read the Encoding Plan wiki page. I am looking at the Rust winapi bindings; let me see if I can directly wrap winapi::um::fileapi::FindFirstFileA.
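
A platform layer of the kind discussed above could start with a single conversion function gated by `cfg`. This is only a sketch: `path_to_bytes` is a hypothetical name, and the non-Unix branch handles Unicode-representable paths only; real MBCS support on Windows would need a code-page conversion that is not shown here.

```rust
use std::path::Path;

/// Converts a filesystem path to the byte form used for storage.
#[cfg(unix)]
pub fn path_to_bytes(path: &Path) -> Option<Vec<u8>> {
    use std::os::unix::ffi::OsStrExt;
    // On Unix a path is just bytes, so the conversion cannot fail.
    Some(path.as_os_str().as_bytes().to_vec())
}

/// Fallback for other platforms (assumption: UTF-8 bytes are acceptable).
#[cfg(not(unix))]
pub fn path_to_bytes(path: &Path) -> Option<Vec<u8>> {
    // Returns None for paths that are not valid Unicode.
    path.to_str().map(|s| s.as_bytes().to_vec())
}
```

Keeping the platform difference behind one function means the rest of the storage code can work with byte paths everywhere, similar in spirit to the vfs.py layer mentioned in the quoted comment.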


REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: yuja, glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel


D2057: rust implementation of hg status

2018-03-08 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  Hi everyone,
  
  Thank you for your encouragement and comments! I will follow up on all the comments and update the code soon.
  
  @indygreg It is a great idea to test on the Mozilla repo. I actually found several interesting things:
  
  1. I found a bug in my code (shame on me): I did not use a byte literal, and I made a typo. This triggered a problem in the Mozilla unified repo.
  2. A regexp pattern in the hgignore of the Mozilla unified repo, "(?!)", is not supported by Rust's regex crate. I chose to ignore these unsupported patterns.
  3. My version is slower in this repo: 70s (hg) versus 90s (mine). CodeXL reveals that the mpatch::collect() function uses 63% of the running time. I think I need to optimize it somehow.
  
  I totally agree with @kevincox that I did not properly separate char/u8/str/String/Path/PathBuf. The first bug was caused by this. I need to improve that.
  
  Thank you everyone!
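
The byte-literal bug in item 1 is easy to picture with a per-byte encoder. The sketch below covers only a simplified subset of what I understand Mercurial's store path encoding to do (uppercase letters and underscores); the real encoding also escapes reserved and non-printable bytes, and `encode_case` is a hypothetical name.

```rust
/// Simplified per-byte path encoding: `A` becomes `_a`, `_` becomes `__`.
pub fn encode_case(path: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(path.len());
    for &b in path {
        match b {
            // Byte literals state the intent directly; writing the numeric
            // codes (65..=90, 95) by hand is where a typo can slip in.
            b'A'..=b'Z' => {
                out.push(b'_');
                out.push(b.to_ascii_lowercase());
            }
            b'_' => out.extend_from_slice(b"__"),
            _ => out.push(b),
        }
    }
    out
}
```

Working on `&[u8]` rather than `&str` or `char` also matches the per-byte nature of the encoding noted in the review.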

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel


D2057: rust implementation of hg status

2018-03-07 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  Hi all,
  
  Based on the discussion a few weeks ago, I have come up with a solution for review and discussion. After reading the Oxidation plan, my first thought was to bypass the Python engine and the current plugin system if and only if requested: if the user (or perhaps an IDE's background checker) requests an r-* subcommand, hg runs the Rust implementation instead of the Python one. So I tried to make hg r-status as fast as possible. The submitted version has comparable performance (as an illustration, not evidence: on my MacBook, in hg's own repo, hg r-status takes 150ms and hg status takes 220ms). I am using CodeXL to profile the performance, and plan to use Future.rs to make the loading parallel and maybe 30ms faster.
  
  The implementation originates from hg's Python implementation, because the Python version is really fast. I tried to split it into small changes; however, I eventually had to combine the whole hgstorage module into one commit.
  
  Thank you for your comments!

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: krbullock, indygreg, durin42, kevincox, mercurial-devel


D2057: translate base85.c into rust code

2018-03-07 Thread Ivzhh (Sheng Mao)
Ivzhh updated this revision to Diff 6724.
Ivzhh added a comment.


  - merge with stable
  - translate base85.c into rust code
  - move hgbase85 into independent module
  - add hgstorage crate
  - hg status implementation in rust

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D2057?vs=5238&id=6724

BRANCH
  phab-submit-D2057-2018-02-05 (bookmark) on default (branch)

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

AFFECTED FILES
  rust/Cargo.lock
  rust/Cargo.toml
  rust/hgbase85/Cargo.toml
  rust/hgbase85/build.rs
  rust/hgbase85/src/base85.rs
  rust/hgbase85/src/cpython_ext.rs
  rust/hgbase85/src/lib.rs
  rust/hgcli/Cargo.toml
  rust/hgcli/build.rs
  rust/hgcli/src/main.rs
  rust/hgstorage/Cargo.toml
  rust/hgstorage/src/changelog.rs
  rust/hgstorage/src/config.rs
  rust/hgstorage/src/dirstate.rs
  rust/hgstorage/src/lib.rs
  rust/hgstorage/src/local_repo.rs
  rust/hgstorage/src/manifest.rs
  rust/hgstorage/src/matcher.rs
  rust/hgstorage/src/mpatch.rs
  rust/hgstorage/src/path_encoding.rs
  rust/hgstorage/src/repository.rs
  rust/hgstorage/src/revlog.rs
  rust/hgstorage/src/revlog_v1.rs
  rust/hgstorage/src/working_context.rs

CHANGE DETAILS

diff --git a/rust/hgstorage/src/working_context.rs 
b/rust/hgstorage/src/working_context.rs
new file mode 100644
--- /dev/null
+++ b/rust/hgstorage/src/working_context.rs
@@ -0,0 +1,108 @@
+use std::path::PathBuf;
+use std::io::prelude::*;
+use std::fs;
+use std::collections::HashMap;
+use std::collections::HashSet as Set;
+use std::sync::{Arc, Mutex, RwLock};
+
+use threadpool::ThreadPool;
+use num_cpus;
+
+use dirstate::{CurrentState, DirState};
+use local_repo::LocalRepo;
+use manifest::{FlatManifest, ManifestEntry};
+use changelog::ChangeLog;
+
+pub struct WorkCtx {
+pub dirstate: Arc<RwLock<DirState>>,
+pub file_revs: HashMap<PathBuf, ManifestEntry>,
+}
+
+impl WorkCtx {
+pub fn new(
+dot_hg_path: Arc<PathBuf>,
+manifest: Arc<FlatManifest>,
+changelog: Arc<ChangeLog>,
+) -> Self {
+let dirstate = DirState::new(dot_hg_path.join("dirstate"));
+
+let manifest_id = changelog.get_commit_info(&dirstate.p1);
+
+let rev = manifest
+.inner
+.read()
+.unwrap()
+.node_id_to_rev(&manifest_id.manifest_id)
+.unwrap();
+
+let file_revs = manifest.build_file_rev_mapping(&rev);
+
+let dirstate = Arc::new(RwLock::new(dirstate));
+
+Self {
+dirstate,
+file_revs,
+}
+}
+
+pub fn status(&self, repo: &LocalRepo) -> CurrentState {
+let mut state = self.dirstate
+.write()
+.unwrap()
+.walk_dir(repo.repo_root.as_path(), &repo.matcher);
+
+if !state.lookup.is_empty() {
+let ncpus = num_cpus::get();
+
+let nworkers = if state.lookup.len() < ncpus {
+state.lookup.len()
+} else {
+ncpus
+};
+
+let pool = ThreadPool::new(nworkers);
+
+let clean = Arc::new(Mutex::new(Set::new()));
+let modified = Arc::new(Mutex::new(Set::new()));
+
+for f in state.lookup.drain() {
+let rl = repo.get_filelog(f.as_path());
+let fl = Arc::new(repo.repo_root.join(f.as_path()));
+
+let (id, p1, p2) = {
+let id = &self.file_revs[f.as_path()].id;
+let gd = rl.read().unwrap();
+let rev = gd.node_id_to_rev(id).unwrap();
+
+let p1 = gd.p1_nodeid(&rev);
+let p2 = gd.p2_nodeid(&rev);
+(id.clone(), p1, p2)
+};
+
+let clean = clean.clone();
+let modified = modified.clone();
+
+pool.execute(move || {
+let mut wfile = fs::File::open(fl.as_path()).unwrap();
+let mut content = Vec::new();
+wfile.read_to_end(&mut content).unwrap();
+if rl.read().unwrap().check_hash(&content, &p1, &p2) == id {
+clean.lock().unwrap().insert(f);
+} else {
+modified.lock().unwrap().insert(f);
+}
+});
+}
+
+pool.join();
+assert_eq!(pool.panic_count(), 0);
+
+let mut gd = modified.lock().unwrap();
+state.modified.extend(gd.drain());
+let mut gd = clean.lock().unwrap();
+state.clean.extend(gd.drain());
+}
+
+return state;
+}
+}
diff --git a/rust/hgstorage/src/revlog_v1.rs b/rust/hgstorage/src/revlog_v1.rs
new file mode 100644
--- /dev/null
+++ b/rust/hgstorage/src/revlog_v1.rs
@@ -0,0 +1,422 @@
+use std::path::{Path, PathBuf};
+use std::io;
+use std::io::{BufReader, Read, Seek, SeekFrom};
+use std::fs;
+use std::cell::RefCell;
+use std::sync::{Arc, RwLock};
+use std::collections::HashMap as Map;
+
+us

D2057: translate base85.c into rust code

2018-02-07 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  Thank you @indygreg!
  
  The OxidationPlan is my best reference when I started to make a move, and 
this thread is even more helpful. I am really interested in exploring this ;-) 
In 2014 I was trying to change the hg backend storage to Postgres, a silly and 
failed experiment.
  
  Anyway, I will save everyone's time and stop talking. I will come back later 
with a more meaningful implementation.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: krbullock, indygreg, durin42, kevincox, mercurial-devel


D2057: translate base85.c into rust code

2018-02-07 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  As the author of this patch, I actually have the same concern. I started translating base85 as a baby step toward finding a way of integrating Rust and CPython. On my side, today I modified setup.py, policy.py and the makefile to run hg's test suite with the new base85. For me, it is only a proof of concept.
  
  Maybe I should take another way: translate more Python modules into CFFI style, and let CFFI call the Rust implementation, then gradually replace more Python module implementations with the corresponding CFFI-style ones while keeping the Python interface the same. My own hope is that the Rust routines will be able to call each other and eventually run some __basic__ tasks without calling the Python part, while Rust still lazily provides info to the Python interface for extensions etc.
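
  For the CFFI route, the Rust side would expose plain C-ABI functions that do not depend on the Python C API. A minimal sketch, assuming a hypothetical exported name `hg_b85_encoded_len` and the base85 length rule as I understand it from base85.c:

```rust
use std::os::raw::c_int;

/// Returns the number of base85 characters needed for `input_len` bytes.
/// `pad` follows the C convention: zero is false, anything else is true.
//
// In a real cdylib this function would also carry the no_mangle attribute
// (`#[no_mangle]`, spelled `#[unsafe(no_mangle)]` on newer toolchains) so
// that the symbol keeps this exact name for CFFI to look up.
pub extern "C" fn hg_b85_encoded_len(input_len: usize, pad: c_int) -> usize {
    if pad != 0 {
        (input_len + 3) / 4 * 5
    } else {
        let rem = input_len % 4;
        input_len / 4 * 5 + if rem == 0 { 0 } else { rem + 1 }
    }
}
```

  Because the signature uses only C-compatible types, the same function can be called from CPython via CFFI, from PyPy, or directly from other Rust code.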
  
  I am exploring this way now, and hope the findings will be useful for the community to make a decision.
  
  Thank you all for the comments!

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: krbullock, indygreg, durin42, kevincox, mercurial-devel


D2057: translate base85.c into rust code

2018-02-06 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  Thank you @indygreg for your detailed explanation!
  
  I understand the process now, and I will go back reading the developer's 
guide thoroughly again. I will try my best to provide a relatively clean stack 
of patches.
  
  Thank you for your time!

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: indygreg, durin42, kevincox, mercurial-devel


D2057: translate base85.c into rust code

2018-02-06 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  Sure, thank you for the comments! I can definitely prepare makefile and 
setup.py to make the building process work with rust part. I am planning to 
change the policy.py module to support and try to load rust modules and run all 
the tests. I will submit a new patch after finishing these two tasks.
  
  After reading wiki/OxidationPlan again, I plan to change to cffi for better compatibility (pypy and others), and try to build the algorithms in pure Rust. Should I wait until I have migrated to a cffi-based solution and then resubmit this patch with all three changes (building, testing, and cffi)?
  
  Thank you!

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: indygreg, durin42, kevincox, mercurial-devel


D2057: translate base85.c into rust code

2018-02-06 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  I am open to the three-crates plan. Originally I had hgcli and hgext separate, and I was planning to replace CFFI. I am a pypy user too, so I will be willing to provide a crate that is free of the Python C API for pypy and others.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: indygreg, durin42, kevincox, mercurial-devel


D2057: translate base85.c into rust code

2018-02-05 Thread Ivzhh (Sheng Mao)
Ivzhh created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  - python extension to encode/decode base85
  - add test suites to call encode/decode base85 in rust-/python- convention
  - add proper python environment setup for developers with multiple python environments (e.g. conda 2/3 for data processing etc.), so that the environment version is more controllable
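
The environment setup in the last item reads three variables (`HGROOT`, `HGRUST_PYTHONEXE`, `HGRUST_PYTHONHOME`; the names are taken from `set_py_env` in the patch). As a hedged sketch of the same idea, the lookup can be made testable and can report a missing variable instead of panicking. The `environment_from` function and the error wording are assumptions of this example.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub struct Environment {
    pub mercurial_modules: PathBuf,
    pub python_exe: PathBuf,
    pub python_home: PathBuf,
}

/// Builds the environment from a lookup function so that tests can supply
/// values without touching the real process environment.
pub fn environment_from<F>(lookup: F) -> Result<Environment, String>
where
    F: Fn(&str) -> Option<String>,
{
    let require = |name: &str| {
        lookup(name)
            .map(PathBuf::from)
            .ok_or_else(|| format!("{} is not set", name))
    };
    Ok(Environment {
        mercurial_modules: require("HGROOT")?,
        python_exe: require("HGRUST_PYTHONEXE")?,
        python_home: require("HGRUST_PYTHONHOME")?,
    })
}
```

In production code the call would be `environment_from(|name| std::env::var(name).ok())`.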

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

AFFECTED FILES
  rust/hgcli/src/hgext/base85.rs
  rust/hgcli/src/hgext/cpython_ext.rs
  rust/hgcli/src/hgext/mod.rs
  rust/hgcli/src/main.rs

CHANGE DETAILS

diff --git a/rust/hgcli/src/main.rs b/rust/hgcli/src/main.rs
--- a/rust/hgcli/src/main.rs
+++ b/rust/hgcli/src/main.rs
@@ -6,9 +6,11 @@
 // GNU General Public License version 2 or any later version.
 
 extern crate libc;
-extern crate cpython;
+#[macro_use] extern crate cpython;
 extern crate python27_sys;
 
+pub mod hgext;
+
 use cpython::{NoArgs, ObjectProtocol, PyModule, PyResult, Python};
 use libc::{c_char, c_int};
 
diff --git a/rust/hgcli/src/hgext/mod.rs b/rust/hgcli/src/hgext/mod.rs
new file mode 100644
--- /dev/null
+++ b/rust/hgcli/src/hgext/mod.rs
@@ -0,0 +1,129 @@
+extern crate libc;
+
+pub mod base85;
+pub mod cpython_ext;
+
+use std;
+use std::{env, sync};
+use std::path::{PathBuf};
+use std::ffi::{CString, OsStr};
+use python27_sys as ffi;
+use cpython;
+
+#[cfg(target_family = "unix")]
+use std::os::unix::ffi::{OsStrExt};
+
+static HG_EXT_REG: sync::Once = sync::ONCE_INIT;
+
+#[no_mangle]
+pub fn init_all_hg_ext(_py: cpython::Python) {
+HG_EXT_REG.call_once(|| {
+unsafe {
+base85::initoxidized_base85();
+}
+});
+}
+
+#[derive(Debug)]
+pub struct Environment {
+_exe: PathBuf,
+python_exe: PathBuf,
+python_home: PathBuf,
+mercurial_modules: PathBuf,
+}
+
+// On UNIX, platform string is just bytes and should not contain NUL.
+#[cfg(target_family = "unix")]
+fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
+CString::new(s.as_ref().as_bytes()).unwrap()
+}
+
+#[cfg(target_family = "windows")]
+fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
+CString::new(s.as_ref().to_str().unwrap()).unwrap()
+}
+
+fn set_python_home(env: &Environment) {
+let raw = cstring_from_os(&env.python_home).into_raw();
+unsafe {
+ffi::Py_SetPythonHome(raw);
+}
+}
+
+static PYTHON_ENV_START: sync::Once = sync::ONCE_INIT;
+
+/// the second half initialization code are copied from rust-cpython
+/// fn pythonrun::prepare_freethreaded_python()
+/// because this function is called mainly by `cargo test`
+/// and the multi-thread nature requires to properly
+/// set up threads and GIL. In the corresponding version,
+/// prepare_freethreaded_python() is turned off, so the cargo
+/// test features must be properly called.
+pub fn set_py_env() {
+PYTHON_ENV_START.call_once(|| {
+let env = {
+let exe = env::current_exe().unwrap();
+
+let mercurial_modules = std::env::var("HGROOT").expect("must set 
mercurial's root folder (one layer above mercurial folder itself");
+
+let python_exe = std::env::var("HGRUST_PYTHONEXE").expect("set 
PYTHONEXE to the full path of the python.exe file");
+
+let python_home = std::env::var("HGRUST_PYTHONHOME").expect("if 
you don't want to use system one, set PYTHONHOME according to python doc");
+
+Environment {
+_exe: exe.clone(),
+python_exe: PathBuf::from(python_exe),
+python_home: PathBuf::from(python_home),
+mercurial_modules: PathBuf::from(mercurial_modules),
+}
+};
+
+//println!("{:?}", env);
+
+// Tell Python where it is installed.
+set_python_home(&env);
+
+// Set program name. The backing memory needs to live for the duration 
of the
+// interpreter.
+//
+// TODO consider storing this in a static or associating with lifetime 
of
+// the Python interpreter.
+//
+// Yes, we use the path to the Python interpreter not argv[0] here. The
+// reason is because Python uses the given path to find the location of
+// Python files. Apparently we could define our own ``Py_GetPath()``
+// implementation. But this may require statically linking Python, 
which is
+// not desirable.
+let program_name = cstring_from_os(&env.python_exe).as_ptr();
+unsafe {
+ffi::Py_SetProgramName(program_name as *mut i8);
+}
+
+unsafe {
+//ffi::Py_Initialize();
+
+if ffi::Py_IsInitialized() != 0 {
+// If Python is already initialized, we expect Python 
threading to also be initialized,
+// as we can't make the existing Python main thread acquire 
the GIL.
+assert!(ffi::PyEval_ThreadsInitialized() != 0);
+