D2057: rust implementation of hg status

2020-01-24 Thread baymax (Baymax, Your Personal Patch-care Companion)
This revision now requires changes to proceed.
baymax added a comment.
baymax requested changes to this revision.


  There seems to have been no activities on this Diff for the past 3 Months.
  
  By policy, we are automatically moving it out of the `need-review` state.
  
  Please, move it back to `need-review` without hesitation if this diff should 
still be discussed.
  
  :baymax:need-review-idle:

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D2057/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox, baymax
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-24 Thread kevincox (Kevin Cox)
kevincox added a comment.


  The latest changes are looking really good. I have a couple more comments but 
I didn't have time for a full review. I'll try to get more reviewed tomorrow. 
It seems that you still have a lot of stuff still in-flight so I'll try to 
slowly review the changes as I have time. If you want input/feedback on any 
particular part just ask and I will prioritize it.
  
  This change is very large so it might be worth splitting off a smaller 
component and getting that submitted first. However I do realize that for 
starting out it is often helpful to get some actual use cases implemented 
before committing the base structures.

INLINE COMMENTS

> Ivzhh wrote in base85.rs:23
> This crate is my previous try to integrate rust into hg. Right now I guess 
> mine main pursue is to add hg r-* commands for rust. I will follow your 
> suggestion when I am implementing the wire protocol and reuse the code for 
> pure rust crate.

I get that, but I still think it makes the code easier to read when the 
python-interop and the logic as separated where it is easy to do so.

> Ivzhh wrote in base85.rs:91
> when I removed the unsafe, I got error: error[E0133]: use of mutable static 
> requires unsafe function or block

I meant safe not as it it didn't need the unsafe keyword, but in that the use 
of the `unsafe` block is safe.

It should really be called the `trust_me,_I_know_this_is_safe` block. But since 
you are not getting the compiler checking it is often useful to add a comment 
why the action you are performing is correct. In this case it is correct 
because the caller initializes this variable before the function is called.

> base85.rs:105
> +}
> +};
> +

If this computation only depends on `len` it would be nice to put it in a 
helper function.

> main.rs:260
> +let res = repo.status();
> +for f in res.modified.iter() {
> +println!("M {}", f.to_str().unwrap());

These are `HashSet`'s which don't have a defined iterator order. IIRC the 
python implementation sorts the results which is probably desirable.

> config.rs:50
> +pub revlog_format: RevlogFormat,
> +/// where to get delta base, please refer to wiki page
> +pub delta: DeltaPolicy,

A link to the mentioned wiki page would be very helpful to readers.

> Ivzhh wrote in dirstate.rs:170
> I kind of get borrow check compile error here. Later I use Occupied() when 
> possible.

Sorry, I misunderstood the logic. You can do this:

  diff -r ccc683587fdb rust/hgstorage/src/dirstate.rs
  --- a/rust/hgstorage/src/dirstate.rs  Sat Mar 24 10:05:53 2018 +
  +++ b/rust/hgstorage/src/dirstate.rs  Sat Mar 24 10:14:58 2018 +
  @@ -184,8 +184,7 @@
   continue;
   }
   
  -if self.dir_state_map.contains_key(rel_path) {
  -let dir_entry = _state_map[rel_path];
  +if let Some(dir_entry) = 
self.dir_state_map.get(rel_path) {
   files_not_in_walkdir.remove(rel_path);
   DirState::check_status( res, abs_path, rel_path, 
dir_entry);
   } else if 
!matcher.check_path_ignored(rel_path.to_str().unwrap()) {

> dirstate.rs:103
> +if let Ok(meta_data) = meta_data {
> +let mtime = meta_data.modified().expect("default mtime must 
> exist");
> +

Switch the return type to `std::io::Result` and then you can have

  let metadata = p.metadata()?;
  let mtime = metadata.modified()?;
  // ...

> dirstate.rs:263
> +.unwrap()
> +.as_secs() != dir_entry.mtime as u64
> +{

Does it make sense to make `DirStateEntry.mtime` be a `std::time::SystemTime` 
and convert upon reading the structure in?

If not I would prefer doing the conversion here:

  else if mtd.modified().unwrap() == UNIX_EPOCH + 
Duration::from_secs(dir_entry.mtime as u64) {

(Maybe extract the system time to higher up, or even a helper function on 
dir_entry)

> Ivzhh wrote in local_repo.rs:136
> I think LRU will update reference count (or timestamp) when the data is 
> accessed.

Actually I didn't realize that RwLock doesn't get a regular `get()` since it is 
doing a compile time borrow check. 
https://doc.rust-lang.org/std/sync/struct.RwLock.html#method.get_mut. My 
mistake, the code is fine.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-22 Thread yuja (Yuya Nishihara)
yuja added a comment.


  In https://phab.mercurial-scm.org/D2057#46980, @Ivzhh wrote:
  
  > In https://phab.mercurial-scm.org/D2057#46726, @yuja wrote:
  >
  > > >> I think the only place where you would need to do os-specific code is 
when
  > > >>  doing serialization and serialization
  > > > 
  > > > Yes, that will be feasible in strictly typed language like Rust.
  > >
  > > To be clear, I meant serialization/deserialization between filesystem 
path and
  > >  internal dirstate/manifest path, not between dirstate storage and 
in-memory
  > >  dirstate object.
  >
  >
  > I guess your suggestion is like this: @yuja
  >
  > 1. if it is windows and the code page is MBCS, try to decode the paths read 
from manifest and dirstate into unicode equivalent
  > 2. use utf internally and with rust IO api
  > 3. when writing back to dirstate and manifest, encode utf to MBCS
  
  
  No. My suggestion is:
  
  1. keep manifest/dirstate paths as bytes (which are probably wrapped by some 
type, say HgPath)
  2. but we want to use Rust's standard library for I/O
  3. so, add utility function/trait to convert HgPath to Path/PathBuf, where 
MBCS-Wide conversion will occur.
  
  I think raw byte paths will be needed to build store paths (e.g. 
`.hg/store/data/~2eclang-format.i`).
  
  https://www.mercurial-scm.org/repo/hg/file/4.5.2/mercurial/store.py

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-21 Thread Ivzhh (Sheng Mao)
Ivzhh marked 4 inline comments as done.
Ivzhh added a comment.


  In https://phab.mercurial-scm.org/D2057#46726, @yuja wrote:
  
  > >> I think the only place where you would need to do os-specific code is 
when
  > >>  doing serialization and serialization
  > > 
  > > Yes, that will be feasible in strictly typed language like Rust.
  >
  > To be clear, I meant serialization/deserialization between filesystem path 
and
  >  internal dirstate/manifest path, not between dirstate storage and in-memory
  >  dirstate object.
  
  
  I guess your suggestion is like this: @yuja
  
  1. if it is windows and the code page is MBCS, try to decode the paths read 
from manifest and dirstate into unicode equivalent
  2. use utf internally and with rust IO api
  3. when writing back to dirstate and manifest, encode utf to MBCS
  
  Please let me if I have misunderstanding. Thank you!

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-21 Thread Ivzhh (Sheng Mao)
Ivzhh marked 45 inline comments as done.
Ivzhh added inline comments.

INLINE COMMENTS

> kevincox wrote in base85.rs:23
> `` can only hold valid utf8 data? Does it make more sense to have `&[u8]` 
> here for a list of bytes?

It should be any &[u8], but the current cpython crate doesn't wrap for &[u8]. I 
think I need to fork and add that part. I put it in my checklist now.

> kevincox wrote in base85.rs:23
> Would it be possible to separate the decode from the python objects. I'm 
> thinking two helper functions.
> 
>   fn b85_required_len(text: ) -> usize
>   fn b85_encode(text: , pad: i32, out:  [u8]) -> Result<()>

This crate is my previous try to integrate rust into hg. Right now I guess mine 
main pursue is to add hg r-* commands for rust. I will follow your suggestion 
when I am implementing the wire protocol and reuse the code for pure rust crate.

> kevincox wrote in base85.rs:23
> IIUC pad is only ever checked `== 0`. Can it be made into a bool?

pad is a bool, however when I checked it in hg-python, int are passed to the 
function. I guess I need to update cpython wrapper for this, a more broad logic 
conversion.

> kevincox wrote in base85.rs:46
> Why the braces here?

I guess it is because NLL. When I started the work, rust compiler reported 
borrow check error on this part. I later read an article talking about NLL 
update in rust. But before that, I use the braces to avoid the error.

> kevincox wrote in base85.rs:91
> This is probably worth a comment that this is safe because D85DEC is required 
> to be initialized before this function is called.

when I removed the unsafe, I got error: error[E0133]: use of mutable static 
requires unsafe function or block

> indygreg wrote in main.rs:233-261
> This is definitely nifty and an impressive achievement \o/
> 
> The `r-` commands for testing pure Rust code paths are an interesting idea!
> 
> I think I'm OK with including support for this in `hgcli`. But I think the 
> code should live in a separate file so it doesn't pollute `main()`. And it 
> should be behind a Cargo feature flag so we maintain compatibility with `hg` 
> as much as possible by default.
> 
> Also, Mercurial's command line parser is extremely wonky and has some 
> questionable behavior. If the intent is to make `rhg` compatible with `hg`, 
> we would need to preserve this horrible behavior. We'll likely have to write 
> a custom argument parser because of how quirky Mercurial's argument parser is 
> :(

Thank you for the suggestion! I guess I need to extend clap later to support hg 
style command line. Right now whenever clap cannot handle the argument parsing, 
I will redirect the arguments to hg directly.

> kevincox wrote in changelog.rs:31
> If you aren't using the value I would prefer `truncate(NodeId::hex_len())`

I guess I will use the rest info later. hg seems put some meta data in the 
commit comments. I will keep it for now. Thank you!

> kevincox wrote in config.rs:78
> If you are just going to convert to String I would recommend taking a String 
> argument.
> 
> Also prefer `.to_owned()` over `.to_string()`.

I like to_owned(), I will them in later occasions. Thank you!

> indygreg wrote in config.rs:95
> I would not worry about supporting v0 or v2 at this time. v0 is only 
> important for backwards compatibility with ancient repos. And v2 never got 
> off the ground.

Sure, I will use v1 only for now. In the beginning I kinda over designed this 
part.

> kevincox wrote in dirstate.rs:48
> This could have a better name.

I remember the python hg uses the name, in the beginning, I tried to replicate 
py-hg's behaviour. But I think it needs to be renamed. I agree with you.

> kevincox wrote in dirstate.rs:108
> 1. Does this function need to be public? It seems internal to the constructor.
> 2. If it doesn't need to be I would prefer it return the Map so that you 
> don't have a partial-constructed DirState.

I think dir state needs to 1. read existing one; 2. create one if not exits; 
maybe private for now.

> kevincox wrote in dirstate.rs:152
> I would prefer doing the filter before the loop and storing it in a variable.

For the filter, I follow the example in the walkdir doc. I guess what I want is 
to skip the dir for later recursive visiting.

> kevincox wrote in dirstate.rs:161
> Please explain why you are ignoring the error condition.

I add the error handling back

> kevincox wrote in dirstate.rs:170
> You could do the following for a slight performance win and save a line.
> 
>   if let Occupied(entry) = self.dmap.entry(relpath) {
>  ...
>   }

I kind of get borrow check compile error here. Later I use Occupied() when 
possible.

> kevincox wrote in local_repo.rs:136
> Why does it need to be mutable to clone?

I think LRU will update reference count (or timestamp) when the data is 
accessed.

> kevincox wrote in matcher.rs:14
> Can you manage a `&[u8]` rather then pointer arithmetic for the whole string. 
> It will make me feel better and will probably 

D2057: rust implementation of hg status

2018-03-21 Thread Ivzhh (Sheng Mao)
Ivzhh updated this revision to Diff 7188.
Ivzhh added a comment.


  - add revlog and mpatch facilities
  - add changelog parsing
  - add manifest parsing
  - path encoding for data store
  - add dirstate and matcher facilities
  - add local repository and the supporting modules
  - use cargo fmt to format code
  - add hg r-status command
  - bincode 1.0.0 is a bit slow in my test
  - delay pattern matching during dir walk
  - optimize out trie and enable CoreXL profiling
  - use hashmap
  - remove thread pool
  - rust default read is not buf-ed, this is the key of slowness
  - change to globset
  - convert glob to regex
  - hg ignore patterns are all converted to regex (as hg does), and now it is 
faster
  - filter dir early to prevent walking
  - Update matcher mod after testing Mozilla unified repo
  - bug fix: use byte literals instead of numbers
  - hg store path encoding is per byte style, update code according to Kevin 
Cox's comments
  - update matcher testing according to Match interface change
  - If clap fails to recognize r-* subcommands, then run python-version hg
  - changelog coding style revised
  - remove legacy revlog v0 and unfinished v2.
  - partially revise the dirstate reviews
  - remove duplicated build.rs, let the executable module guarantee the python
  - use cursor in base85 encoding, reducing raw index-math
  - use cursor in base85 decoding, reducing raw index-math
  - dirstate update according to review comments
  - config update according to review comments
  - mpatch rename to more meaningful names
  - simplify matcher as when there is no syntax named in the beginning, use 
regexp
  - local repo coding style update
  - dirstate coding style update
  - manifest coding style update

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D2057?vs=6724=7188

BRANCH
  rust-hg-optimize (bookmark) on default (branch)

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

AFFECTED FILES
  rust/Cargo.lock
  rust/Cargo.toml
  rust/hgbase85/Cargo.toml
  rust/hgbase85/src/base85.rs
  rust/hgbase85/src/cpython_ext.rs
  rust/hgbase85/src/lib.rs
  rust/hgcli/Cargo.toml
  rust/hgcli/build.rs
  rust/hgcli/src/main.rs
  rust/hgstorage/Cargo.toml
  rust/hgstorage/src/changelog.rs
  rust/hgstorage/src/config.rs
  rust/hgstorage/src/dirstate.rs
  rust/hgstorage/src/lib.rs
  rust/hgstorage/src/local_repo.rs
  rust/hgstorage/src/manifest.rs
  rust/hgstorage/src/matcher.rs
  rust/hgstorage/src/mpatch.rs
  rust/hgstorage/src/path_encoding.rs
  rust/hgstorage/src/repository.rs
  rust/hgstorage/src/revlog.rs
  rust/hgstorage/src/revlog_v1.rs
  rust/hgstorage/src/working_context.rs

CHANGE DETAILS

diff --git a/rust/hgstorage/src/working_context.rs 
b/rust/hgstorage/src/working_context.rs
new file mode 100644
--- /dev/null
+++ b/rust/hgstorage/src/working_context.rs
@@ -0,0 +1,114 @@
+use std::path::PathBuf;
+use std::io::prelude::*;
+use std::fs;
+use std::collections::HashMap;
+use std::collections::HashSet as Set;
+use std::sync::{Arc, Mutex, RwLock};
+
+use threadpool::ThreadPool;
+use num_cpus;
+
+use dirstate::{CurrentState, DirState};
+use local_repo::LocalRepo;
+use manifest::{FlatManifest, ManifestEntry};
+use changelog::ChangeLog;
+
+pub struct WorkCtx {
+pub dirstate: Arc,
+pub file_revs: HashMap,
+}
+
+impl WorkCtx {
+pub fn new(
+dot_hg_path: Arc,
+manifest: Arc,
+changelog: Arc,
+) -> Self {
+let dirstate = match DirState::new(dot_hg_path.join("dirstate")) {
+Some(dir_state) => dir_state,
+None => {
+unimplemented!("creating dirstate is not supported yet.");
+}
+};
+
+let manifest_id = changelog.get_commit_info();
+
+let rev = manifest
+.inner
+.read()
+.unwrap()
+.node_id_to_rev(_id.manifest_id)
+.unwrap();
+
+let file_revs = manifest.build_file_rev_mapping();
+
+let dirstate = Arc::new(RwLock::new(dirstate));
+
+Self {
+dirstate,
+file_revs,
+}
+}
+
+pub fn status(, repo: ) -> CurrentState {
+let mut state = self.dirstate
+.write()
+.unwrap()
+.walk_dir(repo.repo_root.as_path(), )
+.unwrap();
+
+if !state.lookup.is_empty() {
+let ncpus = num_cpus::get();
+
+let nworkers = if state.lookup.len() < ncpus {
+state.lookup.len()
+} else {
+ncpus
+};
+
+let pool = ThreadPool::new(nworkers);
+
+let clean = Arc::new(Mutex::new(Set::new()));
+let modified = Arc::new(Mutex::new(Set::new()));
+
+for f in state.lookup.drain() {
+let rl = repo.get_filelog(f.as_path());
+let fl = Arc::new(repo.repo_root.join(f.as_path()));
+
+let (id, p1, p2) = {
+

D2057: rust implementation of hg status

2018-03-21 Thread kevincox (Kevin Cox)
kevincox added a comment.


  Ah, I forgot to consider the python interop. Now the need for that crate 
makes sense. Thanks for explaining.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-21 Thread yuja (Yuya Nishihara)
yuja added a comment.


  >> I think the only place where you would need to do os-specific code is when
  >>  doing serialization and serialization
  > 
  > Yes, that will be feasible in strictly typed language like Rust.
  
  To be clear, I meant serialization/deserialization between filesystem path and
  internal dirstate/manifest path, not between dirstate storage and in-memory
  dirstate object.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-20 Thread yuja (Yuya Nishihara)
yuja added a comment.


  > I think the only place where you would need to do os-specific code is when
  >  doing serialization and serialization
  
  Yes, that will be feasible in strictly typed language like Rust.
  
  > which I think should be handled by 
https://doc.rust-lang.org/std/os/unix/ffi/trait.OsStringExt.html
  >  and https://doc.rust-lang.org/std/os/windows/ffi/trait.OsStringExt.html.
  
  Not true for Windows because Rust uses Unicode (UTF-16-ish) API, whereas
  Python 2 does ANSI. We need to convert a "wide" string to a locale-dependent 
string.
  
  Maybe the local-encoding crate will do that for us?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-20 Thread kevincox (Kevin Cox)
kevincox added a comment.


  I'm not a windows expert but it seems like the rust OsStr, Path and 
filesystem APIs should handle these conversions for you. I think the only place 
where you would need to do os-specific code is when doing serialization and 
serialization which I think should be handled by 
https://doc.rust-lang.org/std/os/unix/ffi/trait.OsStringExt.html and 
https://doc.rust-lang.org/std/os/windows/ffi/trait.OsStringExt.html.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-20 Thread quark (Jun Wu)
quark added a comment.


  https://crates.io/crates/local-encoding seems to be the right choice.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: quark, yuja, glandium, krbullock, indygreg, durin42, kevincox, 
mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-10 Thread yuja (Yuya Nishihara)
yuja added a comment.


  > I am looking at Mozilla's rust winapi bindings, let me see if I can 
directly wrap around winapi::um::fileapi::FindFirstFileA 

  
  That's probably a hard way. I was thinking of something converting
  between OsStr (i.e. Path) and MBCS bytes by using Win32 API, instead
  of calling out the "A" API.
  
  
https://msdn.microsoft.com/en-us/library/windows/desktop/dd319072(v=vs.85).aspx
  
  We don't do that in Python, but Rust's type system will help making it right.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: yuja, glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-09 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  In https://phab.mercurial-scm.org/D2057#44269, @yuja wrote:
  
  > >> Reading that page it seems to claim that filenames should be utf8, not 
bytes. If utf8, this is what the code does, but if it is bytes that definitely 
won't work.
  > > 
  > > IIRC it's bytes everyplace except Windows, where we pretend utf8 is real?
  >
  > It's MBCS (i.e. ANSI multi-byte characters) on Windows. The plain was to 
support
  >  both MBCS and UTF-8-variant on Windows, but that isn't a thing yet.
  >
  > Perhaps we'll have to write a platform compatibility layer (or 
serialization/deserialization
  >  layer) on top of the Rust's file API, something like vfs.py we have in 
Python code.
  
  
  Thank you for confirming that, I am a bit confusing when I read Encoding Plan 
wiki page. I am looking at Mozilla's rust winapi bindings, let me see if I can 
directly wrap around winapi::um::fileapi::FindFirstFileA 


REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: yuja, glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-09 Thread yuja (Yuya Nishihara)
yuja added a comment.


  >> Reading that page it seems to claim that filenames should be utf8, not 
bytes. If utf8, this is what the code does, but if it is bytes that definitely 
won't work.
  > 
  > IIRC it's bytes everyplace except Windows, where we pretend utf8 is real?
  
  It's MBCS (i.e. ANSI multi-byte characters) on Windows. The plain was to 
support
  both MBCS and UTF-8-variant on Windows, but that isn't a thing yet.
  
  Perhaps we'll have to write a platform compatibility layer (or 
serialization/deserialization
  layer) on top of the Rust's file API, something like vfs.py we have in Python 
code.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: yuja, glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-08 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  Hi everyone,
  
  Thank you for your encouragements and comments! I will follow up with all 
comments and update the code soon.
  
  @indygreg It is a great idea to test on Mozilla repo, actually I found 
several things interesting:
  
  1. I found a bug in my code (shame on me): because I did not use byte 
literal, and I made a typo. This triggers problem in Mozilla unified repo
  2. A regexp pattern in hgignore in Mozilla unified repo is not supported by 
rust's regex crate, a.k.a. "(?!)". I choose to ignore these unsupported 
patterns.
  3. My version is slower in this repo: 70s (hg) and 90s (mine). CodeXL reveals 
that the mpatch::collect() function uses 63% of the running time. I think I 
need to optimize it somehow.
  
  I totally agree with @kevincox that I did not sort well on 
char/u8/str/String/Path/PathBuf. The first bug is caused by this. I need to 
improve them.
  
  Thank you everyone!

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-08 Thread kevincox (Kevin Cox)
kevincox added a comment.


  Rust has platform independent types `PathBuf` 
 and `` 
 for paths and 
`OsString`  and 
``  for 
strings (owned and references respectively. They do have os-specific extensions 
but as long as you don't use them it should be cross platform. That being said, 
if you are serializing and deserializing them you may need to write some 
platform dependant code.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-08 Thread indygreg (Gregory Szorc)
indygreg added a comment.


  In https://phab.mercurial-scm.org/D2057#43989, @durin42 wrote:
  
  > In https://phab.mercurial-scm.org/D2057#43988, @kevincox wrote:
  >
  > > In https://phab.mercurial-scm.org/D2057#43987, @durin42 wrote:
  > >
  > > > Mercurial tries to be principled about always treating filenames as 
bytes. AIUI https://www.mercurial-scm.org/wiki/WindowsUTF8Plan is still the 
plan of record there?
  > >
  > >
  > > Reading that page it seems to claim that filenames should be utf8, not 
bytes. If utf8, this is what the code does, but if it is bytes that definitely 
won't work.
  >
  >
  > IIRC it's bytes everyplace except Windows, where we pretend utf8 is real?
  >
  > We may have to make adjustments to this plan on macOS with APFS, but I'm 
not sure about that yet.
  
  
  I think we want to express a path as a dedicated type which has different 
underlying storage depending on the platform (bytes on Linux, UTF-16 on 
Windows). All filesystem operations should take a `Path` instance to operate 
on. This is the only way to cleanly round trip filenames between the OS, the 
application, and back to the OS. That leaves us with the hard problem of 
normalizing Mercurial's storage representation of paths (bytes) with the 
operating system's. But this world is strictly better than today, where we lose 
path data from the OS because we use POSIX APIs.
  
  FWIW, Python 3 rewrote the I/O layer to use Win32 APIs everywhere. Combined 
with the `pathlib` types, I'm pretty sure Python 3 can round trip paths on 
Windows. I also think Rust's path type(s) have OS-dependent functionality.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-08 Thread durin42 (Augie Fackler)
durin42 added a comment.


  In https://phab.mercurial-scm.org/D2057#43988, @kevincox wrote:
  
  > In https://phab.mercurial-scm.org/D2057#43987, @durin42 wrote:
  >
  > > Mercurial tries to be principled about always treating filenames as 
bytes. AIUI https://www.mercurial-scm.org/wiki/WindowsUTF8Plan is still the 
plan of record there?
  >
  >
  > Reading that page it seems to claim that filenames should be utf8, not 
bytes. If utf8, this is what the code does, but if it is bytes that definitely 
won't work.
  
  
  IIRC it's bytes everyplace except Windows, where we pretend utf8 is real?
  
  We may have to make adjustments to this plan on macOS with APFS, but I'm not 
sure about that yet.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-08 Thread kevincox (Kevin Cox)
kevincox added a comment.


  In https://phab.mercurial-scm.org/D2057#43987, @durin42 wrote:
  
  > Mercurial tries to be principled about always treating filenames as bytes. 
AIUI https://www.mercurial-scm.org/wiki/WindowsUTF8Plan is still the plan of 
record there?
  
  
  Reading that page it seems to claim that filenames should be utf8, not bytes. 
If utf8, this is what the code does, but if it is bytes that definitely won't 
work.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-08 Thread durin42 (Augie Fackler)
durin42 added a comment.


  In https://phab.mercurial-scm.org/D2057#43892, @kevincox wrote:
  
  > On a higher level, all of these code appears to be treating file names as 
strings. This isn't really true and will disallow some valid file names. Maybe 
we should stick with bytes throughout. Of course this makes windows filepaths 
difficult because they are actually (utf16) strings.
  
  
  Mercurial tries to be principled about always treating filenames as bytes. 
AIUI https://www.mercurial-scm.org/wiki/WindowsUTF8Plan is still the plan of 
record there?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers, kevincox
Cc: glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-08 Thread kevincox (Kevin Cox)
kevincox requested changes to this revision.
kevincox added a comment.
This revision now requires changes to proceed.


  I will try to finish the review later, but it might be quicker if you 
incorporate some of the changes first since a lot of them are repeated many 
times. Overall it looks good, there are a couple of things that i would 
highlight to make the code easier to read.
  
  - Prefer more descriptive variable names.
  - If you can, avoid "pointer" arithmetic. Cursors and slices are nice and 
have great convenience methods.
  - Group your flow control and filtering more.
  - Try to keep your types straight. In rust using the right types helps catch 
errors. So be aware of `char` vs `u8` vs `String` vs `Vec` vs `Vec`.
  
  On a higher level, all of these code appears to be treating file names as 
strings. This isn't really true and will disallow some valid file names. Maybe 
we should stick with bytes throughout. Of course this makes windows filepaths 
difficult because they are actually (utf16) strings.

INLINE COMMENTS

> indygreg wrote in build.rs:1
> I see this file was copied. There's nothing wrong with that. But does this 
> mean we will need a custom build.rs for each Rust package doing Python? If 
> that's the case, then I would prefer to isolate all our rust-cpython code to 
> a single package, if possible. I'm guessing that could be challenging due to 
> crossing create boundaries. I'm sure there are placed where we don't want to 
> expose symbols outside the crate.
> 
> I'm curious how others feel about this.

If this is going to be reused I would move it into it's own crate. It seems 
like everything here could be boiled down to a single function call in main.

> base85.rs:21
> +});
> +}
> +

I prefer something like this: 
https://play.rust-lang.org/?gist=5ca18a5b95335600e911b8f9310ea5c7=stable

I doubt lazy_static is too slow. Otherwise we can stay like this until const 
functions get implemented.

Either way note:

- I changed the type of B85CHARS to an array as opposed to an array ref.
- The loop condition is much nicer.

> base85.rs:23
> +
> +pub fn b85encode(py: Python, text: , pad: i32) -> PyResult {
> +let text = text.as_bytes();

Would it be possible to separate the decode from the python objects. I'm 
thinking two helper functions.

  fn b85_required_len(text: ) -> usize
  fn b85_encode(text: , pad: i32, out:  [u8]) -> Result<()>

> base85.rs:23
> +
> +pub fn b85encode(py: Python, text: , pad: i32) -> PyResult {
> +let text = text.as_bytes();

`` can only hold valid utf8 data? Does it make more sense to have `&[u8]` 
here for a list of bytes?

> base85.rs:23
> +
> +pub fn b85encode(py: Python, text: , pad: i32) -> PyResult {
> +let text = text.as_bytes();

IIUC pad is only ever checked `== 0`. Can it be made into a bool?

> base85.rs:45
> +
> +let mut ptext = [..];
> +let mut len = { ptext.len() };

`ptext` isn't very descriptive.

> base85.rs:46
> +let mut ptext = [..];
> +let mut len = { ptext.len() };
> +let mut dst_off: usize = 0;

Why the braces here?

> base85.rs:47
> +let mut len = { ptext.len() };
> +let mut dst_off: usize = 0;
> +

I suspect this type annotation isn't required.

> base85.rs:47
> +let mut len = { ptext.len() };
> +let mut dst_off: usize = 0;
> +

It might be best to use a `std::io::Cursor` 
 and let it drack 
`dst_off` for your.

> base85.rs:52
> +break;
> +}
> +

while !ptext.is_empty()

> base85.rs:54
> +
> +let mut acc: u32 = 0;
> +

I would prefer the name `chunk` or even `accum` is a lot mode obvious to me 
than `acc`.

> base85.rs:56
> +
> +for i in [24, 16, 8, 0].iter() {
> +let ch = ptext[0] as u32;

for i in &[24, 16, 8, 0]

> base85.rs:58
> +let ch = ptext[0] as u32;
> +acc |= ch << i;
> +

I would just combine these into one line as the name `ch` isn't adding much.

> base85.rs:63
> +
> +if len == 0 {
> +break;

Tracking len manually is a smell. Why not drop it and use `ptest.is_empty()`.

> base85.rs:91
> +pub fn b85decode(py: Python, text: ) -> PyResult {
> +let b85dec = unsafe { B85DEC };
> +

This is probably worth a comment that this is safe because D85DEC is required 
to be initialized before this function is called.

> base85.rs:152
> +
> +acc *= 85;
> +

Let rust do the overflow checking.

  acc = acc.checked_mul(85)
  .ok_or_else(|| {
  PyErr::new::(
  py,
  format!("bad base85 character at position {}", i))
  })?;

> changelog.rs:24
> +
> +assert_eq!(content[NodeId::hex_len()], '\n' as u8);
> +

Passing a message as a third argument is really useful.

> changelog.rs:31
> +
> +content.drain(..NodeId::hex_len());
> +

If you aren't using the value I would prefer `truncate(NodeId::hex_len())`

> changelog.rs:33
> +
> +let msg 

D2057: rust implementation of hg status

2018-03-07 Thread glandium (Mike Hommey)
glandium added a comment.


  Doesn't mononoke have code to read revlogs already?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2057: rust implementation of hg status

2018-03-07 Thread indygreg (Gregory Szorc)
indygreg added a comment.


  First of all - wow! Thanks for writing all this code. There's definitely a 
lot to work with. And work with it we will!
  
  This is definitely too big to review as one commit. If you could do *any* 
work to split it up, it would be greatly appreciated. I'd focus on the pure 
Rust pieces first. Everything needed to open revlogs would be great!
  
  You may find our custom Phabricator extensions (linked from 
https://www.mercurial-scm.org/wiki/Phabricator) useful for submitting series of 
commits to Phabricator.
  
  Regarding the performance, that's pretty good! The dirstate code is some of 
the most optimized code in Mercurial. There are some gnarly Python C hacks to 
make it fast. Some of those tricks involve using special system calls to walk 
directories to minimize the number of system calls. I'm not sure if the crate 
you imported has those optimizations. (I wouldn't be surprised either way.) I 
wouldn't worry too much about performance at this juncture. But I suspect we 
could make the Rust code another 50% faster with some tweaking. It would also 
be interesting to test on a larger repo, say 
https://hg.mozilla.org/mozilla-unified. Also, I believe there are hooks in the 
dirstate code to use Watchman (fsmonitor). Those hooks are critical in order to 
achieve peak performance on large repositories.
  
  Since you seem to be proficient at writing lots of Rust code, if you are 
looking for another project, may I suggest porting `chg` to Rust? That code is 
in `contrib/chg`. That might be the easiest component to actually ship in Rust 
since it is a standalone binary that doesn't link against Python. But we 
shouldn't get ahead of ourselves :)
  
  Anyway, it is late for me and I need to detach from my computer. I'm sure 
others will have things to say as well...

INLINE COMMENTS

> build.rs:1
> +// build.rs -- Configure build environment for `hgcli` Rust package.
> +//

I see this file was copied. There's nothing wrong with that. But does this mean 
we will need a custom build.rs for each Rust package doing Python? If that's 
the case, then I would prefer to isolate all our rust-cpython code to a single 
package, if possible. I'm guessing that could be challenging due to crossing 
create boundaries. I'm sure there are placed where we don't want to expose 
symbols outside the crate.

I'm curious how others feel about this.

> main.rs:233-261
> +let matches = clap::App::new("hg rust oxidation")
> +.arg(
> +clap::Arg::with_name("repository")
> +.short("c")
> +.long("repository")
> +.value_name("dash_r"),
> +)

This is definitely nifty and an impressive achievement \o/

The `r-` commands for testing pure Rust code paths are an interesting idea!

I think I'm OK with including support for this in `hgcli`. But I think the code 
should live in a separate file so it doesn't pollute `main()`. And it should be 
behind a Cargo feature flag so we maintain compatibility with `hg` as much as 
possible by default.

Also, Mercurial's command line parser is extremely wonky and has some 
questionable behavior. If the intent is to make `rhg` compatible with `hg`, we 
would need to preserve this horrible behavior. We'll likely have to write a 
custom argument parser because of how quirky Mercurial's argument parser is :(

> config.rs:95
> +} else {
> +RevlogFormat::V0
> +}

I would not worry about supporting v0 or v2 at this time. v0 is only important 
for backwards compatibility with ancient repos. And v2 never got off the ground.

> revlog_v1.rs:279
> +let mut fhandle = BufReader::new(match self.dflag {
> +Inline => fs::File::open(self.index_file.as_path()).unwrap(),
> +Separated(ref dfile) => fs::File::open(dfile).unwrap(),

IIRC, core Mercurial keeps an open file handle on revlogs and ensures we don't 
run out of file handles by not keeping too many revlogs open at the same time. 
For scanning operations, not having to open and close the file handles all the 
time will make a difference for performance. Also, core Mercurial loads the 
entirety of the `.i` file into memory. That's a scaling problem for large 
revlogs. But it does make performance of index lookups really fast.

> revlog_v1.rs:290-293
> +while let Some(ref chld_r) = it.next() {
> +if let Some(bin) = self.get_content( fhandle, chld_r) {
> +bins.push(bin);
> +} else {

A thread pool to help with zlib decompression should go a long way here.

Probably too early to think about this, but we'll likely eventually want a 
global thread pool for doing I/O and CPU expensive tasks, such as reading 
chunks from a revlog and decompressing them.

FWIW, we're going to radically alter the storage format in order to better 
support shallow clones. But that work hasn't started yet. I still think there 
is a benefit to implementing the revlog code in Rust 

D2057: rust implementation of hg status

2018-03-07 Thread Ivzhh (Sheng Mao)
Ivzhh added a comment.


  Hi all,
  
  Based on the discussion a few weeks ago, I come up with a solution for review 
and discussion. After reading the Oxidation plan, the first thought is to 
bypass python engine and current plugin system IFF on request. If user (maybe 
background checker of IDE) request r-* subcommands, then hg gives rust 
implementations instead of python's. So I try to make hg r-status as fast as 
possible. The submitted version has comparable performance (as an example of 
the performance, not evidence, on my MacBook, in hg's own repo, hg r-status 
150ms, and hg status 220ms). I am using CodeXL to profile the performance, and 
plan to use Future.rs to make the loading parallel and maybe 30ms faster.
  
  The implementation originates from hg python implementation, because the 
python version is really fast. I tried to split into small changes, however, I 
eventually to combine all hgstorage module as one commit.
  
  Thank you for your comments!

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: krbullock, indygreg, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel